From maj at fortinbras.us Wed Apr 1 01:28:24 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 1 Apr 2009 01:28:24 -0400
Subject: [Bioperl-l] #bioperl bot talk
Message-ID: <2589D1BF1EA24C119C06982EB70F490C@NewLife>

Hi All,

Some cool stuff going on on the IRC node (freenode.net/#bioperl).
Andrew Stewart has been prototyping an irc bot with Bioperl
functionality built-in. The possibilities for improving support and
logging our increasing irc traffic are terrifying. I've set up a wiki
page (http://www.bioperl.org/wiki/Bots) under the new IRC category for
discussions. Please feel free to contribute use cases, ideas, praise
and blame.

cheers,
Mark

From johann.pellet at inserm.fr Wed Apr 1 06:14:25 2009
From: johann.pellet at inserm.fr (Johann PELLET)
Date: Wed, 1 Apr 2009 12:14:25 +0200
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To:
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID:

Hi all,

With the latest version of BioPerl and BioSQL, I have tried to insert
entry from a GenBank file, which I have downloaded from the NCBI
website (648 937 records)

After successfully loading ncbi_taxonomy i am getting following error
message while loading sequences into database.

perl load_seqdatabase.pl gb_03-2009 -format genbank -driver Pg -dbname biosql

--------------------- WARNING ---------------------
MSG: The supplied lineage does not start near 'Human papillomavirus type 2c' (I was supplied 'Human papillomavirus - 2 | Alphapapillomavirus | Papillomaviridae')

the script is not stopped until this entry: S67864

--------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed, values were ("1","19)","1","3") FKs (41914,)
ERROR: invalid input syntax for integer: "19)"
---------------------------------------------------
Could not store S67864:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: error while executing statement in Bio::DB::BioSQL::LocationAdaptor::find_by_unique_key:
ERROR: current transaction is aborted, commands ignored until end of transaction block
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:216
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:284
STACK: Bio::DB::BioSQL::SeqFeatureAdaptor::store_children /Library/Perl/5.8.8/Bio/DB/BioSQL/SeqFeatureAdaptor.pm:291
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:284
STACK: Bio::DB::BioSQL::SeqAdaptor::store_children /Library/Perl/5.8.8/Bio/DB/BioSQL/SeqAdaptor.pm:257
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:284
STACK: load_seqdatabase.pl:630
-----------------------------------------------------------
 at load_seqdatabase.pl line 643

Any Idea?

Thanks in advance

Johann

From florent.angly at gmail.com Wed Apr 1 13:03:28 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 01 Apr 2009 10:03:28 -0700
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz>
Message-ID: <49D39E60.1020103@gmail.com>

FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you
won't be able to put its information in a hash (unless you have a lot of
memory).
Florent

Smithies, Russell wrote:
> The taxonomy information isn't in the blast output unless you created custom fasta headers for your blast database.
> The easiest way to get the tax_id for your accessions would be to download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz.
> If you load that file into a hash, parse the accessions out of the blast hits then lookup the tax_id from that hash, I think it should be fairly fast.
>
> Checking which are prokaryotes and which are eukaryotes based on tax_id is a separate problem :-)
> If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file contained within lists what division each tax_id belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out from that.
>
> It's not a very BioPerly solution but sometimes just looking up the answer from a file/table/hash is the simplest way.
>
> Hope this helps,
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
>
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalabh sharma
>> Sent: Wednesday, 1 April 2009 7:43 a.m.
>> To: bioperl-l
>> Subject: [Bioperl-l] taxonomy ID
>>
>> Hi All,
>> I am writing a script, and for one part of it I have to parse a blast
>> report (refseq blast) and check how many organisms are eukaryotes and how
>> many of them are prokaryotes.
>> I am using the Bio::DB::Taxonomy module:
>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>>
>> But for this I need a taxonomy ID (like '33090') as given in the example.
>> So is it possible to get a taxonomy ID from a refseq blast report?
>> If not, then how can I deal with this problem?
>>
>> I would really appreciate it if anyone can help me out.
>>
>> Thanks
>> Shalabh
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From miguel.pignatelli at uv.es Wed Apr 1 13:15:48 2009
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Wed, 1 Apr 2009 19:15:48 +0200
Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
In-Reply-To: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net>
References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net>
Message-ID: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>

Hi all,

I have a list of PUBMED IDs and I am trying to retrieve automatically
the *full article* in any format (not just the abstract). Is there any
method in bioperl that allows this? any other solution?
Currently I am trying to solve this using WWW::Mechanize, but do you
know of any other method to do this?

Any help would be appreciated,

Thanks in advance,

M;

From kanzure at gmail.com Wed Apr 1 14:18:22 2009
From: kanzure at gmail.com (Bryan Bishop)
Date: Wed, 1 Apr 2009 13:18:22 -0500
Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
In-Reply-To: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
Message-ID: <55ad6af70904011118q7cbdb05u9c89958de3ccc87e@mail.gmail.com>

On Wed, Apr 1, 2009 at 12:15 PM, Miguel Pignatelli wrote:
> I have a list of PUBMED IDs and I am trying to retrieve automatically the
> *full article* in any format (not just the abstract). Is there any method in
> bioperl that allows this? any other solution?
> Currently I am trying to solve this using WWW::Mechanize, but do you know of
> any other method to do this?

You can try pubget.com; it's a web gateway to download pubmedcentral
articles. Unfortunately this means it does not have pubmed articles.
What I have found with pubmed is that it's mainly a listing of
abstracts, and then the various papers may or may not be online in
their respective journals on the web somewhere else, and rarely are
there any links to the publisher website.

So how are you using WWW::Mechanize in this context? Is there some
secret to obtaining papers that are listed via pubmed? There are no
magical links to the publisher websites .. so what's going on?

- Bryan
http://heybryan.org/
1 512 203 0507

From Russell.Smithies at agresearch.co.nz Wed Apr 1 15:33:35 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 2 Apr 2009 08:33:35 +1300
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <49D39E60.1020103@gmail.com>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5615@exchsth.agresearch.co.nz>

There's always more than one way to do it.
I have no trouble loading it into a hash but you could just grep the file:

    my (undef, $tax_id) = split /\s+/, `grep -w -P "^$accession" gi_taxid_prot.dmp`;

--Russell

From Russell.Smithies at agresearch.co.nz Wed Apr 1 15:48:02 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 2 Apr 2009 08:48:02 +1300
Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
In-Reply-To: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz>

Not all articles have full-text at Pubmed but if you know the article ID, you can usually get the whole article (if available) like this:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez

or as pdf
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf

I'd just build a URL and use wget.

If you're searching Pubmed directly, use a query like this to ensure you only get articles with links to full text:

cancer AND (free full text[sb])
eg http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free+full+text[sb])

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

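The "build a URL and use wget" suggestion above could be sketched roughly as follows. This is only an illustration: the sub name pmc_urls is made up here, the two URL patterns are copied from the message, and whether those endpoints still respond is not something this sketch checks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the two PubMed Central URL forms quoted in the message above.
# pmc_urls() is a hypothetical helper name. The artid value is a PMC
# article ID, not a PubMed ID.
sub pmc_urls {
    my ($artid) = @_;
    die "artid must be numeric\n"
        unless defined $artid && $artid =~ /^\d+$/;
    my $base = 'http://www.pubmedcentral.nih.gov';
    return (
        html => "$base/articlerender.fcgi?artid=$artid&tool=pmcentrez",
        pdf  => "$base/picrender.fcgi?artid=$artid&blobtype=pdf",
    );
}

# The article ID used in the message.
my %url = pmc_urls(1307096);
print "$url{html}\n";
print "$url{pdf}\n";
```

Each URL could then be handed to wget, for instance with system('wget', '-O', '1307096.pdf', $url{pdf}), assuming wget is installed.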
From miguel.pignatelli at uv.es Wed Apr 1 18:14:13 2009
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Thu, 2 Apr 2009 00:14:13 +0200
Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz>
References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz>
Message-ID:

Thanks for the response,

I have PMIDs extracted from Genbank flat files, is there a way to
convert PMIDs to PMCIDs?
I found this page:

http://www.ncbi.nlm.nih.gov/sites/pmctopmid

Is it possible to download the underlying conversion table for local
use?

Thank you very much in advance,

M;

From Russell.Smithies at agresearch.co.nz Wed Apr 1 18:47:30 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 2 Apr 2009 11:47:30 +1300
Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
In-Reply-To:
References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5761@exchsth.agresearch.co.nz>

Try this:
http://www.pubmedcentral.nih.gov/about/ftp.html#Obtaining_DOIs

Use ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz to associate PMC articles with a PMC ID, a PubMed ID, and the corresponding DOI.
PMC-ids.csv.gz is a comma separated file with the following fields:

 * Journal Title
 * ISSN
 * Electronic ISSN
 * Publication Year
 * Volume
 * Issue
 * Page
 * DOI (if available)
 * PMC ID
 * PubMed ID (if available)
 * Manuscript ID (if available)
 * Release Date (Mmm DD YYYY or live)

--Russell

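A rough sketch of a local PMID to PMCID lookup built on the PMC-ids.csv.gz field layout described in this thread. The sub name pmid_to_pmcid and the sample rows are invented for illustration. It counts fields from the end of each row, which assumes the last four fields never contain commas; a proper CSV parser would be safer for anything beyond a quick lookup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a PubMed ID -> PMC ID map from rows in the 12-field layout
# listed in the thread. pmid_to_pmcid() is a hypothetical helper name.
# Fields are taken from the END of each row so that a comma inside a
# journal title does not shift the columns.
sub pmid_to_pmcid {
    my ($fh) = @_;
    my %pmc_for;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;                    # drop newline / trailing blanks
        next if $line =~ /^Journal Title,/;   # skip a header row if present
        my @f = split /,/, $line, -1;         # -1 keeps trailing empty fields
        next if @f < 12;
        my ($pmcid, $pmid) = @f[-4, -3];      # PMC ID, PubMed ID
        s/^"|"$//g for $pmcid, $pmid;         # strip optional quoting
        next unless $pmid =~ /^\d+$/;         # PubMed ID is "if available"
        $pmc_for{$pmid} = $pmcid;
    }
    return \%pmc_for;
}

# Made-up rows, for illustration only (not real articles).
my $sample = <<'CSV';
Journal Title,ISSN,eISSN,Year,Volume,Issue,Page,DOI,PMCID,PMID,Manuscript Id,Release Date
"Example Journal, Series B",0000-0000,,2005,1,2,34,,PMC0000001,10000001,,live
Another Example Journal,0000-0001,,2006,3,4,56,,PMC0000002,,,live
CSV

open my $fh, '<', \$sample or die "cannot open sample: $!";
my $map = pmid_to_pmcid($fh);
close $fh;

print "$_ => $map->{$_}\n" for sort keys %$map;
```

For the real file, the handle could come from something like open my $fh, '-|', 'gzip', '-dc', 'PMC-ids.csv.gz', assuming gzip is on the PATH.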
From tristan.lefebure at gmail.com Wed Apr 1 23:11:51 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 1 Apr 2009 23:11:51 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
Message-ID: <200904012311.51764.tristan.lefebure@gmail.com>

Hi there,

I'm trying to use the uniq_seq function from the Bio::SimpleAlign module.
Here is the description:

 Title     : uniq_seq
 Usage     : $aln->uniq_seq():  Remove identical sequences in
             in the alignment.  Ambiguous base ("N", "n") and
             leading and ending gaps ("-") are NOT counted as
             differences.
 Function  : Make a new alignment of unique sequence types (STs)
 Returns   : 1. a new Bio::SimpleAlign object (all sequences renamed as "ST")
             2. ST of each sequence in STDERR
 Argument  : None

What I'm trying to obtain is the ST composition (i.e. what is supposed to go
to STDERR), but I see nothing...

An example:

--------test.fasta:
>seq1
AAATTTC
>seq2
CAATTTC
>seq3
AAATTTC
-------

----------test.pl:
#! /usr/bin/perl

use strict;
use warnings;
use Bio::AlignIO;
use Bio::SimpleAlign;
use Getopt::Long;

my $in = Bio::AlignIO->new(-file => 'test.fasta' ,
                           -format => 'fasta');

my $out = Bio::AlignIO->new(-file => ">test.out" ,
                            -format => 'fasta');

while ( my $aln = $in->next_aln() ) {
    my $red_aln = $aln->uniq_seq;
    $out->write_aln($red_aln);
}
-------------

If you run:

./test.pl &> log

you will get nothing written into the log file... (but the test.out is OK)

Am I missing something?
By the way, wouldn't it be more convenient to have the ST composition returned
in an array?

Thanks,

--Tristan
(BioPerl 1.6)

From maj at fortinbras.us Wed Apr 1 23:28:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 1 Apr 2009 23:28:23 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
In-Reply-To: <200904012311.51764.tristan.lefebure@gmail.com>
References: <200904012311.51764.tristan.lefebure@gmail.com>
Message-ID: <29E09DCE622643848EAFA8F1C6210711@NewLife>

Tristan--
Strange: it looks like the prints to stderr have been commented out in the
source (back in revision 10242; 1.6 is rev 15582). The
two statements are easy to find in the SimpleAlign.pm uniq_seq() source;
you can uncomment them to work around this.
You are right, this is rather an unconventional way to specify an output
option-- can Chris comment?
Mark

From weigangq at gmail.com Wed Apr 1 23:57:16 2009
From: weigangq at gmail.com (Weigang Qiu)
Date: Wed, 1 Apr 2009 22:57:16 -0500
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
In-Reply-To: <29E09DCE622643848EAFA8F1C6210711@NewLife>
References: <200904012311.51764.tristan.lefebure@gmail.com> <29E09DCE622643848EAFA8F1C6210711@NewLife>
Message-ID: <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com>

Mark and Tristan,

I am the original instigator of the uniq_seq method. The STDERR
implementation was used so that STDOUT could be piped. But it did not
conform to bioperl convention of using the $self->debug() method. I think
that's why these lines were commented out and re-implemented using the
$self->debug method. So, turning on the debug option should give the
intended ST mapping for each sequence in stderr.

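For anyone who wants the ST composition as a data structure rather than as debug output, here is a minimal stand-alone sketch. It is not the uniq_seq implementation: st_composition is a made-up name, the "ST1, ST2, ..." labels are this sketch's own numbering, and it groups by exact string identity only, so it does not apply the rule that "N" and leading or ending gaps are not counted as differences.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group FASTA record IDs by identical sequence string and return a list
# of { name => 'STn', members => [ids] } in order of first appearance.
# st_composition() is a hypothetical helper, not part of BioPerl.
sub st_composition {
    my ($fh) = @_;
    my (%seq_of, @order, $id);
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;
        next unless length $line;
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            push @order, $id;
            $seq_of{$id} = '';
        }
        elsif (defined $id) {
            $seq_of{$id} .= uc $line;    # case-insensitive comparison
        }
    }

    my (%index_of_seq, @sts);
    for my $rec (@order) {
        my $seq = $seq_of{$rec};
        unless (exists $index_of_seq{$seq}) {
            push @sts, { name => 'ST' . (@sts + 1), members => [] };
            $index_of_seq{$seq} = $#sts;
        }
        push @{ $sts[ $index_of_seq{$seq} ]{members} }, $rec;
    }
    return @sts;
}

# The three-sequence example from earlier in the thread.
my $fasta = ">seq1\nAAATTTC\n>seq2\nCAATTTC\n>seq3\nAAATTTC\n";
open my $fh, '<', \$fasta or die "cannot open sample: $!";
my @sts = st_composition($fh);
close $fh;

print "$_->{name}: @{ $_->{members} }\n" for @sts;
```

With the example alignment, seq1 and seq3 end up in the same group and seq2 in its own.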
weigang

--
Weigang Qiu
Department of Biological Sciences
Hunter College, City University of New York
695 Park Avenue
New York, NY 10065

From maj at fortinbras.us Thu Apr 2 00:15:06 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 2 Apr 2009 00:15:06 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
In-Reply-To: <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com>
References: <200904012311.51764.tristan.lefebure@gmail.com> <29E09DCE622643848EAFA8F1C6210711@NewLife> <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com>
Message-ID:

Thanks Weigang-- I didn't look carefully enough-- I'll make a change to
the POD.

So Tristan, in your test script, add

$aln->verbose(1);

before you invoke uniq_seq(). The STs should then be sent to stderr
(as "warns").
MAJ

> > weigang > > On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen wrote: > >> Tristan-- >> Strange: it looks like the prints to stderr have been commented out in the >> source (back in revision 10242; 1.6 is rev 15582). The >> two statements are easy to find in the SimpleAlign.pm uniq_seq() source; >> you can >> uncomment them to work around this. >> You are right, this is rather an unconventional way to specify an output >> option-- can Chris comment? >> Mark >> ----- Original Message ----- From: "Tristan Lefebure" < >> tristan.lefebure at gmail.com> >> To: "BioPerl List" >> Sent: Wednesday, April 01, 2009 11:11 PM >> Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq >> >> >> >> Hi there, >>> >>> I'm trying to use the uniq_seq function from the Bio::SimpleAlign module. >>> Here is the description: >>> >>> Title : uniq_seq >>> Usage : $aln->uniq_seq(): Remove identical sequences in >>> in the alignment. Ambiguous base ("N", "n") and >>> leading and ending gaps ("-") are NOT counted as >>> differences. >>> Function : Make a new alignment of unique sequence types (STs) >>> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >>> "ST") >>> 2. ST of each sequence in STDERR >>> Argument : None >>> >>> What I'm trying to obtain is the ST composition (i.e. what is supposed to >>> go >>> to STDERR), but I see nothing... >>> >>> An example: >>> >>> --------test.fasta: >>> >>>> seq1 >>>> >>> AAATTTC >>> >>>> seq2 >>>> >>> CAATTTC >>> >>>> seq3 >>>> >>> AAATTTC >>> ------- >>> >>> >>> ----------test.pl: >>> #! 
/usr/bin/perl >>> >>> use strict; >>> use warnings; >>> use Bio::AlignIO; >>> use Bio::SimpleAlign; >>> use Getopt::Long; >>> >>> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >>> -format => 'fasta'); >>> >>> my $out = Bio::AlignIO->new(-file => ">test.out" , >>> -format => 'fasta'); >>> >>> while ( my $aln = $in->next_aln() ) { >>> my $red_aln = $aln->uniq_seq; >>> $out->write_aln($red_aln); >>> } >>> ------------- >>> >>> If you run: >>> >>> ./test.pl &> log >>> >>> you will get nothing written into the log file... (but the test.out is OK) >>> >>> Am I missing something? >>> By the way, wouldn't it be more convenient to have the ST composition >>> returned >>> in an array? >>> >>> Thanks, >>> >>> --Tristan >>> (BioPerl 1.6) >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Weigang Qiu > Department of Biological Sciences > Hunter College, City University of New York > 695 Park Avenue > New York, NY 10065 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From miguel.pignatelli at uv.es Thu Apr 2 04:17:02 2009 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Thu, 02 Apr 2009 10:17:02 +0200 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <49D39E60.1020103@gmail.com> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com> Message-ID: <49D4747E.4060001@uv.es> You may find the attached Perl module useful. It solves the difficult parts of getting the taxonomy given a GI identifier or a taxID. 
It is designed to process a large number of GIs quickly and with low
memory usage. An example of usage would be:

    use taxbuild;

    # Build the taxonomyDB
    my $taxDB = taxbuild->new(
        nodes    => $nodes_file_from_taxonomyDB,
        names    => $names_file_from_taxonomyDB,
        dict     => $dictFile,
        save_mem => 1
    );

    # Get the taxonomy given a GI identifier
    my @tax = $taxDB->get_taxonomy_from_gi("35961124");

    # Get the taxonomy term of a GI identifier at a given level
    my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");

    # Get the taxid of a GI identifier
    my $taxid = $taxDB->get_taxid("35961124");

    # Get the taxonomy given a taxid
    my @tax_from_taxid = $taxDB->get_taxonomy($taxid);

    # Get the taxonomy term at a given level given a taxid
    my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");

    # Get the level of a given taxonomical name
    my $level = $taxDB->get_level_from_name("Proteobacteria");

The "dict file" is a processed version of the gi_taxid file from
taxonomyDB. You can get this file by running the tax2bin2.pl script, also
attached:

    $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin

or, if you are working with genes instead of proteins:

    $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin

You may consult the documentation of the module for a full description.

A possible solution to the original post using this module would be
something like:

    # Initialize the taxonomyDB once.
    my $taxDB = taxbuild->new(
        nodes    => $nodes_file_from_taxonomyDB,
        names    => $names_file_from_taxonomyDB,
        dict     => $dictFile,
        save_mem => 1
    );

    # For each GI in your blast result:
    my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
    if ($superkingdom eq "Bacteria") {
        # Do whatever you want
    } elsif ($superkingdom eq "Eukaryota") {
        # Do whatever you want
    }

The module has been tested mainly on Linux systems, but should run without
problems on Windows and Mac too. If you encounter any problem while using
it, don't hesitate to contact me.
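For readers wondering how a binary "dict" file can keep memory use low, here
is a minimal sketch of one possible layout. It is an assumption made for
illustration and not a description of what tax2bin2.pl or taxbuild actually
do: the taxid for each GI is stored as a 4-byte integer at byte offset
4 * GI, so a lookup needs one seek and one read instead of a hash held in
memory. The names write_dict and lookup_taxid, and the GI/taxid pairs in the
demonstration, are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helpers for illustration only; they are NOT part of
# taxbuild or tax2bin2.pl. Assumed layout: the taxid for a GI is stored
# as an unsigned 32-bit big-endian integer at byte offset 4 * GI, and a
# stored 0 means "no taxid known".

# Convert whitespace-separated "gi taxid" lines into the binary file.
sub write_dict {
    my ($dmp_file, $bin_file) = @_;
    open my $in,  '<', $dmp_file or die "Cannot read $dmp_file: $!";
    open my $out, '>', $bin_file or die "Cannot write $bin_file: $!";
    binmode $out;
    while (my $line = <$in>) {
        my ($gi, $taxid) = split ' ', $line;
        next unless defined $taxid;          # skip blank or short lines
        seek($out, 4 * $gi, 0) or die "Cannot seek in $bin_file: $!";
        print {$out} pack('N', $taxid);
    }
    close $in;
    close $out or die "Cannot close $bin_file: $!";
    return;
}

# Look up one GI with a single seek and a 4-byte read.
sub lookup_taxid {
    my ($bin_file, $gi) = @_;
    open my $fh, '<', $bin_file or die "Cannot read $bin_file: $!";
    binmode $fh;
    seek($fh, 4 * $gi, 0) or die "Cannot seek in $bin_file: $!";
    my $buf = '';
    my $got = read($fh, $buf, 4);
    close $fh;
    return unless defined $got && $got == 4;  # GI lies past the end of the file
    my $taxid = unpack('N', $buf);
    return unless $taxid;                     # 0 marks an empty slot
    return $taxid;
}

# Demonstration with made-up GI/taxid pairs (not real NCBI records).
my $dir = tempdir(CLEANUP => 1);
my ($dmp, $bin) = ("$dir/gi_taxid_demo.dmp", "$dir/gi_taxid_demo.bin");
open my $demo, '>', $dmp or die "Cannot write $dmp: $!";
print {$demo} "5\t9606\n12\t562\n";
close $demo or die "Cannot close $dmp: $!";

write_dict($dmp, $bin);
for my $gi (5, 7, 12, 1000) {
    my $taxid = lookup_taxid($bin, $gi);
    printf "GI %d -> %s\n", $gi, defined $taxid ? $taxid : 'not found';
}
```

With this layout the file grows to roughly 4 bytes times the largest GI, so
it trades disk space for constant memory. Whether the attached script makes
the same trade is not something this sketch can confirm.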
Hope this helps, M; Florent Angly wrote: > FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you > won't be able to put its information in a hash (unless you have a lot of > memory). > Florent > > Smithies, Russell wrote: >> The taxonomy information isn't in the blast output unless you created >> custom fasta headers for your blast database. >> The easiest way to get the tax_id for your accessions would be to >> download the gi->tax_id list from >> ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. >> If you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should be >> fairly fast. >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) >> If you grab the taxdump.tar.gz file from the same site, the nodes.dmp >> file contained within lists what division each tax_id belongs to >> (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can >> probably work it out from that. >> >> It's not a very BioPerly solution but sometimes just looking up the >> answer from a file/table/hash is the simplest way. >> Hope this helps, >> >> Russell Smithies >> Bioinformatics Applications Developer T +64 3 489 9085 E >> russell.smithies at agresearch.co.nz >> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 >> 489 3809 F +64 3 489 9174 www.agresearch.co.nz >> >> >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] taxonomy ID >>> >>> Hi All, >>> I am writing a script, for one of its part i have to parse >>> a blast >>> report (refseq blast) and check how may organisms are eukaryotes and how >>> namy of them are prokaryotes. 
>>> I am using BIO::DB::taxinomy module: >>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>> >>> But for this i need a taxonomyid (like '33090') given in the example. >>> So is it possible to get a taxonomyid from refseq balst report? >>> If not then how i can deal with this problem? >>> >>> i would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Thu Apr 2 08:29:47 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Thu, 2 Apr 2009 08:29:47 -0400
Subject: [Bioperl-l] FYI: note on wiki template behavior
Message-ID: <62B28D02BEA44E13BBDB5531FF6D67CF@NewLife>

Wiki-interested folks-

I fixed a "feature" in the HOWTO template-- When the template was used
twice in the same line of text, the text following the first instance was
rendered as a "code box". This had to do with how the template itself was
formatted. If you're interested, please have a look at
http://www.bioperl.org/wiki/Template_talk:HOWTO

cheers,
Mark

From tristan.lefebure at gmail.com Thu Apr 2 09:30:51 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 2 Apr 2009 09:30:51 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
In-Reply-To:
References: <200904012311.51764.tristan.lefebure@gmail.com> <29E09DCE622643848EAFA8F1C6210711@NewLife> <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com>
Message-ID:

Thank you both,

To internally store the ST composition, so that I can reuse it in the same
script, I made the following modifications to SimpleAlign.pm:

diff /usr/local/share/perl/5.10.0/Bio/SimpleAlign.pm /usr/local/share/perl/5.10.0/Bio/SimpleAlignMod.pm
590a591,592
>     #modified to also returned an array of the ST composition
>     my %st;
651a654
>     push @{$st{$order{$str}}}, $_->id();
655c658
<     return $aln;
---
>     return ($aln, %st);

This is probably not really BioPerl compliant. Being an OBO ignorant, I
wonder if we could add this information somewhere, either once in the $aln
object, or in little pieces in each Bio::LocatableSeq object?

Thks,

--Tristan

On Thu, Apr 2, 2009 at 12:15 AM, Mark A. Jensen wrote:

> Thanks Weigang-- I didn't look carefully enough--
> I'll make a change to the POD.
> so Tristan, in your code below, add
>
> $aln->verbose(1);
>
> before you invoke uniq_seq(). The ST's should
> then be sent to stderr (as "warns").
>
> MAJ
> ----- Original Message ----- From: "Weigang Qiu"
> To: "Mark A.
Jensen" > Cc: "BioPerl List" ; < > tristan.lefebure at gmail.com> > Sent: Wednesday, April 01, 2009 11:57 PM > Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq > > > > Mark and Tristan, >> >> I am the original instigator of the uniq_seq method. The STDERR >> implementation was used so that STDOUT could be piped. But it did not >> conform to bioperl convention of using the $self->debug() method. I think >> that's why these lines were commented out and re-implemented using the >> $self->debug method. So, turning on the debug option should give the >> intended ST mapping for each sequence in stderr. >> >> weigang >> >> On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen >> wrote: >> >> Tristan-- >>> Strange: it looks like the prints to stderr have been commented out in >>> the >>> source (back in revision 10242; 1.6 is rev 15582). The >>> two statements are easy to find in the SimpleAlign.pm uniq_seq() source; >>> you can >>> uncomment them to work around this. >>> You are right, this is rather an unconventional way to specify an output >>> option-- can Chris comment? >>> Mark >>> ----- Original Message ----- From: "Tristan Lefebure" < >>> tristan.lefebure at gmail.com> >>> To: "BioPerl List" >>> Sent: Wednesday, April 01, 2009 11:11 PM >>> Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq >>> >>> >>> >>> Hi there, >>> >>>> >>>> I'm trying to use the uniq_seq function from the Bio::SimpleAlign >>>> module. >>>> Here is the description: >>>> >>>> Title : uniq_seq >>>> Usage : $aln->uniq_seq(): Remove identical sequences in >>>> in the alignment. Ambiguous base ("N", "n") and >>>> leading and ending gaps ("-") are NOT counted as >>>> differences. >>>> Function : Make a new alignment of unique sequence types (STs) >>>> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >>>> "ST") >>>> 2. ST of each sequence in STDERR >>>> Argument : None >>>> >>>> What I'm trying to obtain is the ST composition (i.e. 
what is supposed
>>>> to go to STDERR), but I see nothing...
>>>>
>>>> An example:
>>>>
>>>> --------test.fasta:
>>>> >seq1
>>>> AAATTTC
>>>> >seq2
>>>> CAATTTC
>>>> >seq3
>>>> AAATTTC
>>>> -------
>>>>
>>>> ----------test.pl:
>>>> #! /usr/bin/perl
>>>>
>>>> use strict;
>>>> use warnings;
>>>> use Bio::AlignIO;
>>>> use Bio::SimpleAlign;
>>>> use Getopt::Long;
>>>>
>>>> my $in = Bio::AlignIO->new(-file => 'test.fasta' ,
>>>>                            -format => 'fasta');
>>>>
>>>> my $out = Bio::AlignIO->new(-file => ">test.out" ,
>>>>                             -format => 'fasta');
>>>>
>>>> while ( my $aln = $in->next_aln() ) {
>>>>     my $red_aln = $aln->uniq_seq;
>>>>     $out->write_aln($red_aln);
>>>> }
>>>> -------------
>>>>
>>>> If you run:
>>>>
>>>> ./test.pl &> log
>>>>
>>>> you will get nothing written into the log file... (but the test.out is
>>>> OK)
>>>>
>>>> Am I missing something?
>>>> By the way, wouldn't it be more convenient to have the ST composition
>>>> returned in an array?
>>>>
>>>> Thanks,
>>>>
>>>> --Tristan
>>>> (BioPerl 1.6)
>>
>> --
>> Weigang Qiu
>> Department of Biological Sciences
>> Hunter College, City University of New York
>> 695 Park Avenue
>> New York, NY 10065

From dereje1227 at yahoo.com Thu Apr 2 09:45:08 2009
From: dereje1227 at yahoo.com (demis001)
Date: Thu, 2 Apr 2009 06:45:08 -0700 (PDT)
Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15
Message-ID: <22816585.post@talk.nabble.com>

Hi,

I am new to BioPerl and this forum, and do not even know how to start a
new post. I have one question for you guys. Is there any BioPerl module
that lets me retrieve a sequence by chromosome name, seqStart and seqEnd
from a formatted human genome database downloaded to my Linux desktop? I
used to do this with the Perl $URI object, and it is really slow because
the process depends on the network. To be more specific, I took chrName,
seqStart and seqEnd and went to the Ensembl database to get the sequences
one by one using the Perl $URI object. I thought it might be easier to
process locally against an indexed database with a BioPerl module, if
there is one designed for this purpose.

Input: a tab delimited (CSV) file with millions of rows containing
chrName, seqStart and seqEnd, plus the locally formatted/indexed human
genome.

Output: the sequences in fasta format, each with a header containing the
chr name and the parsed location.

Sorry if I posted in the wrong section of the forum; I am happy to get any
recommendation.

Thanks


Govind Chandra wrote:
>
> Hi,
>
> The code below
>
>
> ====== code begins =======
> #use strict;
> use Bio::SeqIO;
>
> $infile='NC_000913.gbk';
> my $seqio=Bio::SeqIO->new(-file => $infile);
> my $seqobj=$seqio->next_seq();
> my @features=$seqobj->all_SeqFeatures();
> my $count=0;
> foreach my $feature (@features) {
>   unless($feature->primary_tag() eq 'CDS') {next;}
>   print($feature->start()," ", $feature->end(), " ",$feature->strand(),"\n");
>   $ac=$feature->annotation();
>   $temp1=$ac->get_Annotations("locus_tag");
>   @temp2=$ac->get_Annotations();
>   print("$temp1 $temp2[0] @temp2\n");
>   if($count++ > 5) {last;}
> }
>
> print(ref($ac),"\n");
> exit;
>
> ======= code ends ========
>
> produces the output
>
> ========== output begins ========
>
> 190 255 1
> 0
> 337 2799 1
> 0
> 2801 3733 1
> 0
> 3734 5020 1
> 0
> 5234 5530 1
> 0
> 5683 6459 -1
> 0
> 6529 7959 -1
> 0
> Bio::Annotation::Collection
>
> =========== output ends ==========
>
> $ac is-a Bio::Annotation::Collection but does not actually contain any
> annotation from the feature. Is this how it should be? I cannot figure
> out what is wrong with the script. Earlier I used to use has_tag(),
> get_tag_values() etc. but the documentation says these are deprecated.
>
> Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of uname
> -a is
>
> Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008
> x86_64 x86_64 x86_64 GNU/Linux
>
> Thanks in advance for any help.
> > Govind > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Re%3A-Bioperl-l-Digest%2C-Vol-71%2C-Issue-15-tp22744119p22816585.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Thu Apr 2 09:46:36 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 2 Apr 2009 09:46:36 -0400 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: References: <200904012311.51764.tristan.lefebure@gmail.com><29E09DCE622643848EAFA8F1C6210711@NewLife><7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> Message-ID: Hi Tristan-- I think this is a good thought, Can you register this as an enhancement at http://bugzilla.bioperl.org ? Please go ahead and attach the diff as a patch to the 'bug' report-- thanks for *your* input- cheers, Mark ----- Original Message ----- From: "Tristan Lefebure" To: "Mark A. Jensen" Cc: "BioPerl List" ; "Weigang Qiu" Sent: Thursday, April 02, 2009 9:30 AM Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq > Thanks you both, > > To internally store the ST composition, so that I can reuse it in the same > script, I made the following modifications to SimpleAlign.pm: > > diff /usr/local/share/perl/5.10.0/Bio/SimpleAlign.pm > /usr/local/share/perl/5.10.0/Bio/SimpleAlignMod.pm > 590a591,592 >> #modified to also returned an array of the ST composition >> my %st; > 651a654 >> push @{$st{$order{$str}}}, $_->id(); > 655c658 > < return $aln; > --- >> return ($aln, %st); > > This is probably not really BioPerl compliant. Being an OBO ignorant, I > wonder if we could add this information somewhere either once in the $aln > object, or by little pieces in each Bio::LocatableSeq objects? > > Thks, > > --Tristan > > On Thu, Apr 2, 2009 at 12:15 AM, Mark A. 
Jensen wrote: > >> Thanks Weigang-- I didn't look carefully enough-- >> I'll make a change to the POD. >> so Tristan, in your code below, add >> >> $aln->verbose(1); >> >> before you invoke uniq_seq(). The ST's should >> then be sent to stderr (as "warns"). >> >> MAJ >> ----- Original Message ----- From: "Weigang Qiu" >> To: "Mark A. Jensen" >> Cc: "BioPerl List" ; < >> tristan.lefebure at gmail.com> >> Sent: Wednesday, April 01, 2009 11:57 PM >> Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq >> >> >> >> Mark and Tristan, >>> >>> I am the original instigator of the uniq_seq method. The STDERR >>> implementation was used so that STDOUT could be piped. But it did not >>> conform to bioperl convention of using the $self->debug() method. I think >>> that's why these lines were commented out and re-implemented using the >>> $self->debug method. So, turning on the debug option should give the >>> intended ST mapping for each sequence in stderr. >>> >>> weigang >>> >>> On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen >>> wrote: >>> >>> Tristan-- >>>> Strange: it looks like the prints to stderr have been commented out in >>>> the >>>> source (back in revision 10242; 1.6 is rev 15582). The >>>> two statements are easy to find in the SimpleAlign.pm uniq_seq() source; >>>> you can >>>> uncomment them to work around this. >>>> You are right, this is rather an unconventional way to specify an output >>>> option-- can Chris comment? >>>> Mark >>>> ----- Original Message ----- From: "Tristan Lefebure" < >>>> tristan.lefebure at gmail.com> >>>> To: "BioPerl List" >>>> Sent: Wednesday, April 01, 2009 11:11 PM >>>> Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq >>>> >>>> >>>> >>>> Hi there, >>>> >>>>> >>>>> I'm trying to use the uniq_seq function from the Bio::SimpleAlign >>>>> module. >>>>> Here is the description: >>>>> >>>>> Title : uniq_seq >>>>> Usage : $aln->uniq_seq(): Remove identical sequences in >>>>> in the alignment. 
Ambiguous base ("N", "n") and >>>>> leading and ending gaps ("-") are NOT counted as >>>>> differences. >>>>> Function : Make a new alignment of unique sequence types (STs) >>>>> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >>>>> "ST") >>>>> 2. ST of each sequence in STDERR >>>>> Argument : None >>>>> >>>>> What I'm trying to obtain is the ST composition (i.e. what is supposed >>>>> to >>>>> go >>>>> to STDERR), but I see nothing... >>>>> >>>>> An example: >>>>> >>>>> --------test.fasta: >>>>> >>>>> seq1 >>>>>> >>>>>> AAATTTC >>>>> >>>>> seq2 >>>>>> >>>>>> CAATTTC >>>>> >>>>> seq3 >>>>>> >>>>>> AAATTTC >>>>> ------- >>>>> >>>>> >>>>> ----------test.pl: >>>>> #! /usr/bin/perl >>>>> >>>>> use strict; >>>>> use warnings; >>>>> use Bio::AlignIO; >>>>> use Bio::SimpleAlign; >>>>> use Getopt::Long; >>>>> >>>>> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >>>>> -format => 'fasta'); >>>>> >>>>> my $out = Bio::AlignIO->new(-file => ">test.out" , >>>>> -format => 'fasta'); >>>>> >>>>> while ( my $aln = $in->next_aln() ) { >>>>> my $red_aln = $aln->uniq_seq; >>>>> $out->write_aln($red_aln); >>>>> } >>>>> ------------- >>>>> >>>>> If you run: >>>>> >>>>> ./test.pl &> log >>>>> >>>>> you will get nothing written into the log file... (but the test.out is >>>>> OK) >>>>> >>>>> Am I missing something? >>>>> By the way, wouldn't it be more convenient to have the ST composition >>>>> returned >>>>> in an array? 
>>>>> >>>>> Thanks, >>>>> >>>>> --Tristan >>>>> (BioPerl 1.6) >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> >>> -- >>> Weigang Qiu >>> Department of Biological Sciences >>> Hunter College, City University of New York >>> 695 Park Avenue >>> New York, NY 10065 >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bix at sendu.me.uk Wed Apr 1 08:00:59 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 01 Apr 2009 13:00:59 +0100 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> Message-ID: <49D3577B.1090409@sendu.me.uk> Smithies, Russell wrote: > The taxonomy information isn't in the blast output unless you created > custom fasta headers for your blast database. The easiest way to get > the tax_id for your accessions would be to download the gi->tax_id > list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. > If you load that file into a hash, parse the accessions out of the > blast hits then lookup the tax_id from that hash, I think it should > be fairly fast. 
> > Checking which are prokaryotes and which are eukaryotes based on > tax_id is a separate problem :-) If you grab the taxdump.tar.gz file > from the same site, the nodes.dmp file contained within lists what > division each tax_id belongs to (Bacteria, Invertebrates, Mammals, > Phages, Plants, etc) so you can probably work it out from that. Check out the synopsis for Bio::Taxon http://doc.bioperl.org/bioperl-live/Bio/Taxon.html If the division() function doesn't tell you what you need, you could use get_lineage_nodes() and check the oldest ancestors to see if its a pro or euk. From shalabh.sharma7 at gmail.com Thu Apr 2 15:50:58 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 2 Apr 2009 15:50:58 -0400 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <49D3577B.1090409@sendu.me.uk> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> Message-ID: <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> thanks a lot everyone, the information is really useful and it solved my purpose. Thanks Shalabh On Wed, Apr 1, 2009 at 8:00 AM, Sendu Bala wrote: > Smithies, Russell wrote: > >> The taxonomy information isn't in the blast output unless you created >> custom fasta headers for your blast database. The easiest way to get >> the tax_id for your accessions would be to download the gi->tax_id >> list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. If >> you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should >> be fairly fast. 
>> >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) If you grab the taxdump.tar.gz file >> from the same site, the nodes.dmp file contained within lists what >> division each tax_id belongs to (Bacteria, Invertebrates, Mammals, >> Phages, Plants, etc) so you can probably work it out from that. >> > > Check out the synopsis for Bio::Taxon > http://doc.bioperl.org/bioperl-live/Bio/Taxon.html > > If the division() function doesn't tell you what you need, you could use > get_lineage_nodes() and check the oldest ancestors to see if its a pro > or euk. > From Russell.Smithies at agresearch.co.nz Thu Apr 2 15:55:06 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 3 Apr 2009 08:55:06 +1300 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz> We're here to help - unless it's to do your homework ;-) --Russell From: shalabh sharma [mailto:shalabh.sharma7 at gmail.com] Sent: Friday, 3 April 2009 8:51 a.m. To: Sendu Bala Cc: Smithies, Russell; bioperl-l Subject: Re: [Bioperl-l] taxonomy ID thanks a lot everyone, the information is really useful and it solved my purpose. Thanks Shalabh On Wed, Apr 1, 2009 at 8:00 AM, Sendu Bala > wrote: Smithies, Russell wrote: The taxonomy information isn't in the blast output unless you created custom fasta headers for your blast database. The easiest way to get the tax_id for your accessions would be to download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. 
If you load that file into a hash, parse the accessions out of the blast hits then lookup the tax_id from that hash, I think it should be fairly fast. Checking which are prokaryotes and which are eukaryotes based on tax_id is a separate problem :-) If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file contained within lists what division each tax_id belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out from that. Check out the synopsis for Bio::Taxon http://doc.bioperl.org/bioperl-live/Bio/Taxon.html If the division() function doesn't tell you what you need, you could use get_lineage_nodes() and check the oldest ancestors to see if its a pro or euk. ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From Russell.Smithies at agresearch.co.nz Thu Apr 2 20:46:39 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 3 Apr 2009 13:46:39 +1300 Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ? 
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz>

I'm re-formatting some blast output into nice html webpages but am finding
$self->end_report() and $self->footer() don't seem to be working.
The other methods ($self->start_report, $self->introduction, $self->title)
all work fine.
Am I doing something wrong or is there a trick to it?

Here's some test code:
==================================

#!perl -w

use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;
use CGI qw(:standard);


my $in = Bio::SearchIO->new(-format => "blast",-file => shift @ARGV, );

my $index = Bio::SearchIO::Writer::HTMLResultWriter->new();

$index->start_report( \&my_start_report );
$index->title( \&my_title );
$index->footer(\&my_footer);
$index->end_report(\&my_end_report);

my $out = Bio::SearchIO->new(-writer => $index, -file => ">blast.htm");

$out->write_result($in->next_result);


sub my_start_report{
    return h1('this is my header');
}

sub my_title{
    return h1('this is my title');
}

sub my_footer{
    my ($self) = @_;
    return h2('this is a footer');
}

sub my_end_report {
    return h2('this is the end');
}

=================================

Thanx,


Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From jason at bioperl.org Thu Apr 2 21:09:20 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 2 Apr 2009 18:09:20 -0700 Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz> Message-ID: <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org> looking at the code - it doesn't seem to accept resetting the default value. sub end_report { return "\n\n"; } sub footer { my ($self) = @_; return "
Produced by Bioperl module ".ref($self)." on $DATE
Revision: $Revision
\n" } So just adjusting it to mirror what is happening for title and the rest would be necessary. -jason On Apr 2, 2009, at 5:46 PM, Smithies, Russell wrote: > I'm re-formatting some blast output into nice html webpages but am > finding $self->end_report() and $self->footer() don't seem to be > working. > The other methods ($self->start_report, $self->introduction, $self- > >title) all work fine. > Am I doing something wrong or is there a trick to it? > > Here's some test code: > ================================== > > #!perl -w > > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > use CGI qw(:standard); > > > my $in = Bio::SearchIO->new(-format => "blast",-file => shift > @ARGV, ); > > my $index = Bio::SearchIO::Writer::HTMLResultWriter->new(); > > $index->start_report( \&my_start_report ); > $index->title( \&my_title ); > $index->footer(\&my_footer); > $index->end_report(\&my_end_report); > > my $out = Bio::SearchIO->new(-writer => $index, -file => > ">blast.htm"); > > $out->write_result($in->next_result); > > > sub my_start_report{ > return h1('this is my header'); > } > > sub my_title{ > return h1('this is my title'); > } > > sub my_footer{ > my ($self) = @_; > return h2('this is a footer'); > } > > sub my_end_report { > return h2('this is the end'); > } > > ================================= > > Thanx, > > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. 
Any review, retransmission, dissemination or other use of,
> or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by
> AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =
> ======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org

From Russell.Smithies at agresearch.co.nz Thu Apr 2 22:16:34 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 3 Apr 2009 15:16:34 +1300
Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ?
In-Reply-To: <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz>
	<49D3577B.1090409@sendu.me.uk>
	<9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz>
	<4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABEE2E@exchsth.agresearch.co.nz>

Not wanting to be picky...
But $result->database_name (for blast results) returns the description of the database rather than just the name.
E.g. "hs.fna (Human mRNA Refseqs)" instead of "hs.fna".
I've had a hunt but can't see where the code for getting the database_name is.
Any ideas?

Thanx,
--Russell

> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
> Stajich
> Sent: Friday, 3 April 2009 2:09 p.m.
> To: Smithies, Russell
> Cc: 'bioperl-l'
> Subject: Re: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ?
> > looking at the code - it doesn't seem to accept resetting the default > value. > sub end_report { > return "\n\n"; > } > > sub footer { > my ($self) = @_; > return "
Produced by Bioperl module ".ref($self)." on > $DATE
Revision: $Revision
\n" > > } > > So just adjusting it to mirror what is happening for title and the > rest would be necessary. > > -jason > On Apr 2, 2009, at 5:46 PM, Smithies, Russell wrote: > > > I'm re-formatting some blast output into nice html webpages but am > > finding $self->end_report() and $self->footer() don't seem to be > > working. > > The other methods ($self->start_report, $self->introduction, $self- > > >title) all work fine. > > Am I doing something wrong or is there a trick to it? > > > > Here's some test code: > > ================================== > > > > #!perl -w > > > > use Bio::SearchIO; > > use Bio::SearchIO::Writer::HTMLResultWriter; > > use CGI qw(:standard); > > > > > > my $in = Bio::SearchIO->new(-format => "blast",-file => shift > > @ARGV, ); > > > > my $index = Bio::SearchIO::Writer::HTMLResultWriter->new(); > > > > $index->start_report( \&my_start_report ); > > $index->title( \&my_title ); > > $index->footer(\&my_footer); > > $index->end_report(\&my_end_report); > > > > my $out = Bio::SearchIO->new(-writer => $index, -file => > > ">blast.htm"); > > > > $out->write_result($in->next_result); > > > > > > sub my_start_report{ > > return h1('this is my header'); > > } > > > > sub my_title{ > > return h1('this is my title'); > > } > > > > sub my_footer{ > > my ($self) = @_; > > return h2('this is a footer'); > > } > > > > sub my_end_report { > > return h2('this is the end'); > > } > > > > ================================= > > > > Thanx, > > > > > > Russell Smithies > > > > Bioinformatics Applications Developer > > T +64 3 489 9085 > > E russell.smithies at agresearch.co.nz > > > > Invermay Research Centre > > Puddle Alley, > > Mosgiel, > > New Zealand > > T +64 3 489 3809 > > F +64 3 489 9174 > > www.agresearch.co.nz > > > > > > > > = > > ====================================================================== > > Attention: The information contained in this message and/or > > attachments > > from AgResearch Limited is intended only for the persons or 
entities > > to which it is addressed and may contain confidential and/or > > privileged > > material. Any review, retransmission, dissemination or other use of, > > or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by > > AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > = > > ====================================================================== > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > From bernd.web at gmail.com Fri Apr 3 09:47:23 2009 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 3 Apr 2009 15:47:23 +0200 Subject: [Bioperl-l] AlignIO clustal Message-ID: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com> Hi, Using Bioperl 1.5.2 and AlignIO, I now run into an issue with a clustalw alignment. At the moment, I cannot update to a newer version, so am not sure this problem still exists. The problem is that the $aln object does not exists when the last sequence in a block contains gaps only. Anybody has seen this or knows a fix? Code and example input follows below. 
Regards,
Bernd


use Bio::AlignIO;
my $in = Bio::AlignIO->new(-file => 'test.aln',
                           -format => 'clustalw');

my $out = Bio::AlignIO->new(-file => '>testerr.ALN',
                            -format => 'clustalw');

my $aln = $in->next_aln();
print $aln->length, "\n";

test.aln contains:

CLUSTAL W(1.81) multiple sequence alignment


QUERY/7-143      PETLE-ARINRATNPLNKEL--DWASI
7082547/1-128    ---------ERATNDMLIGP--DWAVN
1_3265048/1-0    ---------------------------
3265047/2-138    QTSLE-ALLLKATNSQNQNI--DTAAV
1_3265047/1-0    ---------------------------

From bernd.web at gmail.com Fri Apr 3 10:11:44 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 3 Apr 2009 16:11:44 +0200
Subject: [Bioperl-l] AlignIO clustal
In-Reply-To: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com>
References: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com>
Message-ID: <716af09c0904030711l8252943hff489ccb9f720920@mail.gmail.com>

Hi,

I noticed this issue is not specific to Clustal; it also occurs for Fasta.
The "problem" arises in a final check, which is only done on the last sequence; it is still present in the current code (webcvs) in the next_aln code.

In fasta.pm:
    # If $end <= 0, we have either reached the end of
    # file in <> or we have encountered some other error
    if ( $end <= 0 ) {
        undef $aln;
        return $aln;
    }

In clustalw.pm:
    # not sure if this should be a default option - or we can pass in
    # an option to do this in the future? --jason stajich
    # $aln->map_chars('\.','-');
    undef $aln if ( !defined $end || $end <= 0 );
    return $aln;

And the last sequence actually got a zero end. This came from an $aln->slice in which gap-only sequences are retained. It will also get a "0" in next_aln itself if no coordinates are present.
1_3265047/1-0    ---------------------------

For now, commenting out "undef $aln if ( !defined $end || $end <= 0 );" works.

Regards,
Bernd

On Fri, Apr 3, 2009 at 3:47 PM, Bernd Web wrote:
> Hi,
>
> Using Bioperl 1.5.2 and AlignIO, I now run into an issue with a
> clustalw alignment.
> At the moment, I cannot update to a newer version, so am not sure this
> problem still exists.
>
> The problem is that the $aln object does not exists when the last
> sequence in a block contains gaps only.
> Anybody has seen this or knows a fix? Code and example input follows below.
>
>
> Regards,
> Bernd
>
>
> use Bio::AlignIO;
> my $in = Bio::AlignIO->new(-file => 'test.aln',
>                            -format => 'clustalw');
>
> my $out = Bio::AlignIO->new(-file => '>testerr.ALN',
>                             -format => 'clustalw');
>
> my $aln = $in->next_aln();
> print $aln->length, "\n";
>
> test.aln contains:
>
> CLUSTAL W(1.81) multiple sequence alignment
>
>
> QUERY/7-143      PETLE-ARINRATNPLNKEL--DWASI
> 7082547/1-128    ---------ERATNDMLIGP--DWAVN
> 1_3265048/1-0    ---------------------------
> 3265047/2-138    QTSLE-ALLLKATNSQNQNI--DTAAV
> 1_3265047/1-0    ---------------------------
>

From hlapp at gmx.net Mon Apr 6 11:39:50 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 6 Apr 2009 11:39:50 -0400
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To:
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>

(Removing biosql-l from the cc list as this seems to be a problem with BioPerl.)

Hi Johann,

I don't know whether anyone has responded to you yet - if not I'm sorry, I've been inundated for the past couple of weeks.

On Apr 1, 2009, at 6:14 AM, Johann PELLET wrote:

> With the latest version of BioPerl and BioSQL, I have tried to
> insert entry from a GenBank file, which I have downloaded from the
> NCBI website (648 937 records)

Could you be more specific? When you say the latest version of BioPerl, do you mean 1.6.1 or the current svn snapshot of the main trunk? And which Genbank file is it? Is it one with only viruses, i.e., are you specifically interested in the virus sequences that the parser is giving you trouble with?
> After successfully loading ncbi_taxonomy i am getting following > error message while loading sequences into database. > > perl load_seqdatabase.pl gb_03-2009 -format genbank -driver Pg - > dbname biosql > > > --------------------- WARNING --------------------- > MSG: The supplied lineage does not start near 'Human papillomavirus > type 2c' (I was supplied 'Human papillomavirus - 2 | > Alphapapillomavirus | Papillomaviridae') This is a problem in the BioPerl genbank parser, or more specifically, in the species parser. I thought though this was fixed in 1.6.1; are you sure you don't have an older version of BioPerl lying around that could accidentally have been used? That said, it only seems to be a warning; did you check how the record ended up in the database and found it to be incomplete or messed up? > the script is not stopped until this entry: S67864 This a later entry, not the same entry that causes the problem above, right? > --------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed, > values were ("1","19)","1","3") FKs (41914,) > ERROR: invalid input syntax for integer: "19)" Oops - that's a problem that must originate from the BioPerl feature location parser. The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 Does anyone see why the location parser should have a problem with the first gene feature? It's nested, and has remote location components, but at first sight nothing jumps out at me as extraordinary. Has someone recently changed the location parsing code? If no-one has an immediate idea what could be at work here, this needs investigating. 
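A small illustration of how a fragment such as "19)" can come out of a nested location. The first gene feature of the record above reads order(S67862.1:72..75,join(S67863.1:1..788,1..19)); if a parser strips the outer operator and then splits on every comma, the inner join(...) is cut in two and the last piece keeps its closing parenthesis. The sketch below is plain Perl and neither uses nor reproduces BioPerl's location parser; split_top_level and unwrap are hypothetical helpers named for this example only, and whether this is what actually goes wrong inside BioPerl is an assumption, not something established here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a feature-table
# location on commas, but only at parenthesis depth zero.
sub split_top_level {
    my ($str) = @_;
    my @parts;
    my ($depth, $current) = (0, '');
    for my $ch (split //, $str) {
        if    ($ch eq '(') { $depth++ }
        elsif ($ch eq ')') { $depth-- }
        if ($ch eq ',' && $depth == 0) {
            # A comma outside any parentheses ends the current element.
            push @parts, $current;
            $current = '';
        }
        else {
            $current .= $ch;
        }
    }
    push @parts, $current if length $current;
    return @parts;
}

# Hypothetical helper: peel one operator(...) wrapper off a location.
# Returns (operator, inner text), or an empty list if there is no wrapper.
sub unwrap {
    my ($str) = @_;
    return ($1, $2) if $str =~ /^(\w+)\((.*)\)$/s;
    return;
}

my $location = 'order(S67862.1:72..75,join(S67863.1:1..788,1..19))';
my ($op, $inner) = unwrap($location);

# Naive approach: split on every comma, ignoring nesting.
my @naive = split /,/, $inner;
print "naive:  $_\n" for @naive;

# Depth-aware approach: the nested join(...) stays in one piece.
my @nested = split_top_level($inner);
print "nested: $_\n" for @nested;
```

Run as is, the naive split yields three pieces, the last being 1..19) with its stray parenthesis, while the depth-aware split yields two and keeps join(S67863.1:1..788,1..19) whole so that it can be unwrapped and split again.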
	-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From torsten.seemann at infotech.monash.edu.au Mon Apr 6 21:05:25 2009
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 7 Apr 2009 11:05:25 +1000
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To: <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
Message-ID:

> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772

gene order(S67862.1:72..75,join(S67863.1:1..788,1..19))

> Does anyone see why the location parser should have a problem with the first
> gene feature? It's nested, and has remote location components, but at first
> sight nothing jumps out at me as extraordinary. Has someone recently changed
> the location parsing code? If no-one has an immediate idea what could be at
> work here, this needs investigating.

I'm not sure if Bioperl handles the order() operator?

For those unfamiliar with the order() operator:

http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2

order(location,location, ... location)
The elements can be found in the specified order (5' to 3' direction),
but nothing is implied about the reasonableness about joining them.


--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept.
Microbiology, Monash University, AUSTRALIA

From cjfields at illinois.edu Mon Apr 6 23:59:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 6 Apr 2009 22:59:14 -0500
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To:
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
Message-ID: <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu>

On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote:

>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772
>
> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19))
>
>> Does anyone see why the location parser should have a problem with
>> the first
>> gene feature? It's nested, and has remote location components, but
>> at first
>> sight nothing jumps out at me as extraordinary. Has someone
>> recently changed
>> the location parsing code? If no-one has an immediate idea what
>> could be at
>> work here, this needs investigating.

The location parsing code was refactored about 3-4 years ago w/o problems. This'll be the first one to crop up. I'll try taking a look at it.

> I'm not sure if Bioperl handles the order() operator?
>
> For those unfamiliar with the order() operator:
>
> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2
>
> order(location,location, ... location)
> The elements can be found in the specified order (5' to 3' direction),
> but nothing is implied about the reasonableness about joining them.
>
>
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
> University, AUSTRALIA

It's interesting that the version from eutils differs significantly in the feature table when retrieving 'gb' or 'gbwithparts'; the latter resolves the location (see below). Regardless, we'll need to make sure this is parseable.

....

FEATURES Location/Qualifiers source 1..77 /organism="Ovine respiratory syncytial virus" /mol_type="genomic RNA" /db_xref="taxon:28869" gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) /gene="G" gene 55..>77 /gene="fusion glycoprotein F" chris From cjfields at illinois.edu Tue Apr 7 01:32:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 7 Apr 2009 00:32:52 -0500 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> Message-ID: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Fixed in svn now and have added this as a test case (passes all tests in bioperl-live). For some reason this wasn't catching some more complex combinations of operators, mainly those with mixes of order/ join. chris On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: > On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: > >>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 >> >> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >> >>> Does anyone see why the location parser should have a problem with >>> the first >>> gene feature? It's nested, and has remote location components, but >>> at first >>> sight nothing jumps out at me as extraordinary. Has someone >>> recently changed >>> the location parsing code? If no-one has an immediate idea what >>> could be at >>> work here, this needs investigating. > > The location parsing code was refactored above 3-4 years ago w/o > problems. This'll be the first one to crop up. I'll try taking a > look at it. > >> I'm not sure if Bioperl handles the order() operator? >> >> For those unfamilair with the order() operator: >> >> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >> >> order(location,location, ... 
location) >> The elements can be found in the specified order (5' to 3' >> direction), >> but nothing is implied about the reasonableness about joining them. >> >> >> --Torsten Seemann >> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >> University, AUSTRALIA > > It's interesting that the version from eutils differs significantly > in the feature table when retrieving 'gb' or 'gbwithparts', the > latter resolves the location (see below). Regardless we'll need to > make sure this is parseable. > > .... > > FEATURES Location/Qualifiers > source 1..77 > /organism="Ovine respiratory syncytial virus" > /mol_type="genomic RNA" > /db_xref="taxon:28869" > gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) > /gene="G" > gene 55..>77 > /gene="fusion glycoprotein F" > > > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From johann.pellet at inserm.fr Tue Apr 7 04:48:56 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Tue, 7 Apr 2009 10:48:56 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: <73508372-0C43-4693-8135-45C128A25959@inserm.fr> Thanks all, I will update bioperl-live using svn right now, and I will restart to load sequences into my biosql database. Hilmar, My GenBank file contains only virus sequences. I downloaded it using eutils, (db=nuccore, tool=ebot, rettype=gb ...). Thank you again -- -- Johann Pellet Le 7 avr. 09 ? 07:32, Chris Fields a ?crit : > Fixed in svn now and have added this as a test case (passes all > tests in bioperl-live). 
For some reason this wasn't catching some > more complex combinations of operators, mainly those with mixes of > order/join. > > chris > > On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: > >> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: >> >>>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 >>> >>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>> >>>> Does anyone see why the location parser should have a problem >>>> with the first >>>> gene feature? It's nested, and has remote location components, >>>> but at first >>>> sight nothing jumps out at me as extraordinary. Has someone >>>> recently changed >>>> the location parsing code? If no-one has an immediate idea what >>>> could be at >>>> work here, this needs investigating. >> >> The location parsing code was refactored above 3-4 years ago w/o >> problems. This'll be the first one to crop up. I'll try taking a >> look at it. >> >>> I'm not sure if Bioperl handles the order() operator? >>> >>> For those unfamilair with the order() operator: >>> >>> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >>> >>> order(location,location, ... location) >>> The elements can be found in the specified order (5' to 3' >>> direction), >>> but nothing is implied about the reasonableness about joining them. >>> >>> >>> --Torsten Seemann >>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>> University, AUSTRALIA >> >> It's interesting that the version from eutils differs significantly >> in the feature table when retrieving 'gb' or 'gbwithparts', the >> latter resolves the location (see below). Regardless we'll need to >> make sure this is parseable. >> >> .... 
>> >> FEATURES Location/Qualifiers >> source 1..77 >> /organism="Ovine respiratory syncytial virus" >> /mol_type="genomic RNA" >> /db_xref="taxon:28869" >> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >> /gene="G" >> gene 55..>77 >> /gene="fusion glycoprotein F" >> >> >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Tue Apr 7 13:56:27 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 7 Apr 2009 13:56:27 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: Awesome, thanks Chris! $beer_owed++; -hilmar On Apr 7, 2009, at 1:32 AM, Chris Fields wrote: > Fixed in svn now and have added this as a test case (passes all > tests in bioperl-live). For some reason this wasn't catching some > more complex combinations of operators, mainly those with mixes of > order/join. > > chris > > On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: > >> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: >> >>>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 >>> >>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>> >>>> Does anyone see why the location parser should have a problem >>>> with the first >>>> gene feature? It's nested, and has remote location components, >>>> but at first >>>> sight nothing jumps out at me as extraordinary. Has someone >>>> recently changed >>>> the location parsing code? If no-one has an immediate idea what >>>> could be at >>>> work here, this needs investigating. >> >> The location parsing code was refactored above 3-4 years ago w/o >> problems. 
This'll be the first one to crop up. I'll try taking a >> look at it. >> >>> I'm not sure if Bioperl handles the order() operator? >>> >>> For those unfamilair with the order() operator: >>> >>> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >>> >>> order(location,location, ... location) >>> The elements can be found in the specified order (5' to 3' >>> direction), >>> but nothing is implied about the reasonableness about joining them. >>> >>> >>> --Torsten Seemann >>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>> University, AUSTRALIA >> >> It's interesting that the version from eutils differs significantly >> in the feature table when retrieving 'gb' or 'gbwithparts', the >> latter resolves the location (see below). Regardless we'll need to >> make sure this is parseable. >> >> .... >> >> FEATURES Location/Qualifiers >> source 1..77 >> /organism="Ovine respiratory syncytial virus" >> /mol_type="genomic RNA" >> /db_xref="taxon:28869" >> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >> /gene="G" >> gene 55..>77 >> /gene="fusion glycoprotein F" >> >> >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From juheymann at yahoo.com Tue Apr 7 14:20:04 2009 From: juheymann at yahoo.com (Jurgen Heymann) Date: Tue, 7 Apr 2009 11:20:04 -0700 (PDT) Subject: [Bioperl-l] restriction site map Message-ID: <237420.97841.qm@web54203.mail.re2.yahoo.com> Hi All: I would like to convert a table (restriction enzyme / position where it cuts in gene of interest) into a graphical representation. What avenues exists for that? Would appreciate your comments. 
Thank you,
Jurgen

From wenzhiwang1983 at yahoo.com.cn Tue Apr 7 21:39:59 2009
From: wenzhiwang1983 at yahoo.com.cn (Wen-Zhi WANG)
Date: Wed, 8 Apr 2009 09:39:59 +0800 (CST)
Subject: [Bioperl-l] Pasing Affymatrix Microarray output
Message-ID: <992233.10677.qm@web15208.mail.cnb.yahoo.com>

Dear all,

Recently I have been working on population genomics data output by an Affymetrix microarray system. However, the software provided by Affymetrix runs only on the Windows 386 platform. Is there any application that can be used on Linux? Bio::Affymetrix was not powerful enough to get at the detailed information.

Thank you a lot.

Yours,
WWZ
___________________________________________________________________

Wen-Zhi WANG

State Key Laboratory of Genetic Resources and Evolution
Kunming Institute of Zoology, Chinese Academy of Sciences
Kunming, Yunnan 650223 P. R. China
Tel: (86) 871-5198993
Fax: (86) 871-5195430
Mobile: 13759114244
E-mail: wenzhiwang1983 at yahoo.com.cn

From Russell.Smithies at agresearch.co.nz Tue Apr 7 21:58:54 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 8 Apr 2009 13:58:54 +1200
Subject: [Bioperl-l] Pasing Affymatrix Microarray output
In-Reply-To: <992233.10677.qm@web15208.mail.cnb.yahoo.com>
References: <992233.10677.qm@web15208.mail.cnb.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABF94C@exchsth.agresearch.co.nz>

Have you had a look at Microarray-GeneXplorer
http://search.cpan.org/~sherlock/Microarray-GeneXplorer-0.11/
I haven't used it but I'd expect it to be pretty good being from Gavin Sherlock :-)

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Wen-Zhi WANG
> Sent: Wednesday, 8 April 2009 1:40 p.m.
> To: BioPerl List > Subject: [Bioperl-l] Pasing Affymatrix Microarray output > > Dear all, > > Recently, I focus on population genomics data outputed by affymatrix > microarray system. However, softwares which designed by affy. inc only run in > Windows 386 platform. Is there any application can used in Linux? > Bio::Affymatrix was not strong enough to get the detailed informaton. > > Thank you a lot. > > Yours, > WWZ > ___________________________________________________________________ > > Wen-Zhi WANG > > State Key Laboratory of Genetic Resources and Evolution > Kunming Institute of Zoology, Chinese Academy of Sciences > Kunming, Yunnan 650223 P. R. China > Tel:??????(86) 871-5198993 > Fax:???? (86) 871-5195430 > Mobile: 13759114244 > E-mail: wenzhiwang1983 at yahoo.com.cn > > > ___________________________________________________________ > ????????????????? > http://card.mail.cn.yahoo.com/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From sdavis2 at mail.nih.gov Tue Apr 7 22:10:17 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 7 Apr 2009 22:10:17 -0400 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <992233.10677.qm@web15208.mail.cnb.yahoo.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> Message-ID: <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG wrote: > Dear all, > > Recently, I focus on population genomics data outputed by affymatrix > microarray system. However, softwares which designed by affy. inc only run > in Windows 386 platform. Is there any application can used in Linux? > Bio::Affymatrix was not strong enough to get the detailed informaton. > You may want to look at a non-bioperl solution such as Bioconductor ( http://bioconductor.org). Sean From sac at bioperl.org Wed Apr 8 01:59:49 2009 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 7 Apr 2009 22:59:49 -0700 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> Message-ID: <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> Check out our Affymetrix Power Tools (APT) package: http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx We distribute binaries for Linux and Mac OSX, as well as source code so you can compile it yourself if you want. Note however that this is written in C++, not Perl. We don't provide SWIG or XS interfaces for direct access via Perl, though this would definitely be doable, if anyone is interested. 
Probably the easiest approach from Perl would be to simply call the appropriate APT executable through the shell as in: system("/path/to/apt --args ..."); The Perl code can parse the output files and take it from there. Steve On Tue, Apr 7, 2009 at 7:10 PM, Sean Davis wrote: > On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG wrote: > >> Dear all, >> >> Recently, I focus on population genomics data outputed by affymatrix >> microarray system. However, softwares which designed by affy. inc only run >> in Windows 386 platform. Is there any application can used in Linux? >> Bio::Affymatrix was not strong enough to get the detailed informaton. >> > > You may want to look at a non-bioperl solution such as Bioconductor ( > http://bioconductor.org). > > Sean > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From markus.liebscher at gmx.de Wed Apr 8 10:07:17 2009 From: markus.liebscher at gmx.de (manni122) Date: Wed, 8 Apr 2009 07:07:17 -0700 (PDT) Subject: [Bioperl-l] Access Uniprot detailed information Message-ID: <22951210.post@talk.nabble.com> Hi there, maybe I am not able to read careful enough through the Howto section. But is there a function in BioPerl that retrieves for a given Uniprot Access Code or ID from the Uniprot Database some general annotations like enzymatic activity or literature references? I appreciate any help! -- View this message in context: http://www.nabble.com/Access-Uniprot-detailed-information-tp22951210p22951210.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From johann.pellet at inserm.fr Wed Apr 8 11:29:29 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Wed, 8 Apr 2009 17:29:29 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: Hie all, I confirm that now it's ok for the LOCUS S67862S3 since Chris update. Thanks again. However I still have Warning message with other entries like: ######################################################################################################################### --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Hantaanvirus CGRn93MP8' (I was supplied 'Hantaan virus | Hantavirus | Bunyaviridae') --------------------------------------------------- --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Hantaanvirus CGRn93P8' (I was supplied 'Hantaan virus | Hantavirus | Bunyaviridae') --------------------------------------------------- ######################################################################################################################### but entries are inserted in the biosql database: ######################################################################################################################### biosql=# select * from bioentry where description like 'Hantaanvirus CGRn93P8%'; bioentry_id | biodatabase_id | taxon_id | name | accession | identifier | division | description | version -------------+----------------+----------+----------+----------- +------------+---------- + ----------------------------------------------------------------------- +--------- 156282 | 84 | 395824 | EF990932 | EF990932 | 156144486 | VRL | Hantaanvirus CGRn93P8 RNA-dependent RNA polymerase gene, partial cds. 
| 1 156288 | 84 | 395824 | EF990918 | EF990918 | 154623008 | VRL | Hantaanvirus CGRn93P8 segment M, complete sequence. | 1 156294 | 84 | 395824 | EF990904 | EF990904 | 154622980 | VRL | Hantaanvirus CGRn93P8 segment S, complete sequence. | 1 (3 rows) ######################################################################################################################### and finally EU608407 and EU608559 made a crash: ######################################################################################################################### --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Fowl adenovirus 8' (I was supplied 'Fowl adenovirus E | Aviadenovirus | Adenoviridae') --------------------------------------------------- --------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- #######...14 times ...############ --------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,LOCUS EU608407 1212 bp DNA linear VRL 20-APR-2008","","","CRC- D35248959C54B9F2","1","1212","") FKs () ERROR: null value in column "location" violates not-null constraint --------------------------------------------------- Could not store EU608559: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key STACK: Error::throw STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 
5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children / Library/Perl/5.8.8/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:230 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::SeqAdaptor::store_children /Library/Perl/5.8.8/ Bio/DB/BioSQL/SeqAdaptor.pm:237 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: load_seqdatabase.pl:630 ----------------------------------------------------------- at load_seqdatabase.pl line 643 ######################################################################################################################### If I check in the biosql database if some part of this records are inserted: ######################################################################################################################### select * from reference where title='Evidence for positive epistasis in HIV-1'; reference_id | dbxref_id | location | title | authors | crc --------------+-----------+-------------------------------------- +------------------------------------------ + ----------------------------------------------------------------------------+ ---------------------- 16443 | 4179 | Science 306 (5701), 1547-1550 (2004) | Evidence for positive epistasis in HIV-1 | Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,J.M. and Petropoulos,C.J. 
| CRC-19E7AA4FB7A5D4AF
(1 row)

select * from dbxref where dbxref_id=4179;
 dbxref_id | dbname | accession | version
-----------+--------+-----------+---------
      4179 | PUBMED | 15567861  |       0

select * from bioentry where accession=15567861;
 bioentry_id | biodatabase_id | taxon_id | name | accession | identifier | division | description | version
-------------+----------------+----------+------+-----------+------------+----------+-------------+---------
(0 rows)
#########################################################################################################################

I don't have any records with name='EU608407' or 'EU608559' in the bioentry table.

Thanks for your help,
Johann

--
Johann Pellet

On 7 Apr 09, at 19:56, Hilmar Lapp wrote:

> Awesome, thanks Chris! $beer_owed++;
>
> -hilmar
>
> On Apr 7, 2009, at 1:32 AM, Chris Fields wrote:
>
>> Fixed in svn now and have added this as a test case (passes all
>> tests in bioperl-live). For some reason this wasn't catching some
>> more complex combinations of operators, mainly those with mixes of
>> order/join.
>>
>> chris
>>
>> On Apr 6, 2009, at 10:59 PM, Chris Fields wrote:
>>
>>> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote:
>>>
>>>>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772
>>>>
>>>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19))
>>>>
>>>>> Does anyone see why the location parser should have a problem
>>>>> with the first gene feature? It's nested, and has remote location
>>>>> components, but at first sight nothing jumps out at me as
>>>>> extraordinary. Has someone recently changed the location parsing
>>>>> code? If no-one has an immediate idea what could be at work here,
>>>>> this needs investigating.
>>>
>>> The location parsing code was refactored about 3-4 years ago w/o
>>> problems. This'll be the first one to crop up. I'll try taking a
>>> look at it.
>>>
>>>> I'm not sure if Bioperl handles the order() operator?
>>>> >>>> For those unfamilair with the order() operator: >>>> >>>> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >>>> >>>> order(location,location, ... location) >>>> The elements can be found in the specified order (5' to 3' >>>> direction), >>>> but nothing is implied about the reasonableness about joining them. >>>> >>>> >>>> --Torsten Seemann >>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>>> University, AUSTRALIA >>> >>> It's interesting that the version from eutils differs >>> significantly in the feature table when retrieving 'gb' or >>> 'gbwithparts', the latter resolves the location (see below). >>> Regardless we'll need to make sure this is parseable. >>> >>> .... >>> >>> FEATURES Location/Qualifiers >>> source 1..77 >>> /organism="Ovine respiratory syncytial virus" >>> /mol_type="genomic RNA" >>> /db_xref="taxon:28869" >>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>> /gene="G" >>> gene 55..>77 >>> /gene="fusion glycoprotein F" >>> >>> >>> >>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cgoddard at flmnh.ufl.edu Wed Apr 8 11:25:37 2009 From: cgoddard at flmnh.ufl.edu (Chris Goddard) Date: Wed, 08 Apr 2009 11:25:37 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db Message-ID: <49DCC1F1.6080601@flmnh.ufl.edu> I am running into problems when trying to insert a sequence object retrieved from GenBank into a BioSQL schema running in a Postgres database. Whenever I use the 'create()' method on the sequence that has been made into a persistent object, the sequence isn't saved into the database properly. 
No error messages are given, and the corresponding Postgres primary key sequences are incremented as if the data had been saved properly: the appropriate tables themselves remain empty though. I am completely new to using the biosql-db modules, and so am probably missing something pretty simple. Below you will see the basic code that causes the problem. my $genbank_id = 'AYXXXXXX' my $genDB = new Bio::DB::GenBank; $sequence = $genDB->get_Seq_by_id($genbank_id); my $db = Bio::DB::BioDB->new(-database => 'biosql', -user => 'username', -dbname => 'dbname', -host => 'localhost', -driver => 'Pg'); my $pobj = $db->create_persistent($sequence); $pobj->create(); I am running the latest svn trunk versions of bioperl and bioperl-db (as of yesterday) and Postgres 8.3.7. I also downloaded the NCBI taxonomy info using the script included in the BioSQL package, and that data seemed to install without error. Any help or advice would be greatly appreciated. Thanks, Chris Goddard From hlapp at gmx.net Wed Apr 8 12:21:11 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 12:21:11 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db In-Reply-To: <49DCC1F1.6080601@flmnh.ufl.edu> References: <49DCC1F1.6080601@flmnh.ufl.edu> Message-ID: <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> This all sounds like you aren't issuing commit. Are you sure your code contains $popj->commit() and what you are looking at is *after* that is executed? Bioperl-db is transactional, so you decide when to commit (or rollback). -hilmar On Apr 8, 2009, at 11:25 AM, Chris Goddard wrote: > I am running into problems when trying to insert a sequence object > retrieved from GenBank into a BioSQL schema running in a Postgres > database. Whenever I use the 'create()' method on the sequence that > has been made into a persistent object, the sequence isn't saved > into the database properly. 
No error messages are given, and the > corresponding Postgres primary key sequences are incremented as if > the data had been saved properly: the appropriate tables themselves > remain empty though. > > I am completely new to using the biosql-db modules, and so am > probably missing something pretty simple. Below you will see the > basic code that causes the problem. > > my $genbank_id = 'AYXXXXXX' > > my $genDB = new Bio::DB::GenBank; > $sequence = $genDB->get_Seq_by_id($genbank_id); > > my $db = Bio::DB::BioDB->new(-database => 'biosql', > -user => 'username', > -dbname => 'dbname', > -host => 'localhost', > -driver => 'Pg'); > > my $pobj = $db->create_persistent($sequence); > $pobj->create(); > > I am running the latest svn trunk versions of bioperl and bioperl-db > (as of yesterday) and Postgres 8.3.7. I also downloaded the NCBI > taxonomy info using the script included in the BioSQL package, and > that data seemed to install without error. Any help or advice would > be greatly appreciated. > > Thanks, > Chris Goddard > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Apr 8 12:40:53 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 12:40:53 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db In-Reply-To: <49DCD120.8020302@flmnh.ufl.edu> References: <49DCC1F1.6080601@flmnh.ufl.edu> <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> <49DCD120.8020302@flmnh.ufl.edu> Message-ID: <4A6EA2F3-BA88-474E-A9D9-C1A7444CA755@gmx.net> On Apr 8, 2009, at 12:30 PM, Chris Goddard wrote: > That was it. I guess I just incorrectly assumed that create() did > an auto-commit. That was simple to fix. 
Thank you! > No problem, I'm glad I could be helpful! -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cgoddard at flmnh.ufl.edu Wed Apr 8 12:30:24 2009 From: cgoddard at flmnh.ufl.edu (Chris Goddard) Date: Wed, 08 Apr 2009 12:30:24 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db In-Reply-To: <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> References: <49DCC1F1.6080601@flmnh.ufl.edu> <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> Message-ID: <49DCD120.8020302@flmnh.ufl.edu> That was it. I guess I just incorrectly assumed that create() did an auto-commit. That was simple to fix. Thank you! Chris Hilmar Lapp wrote: > This all sounds like you aren't issuing commit. Are you sure your code > contains $popj->commit() and what you are looking at is *after* that > is executed? > > Bioperl-db is transactional, so you decide when to commit (or rollback). > > -hilmar > > On Apr 8, 2009, at 11:25 AM, Chris Goddard wrote: > >> I am running into problems when trying to insert a sequence object >> retrieved from GenBank into a BioSQL schema running in a Postgres >> database. Whenever I use the 'create()' method on the sequence that >> has been made into a persistent object, the sequence isn't saved into >> the database properly. No error messages are given, and the >> corresponding Postgres primary key sequences are incremented as if >> the data had been saved properly: the appropriate tables themselves >> remain empty though. >> >> I am completely new to using the biosql-db modules, and so am >> probably missing something pretty simple. Below you will see the >> basic code that causes the problem. 
>> >> my $genbank_id = 'AYXXXXXX' >> >> my $genDB = new Bio::DB::GenBank; >> $sequence = $genDB->get_Seq_by_id($genbank_id); >> >> my $db = Bio::DB::BioDB->new(-database => 'biosql', >> -user => 'username', >> -dbname => 'dbname', >> -host => 'localhost', >> -driver => 'Pg'); >> >> my $pobj = $db->create_persistent($sequence); >> $pobj->create(); >> >> I am running the latest svn trunk versions of bioperl and bioperl-db >> (as of yesterday) and Postgres 8.3.7. I also downloaded the NCBI >> taxonomy info using the script included in the BioSQL package, and >> that data seemed to install without error. Any help or advice would >> be greatly appreciated. >> >> Thanks, >> Chris Goddard >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From sanjay.harke at gmail.com Wed Apr 8 23:24:45 2009 From: sanjay.harke at gmail.com (Sanjay Harke) Date: Thu, 9 Apr 2009 08:54:45 +0530 Subject: [Bioperl-l] Help in basics of Bioperl Message-ID: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Dear friend, I need help in following problem.I am beginer in bioperl i have sequence data. i install perl-bioperl on my computer. Now i want analyse sequences with blast, tree and multiple sequence analysis. so kindly guide me from basic. sanjay From abhishek.vit at gmail.com Wed Apr 8 23:31:26 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 8 Apr 2009 23:31:26 -0400 Subject: [Bioperl-l] Help in basics of Bioperl In-Reply-To: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> References: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Message-ID: Dear Sanjay As much as people on this love to help out. I would definitely put in some efforts to atleast go through the basic bio perl tutorial before asking this question. Atleast that would have helped you frame the question legitimately. 
I think we should put diligent effort before trying to take other people's help. Here is the link to bio perl tutorial please try to go through the relevant sections. I am sure you will get your answer there. http://www.bioperl.org/Core/Latest/bptutorial.html Thanks, -Abhi On Wed, Apr 8, 2009 at 11:24 PM, Sanjay Harke wrote: > Dear friend, > > I need help in following problem.I am beginer in bioperl > > i have sequence data. > i install perl-bioperl on my computer. > Now i want analyse sequences with blast, tree and multiple sequence > analysis. > so kindly guide me from basic. > > sanjay > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Wed Apr 8 23:35:12 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 23:35:12 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote: > [...] > and finally EU608407 and EU608559 made a crash: > > [...] > --------------------- WARNING --------------------- > MSG: Unexpected error in feature table for Skipping feature, > attempting to recover > --------------------------------------------------- > #######...14 times ...############ I would assume that you figured out that this was triggered by or affected EU608407? Would you mind sharing how? 
> --------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,
> values were ("Bonhoeffer,S., Chappey,C., Parkin,N.T.,
> Whitcomb,LOCUS EU608407
> 1212 bp DNA linear VRL 20-APR-2008","","","CRC-
> D35248959C54B9F2","1","1212","") FKs ()
> ERROR: null value in column "location" violates not-null constraint

Is this really the verbatim copy of the error message you saw on the
screen? What's really puzzling about this is how the genbank SeqIO
parser could mess up parsing the reference entry so badly. Here's the
reference from the version online at NCBI:

REFERENCE   1  (bases 1 to 1212)
  AUTHORS   Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,J.M. and
            Petropoulos,C.J.
  TITLE     Evidence for positive epistasis in HIV-1
  JOURNAL   Science 306 (5701), 1547-1550 (2004)
   PUBMED   15567861

How the first author line would be chopped off at the end and the
LOCUS line would have gotten inserted there is a mystery to me. The
location is "Science 306 (5701), 1547-1550 (2004)", and according to
the error message the parser failed to extract that and the TITLE.

Could you confirm that the file you are parsing is not corrupted in
any way, specifically for this record?

> ---------------------------------------------------
> Could not store EU608559:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> [...]
>
> If I check in the biosql database if some part of this records are
> inserted:

So are there other sequences associated with that PubMed ID? Can you
do a grep on the PubMed ID and see whether it occurs already before
the one that trips up the load?

> [...]
> select * from dbxref where dbxref_id=4179;
> dbxref_id | dbname | accession | version
> -----------+--------+-----------+---------
> 4179 | PUBMED | 15567861 | 0
>
> select * from bioentry where accession=15567861;

Note that 15567861 is the accession (PubMed ID) for the referenced
article, not the sequence.
Which bioentries are associated with a reference would be in the bioentry_reference table. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Apr 8 23:51:52 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 23:51:52 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net> On Apr 8, 2009, at 11:35 PM, Hilmar Lapp wrote: > > On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote: > >> [...] >> and finally EU608407 and EU608559 made a crash: >> >> [...] >> --------------------- WARNING --------------------- >> MSG: Unexpected error in feature table for Skipping feature, >> attempting to recover >> --------------------------------------------------- >> #######...14 times ...############ > > I would assume that you figured out that this was triggered by or > affected EU608407? Would you mind sharing how? Looking at EU608407, it most likely wasn't the culprit or stumbling stone. It must have been triggered before that. > [...] > So are there other sequences associated with that PubMed ID? To answer my own question, it's indeed EU608407 that's from the same PubMed ID, and so am I correct in assuming that you didn't get the exception for that record, which would mean that the reference was properly inserted when that sequence was loaded. The second occurrence of the same PubMed ID should have actually triggered a successful lookup of the previously inserted record, which would then have skipped the insert. 
The fact that that didn't happen suggests that the PubMed ID also wasn't properly extracted from the Genbank record. So my first suspicion remains that your file is corrupted. Otherwise, if you download this record: http://www.ncbi.nlm.nih.gov/nuccore/183191257 in GenBank format and try to load it alone, it should yield the same error. Can you indeed reproduce the problem in that way? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Wed Apr 8 23:55:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 8 Apr 2009 23:55:12 -0400 Subject: [Bioperl-l] Help in basics of Bioperl In-Reply-To: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> References: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Message-ID: <4FAA64AA47534B98874AB16622D184BA@NewLife> Hi Sanjay, Judging from your posts to the list this month, I see you have an appreciation of the power of Bioperl to help you get all kinds of analysis jobs done, and that you have a real desire to learn a lot about it. I want to encourage that attitude. I also want to remind you that the absolutely best way to really understand anything is to dive into your project and try to understand the basics *on your own*. Your posts to this are honestly much too general for this list. People here are really generous with their time, but they don't have enough of it to walk you through every step. When I have an issue with my Bioperl programming (and believe me, I have had and do have many), I do at least three things before I consider posting on this list: * I read the documentation for the module I'm working with. * I go to the wiki (www.bioperl.org) and look for HOWTOs or tutorials. There is a search facility there, and many many MANY introductory links. * I go to the source code directly, and try to figure out what it is really doing. 
So, it turns out I rarely post questions to the list, because I've figured out my dumb mistake, or how to do that new thing. PLUS, I've become that much closer to true Bioperl independence. Please go to the page http://www.bioperl.org/wiki/Getting_Started and *read it*. Please follow the links. You may even find that your work has already been done for you. One hint that works here on the list and elsewhere is: the more work you can show you have done by yourself, the more willing an expert is to help you over the hard parts. Conversely, the less work you do, the greater the chance that your questions will go unheard. cheers, Mark ----- Original Message ----- From: "Sanjay Harke" To: Sent: Wednesday, April 08, 2009 11:24 PM Subject: [Bioperl-l] Help in basics of Bioperl > Dear friend, > > I need help in following problem.I am beginer in bioperl > > i have sequence data. > i install perl-bioperl on my computer. > Now i want analyse sequences with blast, tree and multiple sequence > analysis. > so kindly guide me from basic. 
>
> sanjay
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From johann.pellet at inserm.fr Thu Apr 9 05:48:43 2009
From: johann.pellet at inserm.fr (Johann PELLET)
Date: Thu, 9 Apr 2009 11:48:43 +0200
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To: <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
	<652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu>
	<271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu>
	<5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net>
Message-ID: <2FDD67FF-5DBA-4987-A04D-231AF8B1E93B@inserm.fr>

Hi Hilmar,

I am very sorry. I checked my GenBank file, and you are right: it's
corrupted :-(

grep EU608407 genbankFile
  AUTHORS   Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,LOCUS EU608407 1212 bp DNA linear VRL 20-APR-2008
ACCESSION   EU608407
VERSION     EU608407.1  GI:183190953

So I downloaded EU608407 again and loaded it alone with
load_seqdatabase.pl without problems. Same for EU608559.

Thanks again,
Johann

On 9 Apr 09, at 05:51, Hilmar Lapp wrote:

>
> On Apr 8, 2009, at 11:35 PM, Hilmar Lapp wrote:
>
>>
>> On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote:
>>
>>> [...]
>>> and finally EU608407 and EU608559 made a crash:
>>>
>>> [...]
>>> --------------------- WARNING ---------------------
>>> MSG: Unexpected error in feature table for Skipping feature,
>>> attempting to recover
>>> ---------------------------------------------------
>>> #######...14 times ...############
>>
>> I would assume that you figured out that this was triggered by or
>> affected EU608407? Would you mind sharing how?
>
> Looking at EU608407, it most likely wasn't the culprit or stumbling
> stone. It must have been triggered before that.
>
>> [...]
>> So are there other sequences associated with that PubMed ID?
>
> To answer my own question, it's indeed EU608407 that's from the same
> PubMed ID, and so am I correct in assuming that you didn't get the
> exception for that record, which would mean that the reference was
> properly inserted when that sequence was loaded.
>
> The second occurrence of the same PubMed ID should have actually
> triggered a successful lookup of the previously inserted record,
> which would then have skipped the insert. The fact that that didn't
> happen suggests that the PubMed ID also wasn't properly extracted
> from the Genbank record. So my first suspicion remains that your
> file is corrupted.
>
> Otherwise, if you download this record:
> http://www.ncbi.nlm.nih.gov/nuccore/183191257
>
> in GenBank format and try to load it alone, it should yield the same
> error. Can you indeed reproduce the problem in that way?
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>

From montalen at moulon.inra.fr Thu Apr 9 06:49:22 2009
From: montalen at moulon.inra.fr (montalent)
Date: Thu, 9 Apr 2009 12:49:22 +0200
Subject: [Bioperl-l] Bioperl add_object_condition
Message-ID: <6D76CE64E5E744C7B571F3BA31670F9D@bioinfo2>

Dear colleagues,

I am trying to use the add_object_condition() function to get a subset
of sequences. I tried this:

    # 1. STORE SELECTED BAC IN AN HASH TABLE : key = bac_name, value = sequence
    # 1.1 STORE SELECTED BAC NAME IN AN ARRAY
    my @selected_bac_list=();
    open (SELECTION, $bac_selection_file) or die "can not open $bac_selection_file :$!\n";
    while (my $line=<SELECTION>){
        my ($bac_name)=($line =~ /^(.+?);.+/);
        # print $bac_name."\n";
        push @selected_bac_list, $bac_name;
    }

    # 1.2 READ FASTA FILE WITH BIOPERL TO STORE IN AN HASH TABLE
    my $bac_fasta= Bio::SeqIO->new(-file=>$maize_sequence_bac_file, '-format'=>"Fasta");
    my $builder = $bac_fasta->sequence_builder();
    if ($builder->add_object_condition(sub {
            print " check \n";
            my $seq_ref=shift;
            if ($seq_ref->{'-length'} > 5000){ return 0;}
            else {return 1;}
        })){
        print "add_object_condition returns true\n";}
    else{
        print "add_object_condition returns false\n";}

    # for each sequence in fasta file, check if it is a selected bac
    while(my $seq=$bac_fasta->next_seq()){
        print $seq->id."\n"; # PB : IT PRINTS ALL THE SEQUENCES, NOT THE SUBSET....
    }

I get all the sequences instead of the subset. So I put a print() in
the closure passed to add_object_condition(), but nothing is printed.
It seems the sub is never executed, even though add_object_condition()
returns a true value.

add_object_condition() looks like a powerful method, but I have not
managed to make it work. Could you give me some advice on how to use
it? Did I forget something?

Best regards,
Pierre Montalent
INRA - Ferme du moulon
France

From jarodpardon at yahoo.com.cn Thu Apr 9 20:27:29 2009
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Fri, 10 Apr 2009 08:27:29 +0800 (CST)
Subject: [Bioperl-l] bioperl translate() function for seq obj
Message-ID: <221543.32779.qm@web15003.mail.cnb.yahoo.com>

Hi all,

I want to know whether Bio::PrimarySeqI::translate() uses the same
method and codon table as NCBI BLAST (blastx). Thanks.

Jarod

From csembry at ualr.edu Thu Apr 9 20:54:21 2009
From: csembry at ualr.edu (Charles Embry)
Date: Thu, 09 Apr 2009 19:54:21 -0500
Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1
Message-ID:

Hello,

I am a graduate student at UALR and I am trying to install the ext
package (1.5.1) on bioperl 1.5.1. I get this error when I run
Makefile.PL:

"[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL
Writing Makefile for Bio::Ext::Align
ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/Bio/SeqIO/staden/Makefile.PL: Invalid version '' for Bio::SeqIO::staden::read.
Must be of the form '#.##'. (For instance '1.23')
 at ./Makefile.PL line 4"

These are the first 11 lines of the Makefile.PL for the ext package:

use Inline::MakeMaker;
use Config;

WriteInlineMakefile(
            'NAME'         => 'Bio::SeqIO::staden::read',
            'VERSION_FROM' => './read.pm', # finds $VERSION,
            'PREREQ_PM'    => { 'Inline::C' => 0.0,
                                'Bio::SeqIO::abi' => 0.0,
                              }, # e.g., Module::Name => 1.1,
            test           => { TESTS => 'test.pl' },
           );

What does the error mean? And what version does it refer to? Of what?
(staden?) What version of Staden should this be if I am using
io_lib-1.8.11, following the INSTALL instructions with bioperl-ext?

Thank you,
C. Stephen Embry

From maj at fortinbras.us Thu Apr 9 21:16:18 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 9 Apr 2009 21:16:18 -0400
Subject: [Bioperl-l] bioperl translate() function for seq obj
In-Reply-To: <221543.32779.qm@web15003.mail.cnb.yahoo.com>
References: <221543.32779.qm@web15003.mail.cnb.yahoo.com>
Message-ID:

Hi Jarod-

translate() uses the NCBI "Standard" table by default. Check out the POD
for PrimarySeqI.pm (where translate is defined). You can specify others
by setting -CODONTABLE_ID => $n as an argument to translate().
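As a small illustration of the -CODONTABLE_ID argument, the sketch below translates the same DNA once with the default table and once with table 4. The toy sequence is made up; it was chosen because TGA is a stop codon in the Standard table but codes for tryptophan in table 4. The sketch assumes a BioPerl release whose translate() accepts named arguments; older releases took positional ones, so check the POD of your installed version.

```perl
use strict;
use warnings;
use Bio::Seq;

# Toy sequence (an assumption for illustration): ATG TGA TAA
my $dna = Bio::Seq->new(-id       => 'toy',
                        -seq      => 'ATGTGATAA',
                        -alphabet => 'dna');

# Default: NCBI Standard table (id 1), where TGA is a stop codon.
my $standard = $dna->translate->seq;

# Table 4 (mold/protozoan mitochondrial, Mycoplasma/Spiroplasma):
# TGA codes for tryptophan (W) instead of stop.
my $table4 = $dna->translate(-codontable_id => 4)->seq;

print "table 1: $standard\n";   # M**
print "table 4: $table4\n";     # MW*
```

When comparing against blastx output, the table used on the BLAST side has to match the one passed here, otherwise codons such as TGA will translate differently.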
The codon tables are in Bio::Tools::CodonTable, where the following are defined:

@NAMES = #id
    (
     'Standard', #1
     'Vertebrate Mitochondrial',#2
     'Yeast Mitochondrial',# 3
     'Mold, Protozoan, and CoelenterateMitochondrial and Mycoplasma/Spiroplasma',#4
     'Invertebrate Mitochondrial',#5
     'Ciliate, Dasycladacean and Hexamita Nuclear',# 6
     '', '',
     'Echinoderm Mitochondrial',#9
     'Euplotid Nuclear',#10
     '"Bacterial"',# 11
     'Alternative Yeast Nuclear',# 12
     'Ascidian Mitochondrial',# 13
     'Flatworm Mitochondrial',# 14
     'Blepharisma Nuclear',# 15
     'Chlorophycean Mitochondrial',# 16
     '', '', '', '',
     'Trematode Mitochondrial',# 21
     'Scenedesmus obliquus Mitochondrial', #22
     'Thraustochytrium Mitochondrial' #23
     );

Can others (Scott M?) chime in on blast?
Mark

----- Original Message -----
From: "?? ??"
To: "'bioperl-l'"
Sent: Thursday, April 09, 2009 8:27 PM
Subject: [Bioperl-l] bioperl translate() function for seq obj

>
>
> Hi, all,
> I want to know whether Bio::PrimarySeqI::translate() uses identical method and
> codon table with NCBI Blast/blastx does. Thanks.
>
> Jarod
>
> --------------------------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From rrfreimuth2 at yahoo.com Thu Apr 9 22:10:21 2009
From: rrfreimuth2 at yahoo.com (Robert Freimuth)
Date: Thu, 9 Apr 2009 19:10:21 -0700 (PDT)
Subject: [Bioperl-l] Mentors needed for bioperl projects for Google's Summer of Code
Message-ID: <38796.60680.qm@web65611.mail.ac4.yahoo.com>

The Perl Foundation is looking for mentors for several projects for Google's Summer of Code. Two of the projects are directly applicable to bioperl.
In particular they're looking for mentors for these projects:

- Bio::Restriction::* - Improve reading and writing of RE collection in different formats; add support for multicut/multisite enzymes.
- A bioperl parser module for repeats/transposons.
- "CPAN OS Installer", integrate CPAN packages into Unix package managers like rpm and apt/dpkg
- Cross-platform Perl Bindings for wxWebKit

If you're interested please see the full announcement, posted on PerlMonks: http://www.perlmonks.org/?node_id=755872.

Thanks,
Bob

From j_martin at lbl.gov Thu Apr 9 23:18:28 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Thu, 9 Apr 2009 20:18:28 -0700
Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1
In-Reply-To:
References:
Message-ID: <20090410031827.GE6535@eniac.jgi-psf.org>

Hello,
I found 1.5.1 a pain to install, I recommend the code from

http://www.bioperl.org/wiki/Ext_package#The_latest_code

anywho, the read is read.pm, the message is something from
inline::c I think, there's an old bug report about it, if
you can't use the newer code maybe it will help.
http://bugzilla.open-bio.org/show_bug.cgi?id=2074

joel

On Thu, Apr 09, 2009 at 07:54:21PM -0500, Charles Embry wrote:
> Hello I am a graduate student at UALR and I am trying to install the ext package(1.5.1) on bioperl 1.5.1.
> I get this error when i run the make file.
>
> "[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL
> Writing Makefile for Bio::Ext::Align
> ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/Bio/SeqIO/staden/Makefile.PL: Invalid version '' for Bio::SeqIO::staden::read.
> Must be of the form '#.##'. (For instance '1.23')
>  at ./Makefile.PL line 4"
>
> This is the first 11 lines of the Makefile.PL for ext package
>
> use Inline::MakeMaker;
> use Config;
>
> WriteInlineMakefile(
>             'NAME'         => 'Bio::SeqIO::staden::read',
>             'VERSION_FROM' => './read.pm', # finds $VERSION,
>             'PREREQ_PM'
=> { 'Inline::C' => 0.0, > ???????????????????????? 'Bio::SeqIO::abi' => 0.0, > ?????????????????????? }, # e.g., Module::Name => 1.1, > ??????????? test??????????????? => { TESTS => 'test.pl' }, > ?????????? ); > > What does the error mean? > > And what version does it refer to? Of what? (staden?) > What version of Staden should this be if i am using the io_lib-1.8.11 , following the INSTALL instructions with bioperl-ext? > > > Thanks you > C. Stephen Embry > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hsa_rim at yahoo.co.in Thu Apr 9 23:43:53 2009 From: hsa_rim at yahoo.co.in (shafeeq rim) Date: Fri, 10 Apr 2009 09:13:53 +0530 (IST) Subject: [Bioperl-l] Creating Cytoband Ideogram images Message-ID: <824645.66937.qm@web94611.mail.in2.yahoo.com> Hi, I want to create CytoBand ideogram images from CytoBand data in NCBI data. Is there any module in BioPerl or any other way to do it ? I want to create chromosome cytoband ideograms for each chromosome. Thanks in advance Shafeeq Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ From hlapp at gmx.net Fri Apr 10 00:00:54 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 10 Apr 2009 00:00:54 -0400 Subject: [Bioperl-l] Mentors needed for bioperl projects for Google's Summer of Code In-Reply-To: <38796.60680.qm@web65611.mail.ac4.yahoo.com> References: <38796.60680.qm@web65611.mail.ac4.yahoo.com> Message-ID: <0C80FD8F-78F6-493E-94C3-AE5D845577C5@gmx.net> Hi Robert - thanks for putting us into the loop! On Apr 9, 2009, at 10:10 PM, Robert Freimuth wrote: > The Perl Foundation is looking for mentors for several projects for > Google's Summer of Code. Two of the projects are directly applicable > to bioperl. 
> > In particular they're looking for mentors for these projects: > > Bio::Restriction::* - Improve reading and writing of RE collection in > different formats; add support for multicut/multisite enzymes.A > bioperl parser module for repeats/transposons. I don't want to dampen any enthusiasm and the project may indeed be worthwhile, but it's also worth noting that we haven't ever seen the student applicant here (assuming it's the same who contacted Heikki a while ago). Having said that, the fact that there hasn't been any community interaction from the student yet obviously doesn't have to mean that there can't be any in the future. But in the Google Summer of Code spirit of recruiting new contributors into FLOSS communities, it's a less than ideal start. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Fri Apr 10 00:15:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 9 Apr 2009 23:15:45 -0500 Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1 In-Reply-To: <20090410031827.GE6535@eniac.jgi-psf.org> References: <20090410031827.GE6535@eniac.jgi-psf.org> Message-ID: <327D2C1C-A61A-473A-B85D-7A249856CC85@illinois.edu> Just to note, we're not actively supporting much of the bioperl-ext code, in favor of the BioLib initiative: http://biolib.open-bio.org/wiki/Main_Page If you do use bioperl-ext I suggest only using the latest code from svn (and that in combination with bioperl-live). chris On Apr 9, 2009, at 10:18 PM, Joel Martin wrote: > Hello, > I found that 1.5.1 a pain to install, I recommend the code from > > http://www.bioperl.org/wiki/Ext_package#The_latest_code > > anywho, the read is read.pm, the message is something from > inline::c I think, there's an old bug report about it, if > you can't use the newer code maybe it will help. 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > > joel > > > On Thu, Apr 09, 2009 at 07:54:21PM -0500, Charles Embry wrote: >> Hello I am a graduate student at UALR and I am trying to install >> the ext package(1.5.1) on bioperl 1.5.1. >> I get this error when i run the make file. >> >> "[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL >> Writing Makefile for Bio::Ext::Align >> ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/ >> Bio/SeqIO/staden/Makefile.PL: Invalid version '' for >> Bio::SeqIO::staden::read. >> Must be of the form '#.##'. (For instance '1.23') >> at ./Makefile.PL line 4" >> >> This is the first 11 lines of the Makefile.PL for ext package >> >> use Inline::MakeMaker; >> use Config; >> >> WriteInlineMakefile( >> 'NAME' => 'Bio::SeqIO::staden::read', >> 'VERSION_FROM' => './read.pm', # finds $VERSION, >> 'PREREQ_PM' => { 'Inline::C' => 0.0, >> 'Bio::SeqIO::abi' => 0.0, >> }, # e.g., Module::Name => 1.1, >> test => { TESTS => 'test.pl' }, >> ); >> >> What does the error mean? >> >> And what version does it refer to? Of what? (staden?) >> What version of Staden should this be if i am using the >> io_lib-1.8.11 , following the INSTALL instructions with bioperl-ext? >> >> >> Thanks you >> C. 
Stephen Embry >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Apr 10 00:32:59 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 9 Apr 2009 23:32:59 -0500 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> Message-ID: <0340305E-EAB3-4A08-9B41-5E706F4A5A16@illinois.edu> Would definitely be worth testing out interactivity with these. chris On Apr 8, 2009, at 12:59 AM, Steve Chervitz wrote: > Check out our Affymetrix Power Tools (APT) package: > > http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx > > We distribute binaries for Linux and Mac OSX, as well as source code > so you can compile it yourself if you want. Note however that this is > written in C++, not Perl. We don't provide SWIG or XS interfaces for > direct access via Perl, though this would definitely be doable, if > anyone is interested. > > Probably the easiest approach from Perl would be to simply call the > appropriate APT executable through the shell as in: > > system("/path/to/apt --args ..."); > > The Perl code can parse the output files and take it from there. > > Steve > > > On Tue, Apr 7, 2009 at 7:10 PM, Sean Davis > wrote: >> On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG > >wrote: >> >>> Dear all, >>> >>> Recently, I focus on population genomics data outputed by affymatrix >>> microarray system. However, softwares which designed by affy. 
inc
>>> only run
>>> in Windows 386 platform. Is there any application can used in Linux?
>>> Bio::Affymatrix was not strong enough to get the detailed
>>> informaton.
>>>
>>
>> You may want to look at a non-bioperl solution such as Bioconductor (
>> http://bioconductor.org).
>>
>> Sean
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From miguel.pignatelli at uv.es Wed Apr 1 17:56:36 2009
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Wed, 1 Apr 2009 23:56:36 +0200
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <49D39E60.1020103@gmail.com>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID:

You may find the attached Perl module useful. It solves the difficult parts of getting the taxonomy given a GI identifier or a taxID. It is designed to process a large number of GIs very fast and with low memory usage.

An example of usage would be:

use taxbuild;
# Build the taxonomyDB
my $taxDB = taxbuild->new(
    nodes    => $nodes_file_from_taxonomyDB,
    names    => $names_file_from_taxonomyDB,
    dict     => $dictFile,
    save_mem => 1
);

# Get the taxonomy given a GI identifier
my @tax = $taxDB->get_taxonomy_from_gi("35961124");

# Get the taxonomy term of a GI identifier at a given level
my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");

# Get the taxid of a GI identifier
my $taxid = $taxDB->get_taxid("35961124");

# Get the taxonomy given a taxid
my @tax = $taxDB->get_taxonomy($taxid);

# Get the taxonomy at a given level given a taxid
my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");

# Get the level of a given taxonomical name
my $level = $taxDB->get_level_from_name("Proteobacteria");

The "dict file" is a processed version of the gi_taxid file from taxonomyDB. You can get this file by running the tax2bin2.pl script, also attached:

$ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin

or, if you are working with genes instead of proteins:

$ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin

A possible solution to the original post using this module would be something like:

# Initialize the taxonomyDB once.
my $taxDB = taxbuild->new(
    nodes    => $nodes_file_from_taxonomyDB,
    names    => $names_file_from_taxonomyDB,
    dict     => $dictFile,
    save_mem => 1
);

# For each blast result
# Extract the GI
my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
if ($superkingdom eq "Bacteria") {
    # Do whatever you want
} elsif ($superkingdom eq "Eukaryota") {
    # Do whatever you want
}

The module has been tested mainly on Linux systems, but should run without problems on Windows and Mac too. If you encounter any problem with it, don't hesitate to contact me.

Hope this helps,

M;

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tax2bin2.pl
Type: text/x-perl-script
Size: 400 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: taxbuild.pm
Type: text/x-perl-script
Size: 10599 bytes
Desc: not available
URL:
-------------- next part --------------

On 01/04/2009, at 19:03, Florent Angly wrote:

> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
> you won't be able to put its information in a hash (unless you have
> a lot of memory).
> Florent > > Smithies, Russell wrote: >> The taxonomy information isn't in the blast output unless you >> created custom fasta headers for your blast database. >> The easiest way to get the tax_id for your accessions would be to >> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >> . >> If you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should >> be fairly fast. >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) >> If you grab the taxdump.tar.gz file from the same site, the >> nodes.dmp file contained within lists what division each tax_id >> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) >> so you can probably work it out from that. >> >> It's not a very BioPerly solution but sometimes just looking up the >> answer from a file/table/hash is the simplest way. >> Hope this helps, >> >> Russell Smithies >> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 >> 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >> >> >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] taxonomy ID >>> >>> Hi All, >>> I am writing a script, for one of its part i have to >>> parse a blast >>> report (refseq blast) and check how may organisms are eukaryotes >>> and how >>> namy of them are prokaryotes. >>> I am using BIO::DB::taxinomy module: >>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>> >>> But for this i need a taxonomyid (like '33090') given in the >>> example. >>> So is it possible to get a taxonomyid from refseq balst report? 
>>> If not then how i can deal with this problem? >>> >>> i would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use >> of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. >> = >> = >> ===================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields1 at gmail.com Fri Apr 10 00:34:03 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Thu, 9 Apr 2009 23:34:03 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneNCBIBlast - blastpgp In-Reply-To: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> References: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> Message-ID: Estelle, Always direct your questions to the bioperl mail list (I'm cc'ing them now). I'm not sure about using that option, maybe someone else can answer? 
chris On Apr 7, 2009, at 5:08 AM, Estelle Proux wrote: > Dear Mr Fields, > > I would like to use the module Bio::Tools::Run::StandAloneNCBIBlast > to run > blastpgp. > However, the -C option (save a checkpoint in ASN.x) seems not > available in > this module (options are -j, -h, -c, -B, and -Q). Is there another > way to > save the checkpoint? > > I thank you by advance (and apologize for my English). > > Estelle From jaleto at gmail.com Fri Apr 10 03:50:46 2009 From: jaleto at gmail.com (Jonathan Leto) Date: Fri, 10 Apr 2009 00:50:46 -0700 Subject: [Bioperl-l] Google Summer of Code 2009 BioPerl Student Applications Message-ID: <9aaadf9c0904100050g7f82f925s2e9bae9646da6cd5@mail.gmail.com> Howdy, There are two student applications for The Perl Foundation this year which are BioPerl-related, and I would very much like for them to succeed, but most of the current mentors do not have the background to judge whether they are possible in the time given, or what most of words mean for that matter. We really need some feedback from BioPerl people as to the viability of this applications, as well as comments and suggestions for implementation issues. Please sign up at the GSoC web app [1], then apply to be a mentor for The Perl Foundation. It requires me to manually accept you and then you will be able to view the 19 applications we received this year. Please also join the private mentor list [2] and the students+mentors list [3] if you would like to keep up to date and get involved. Welcome! 
Cheers, [1] http://socghop.appspot.com/ [2] http://groups.google.com/group/tpf-gsoc [3] http://groups.google.com/group/tpf-gsoc-students -- [---------------------] Jonathan Leto jaleto at gmail.com From scott at scottcain.net Fri Apr 10 09:08:53 2009 From: scott at scottcain.net (Scott Cain) Date: Fri, 10 Apr 2009 09:08:53 -0400 Subject: [Bioperl-l] Creating Cytoband Ideogram images In-Reply-To: <824645.66937.qm@web94611.mail.in2.yahoo.com> References: <824645.66937.qm@web94611.mail.in2.yahoo.com> Message-ID: <536f21b00904100608w23484c5bi3765da39b6b4d946@mail.gmail.com> Hello Shafeeq, You need Bio::Graphics::Glyph::ideogram, which is part of Bio::Graphics. You can install it from cpan and it will install BioPerl 1.6 as a prereq. The perldoc for ideogram.pm has example code and data, since the format of the data is important. Scott On Thu, Apr 9, 2009 at 11:43 PM, shafeeq rim wrote: > Hi, > > I want to create CytoBand ideogram images from CytoBand data in NCBI data. Is there any module in BioPerl or any other way to do it ? I want to create chromosome cytoband ideograms for each chromosome. > > Thanks in advance > Shafeeq > > > > ? ? ?Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Fri Apr 10 09:32:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 10 Apr 2009 08:32:00 -0500
Subject: [Bioperl-l] taxonomy ID
In-Reply-To:
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID:

I don't know if this has been pointed out, but Bio::DB::Taxonomy is also capable of indexing and using the NCBI tax flat files.

use Bio::DB::Taxonomy;

my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => $nodesfile,
                                -namesfile => $namefile);

# use other Bio::DB::Taxonomy methods

chris

On Apr 1, 2009, at 4:56 PM, Miguel Pignatelli wrote:

> You may find the attached Perl module useful. It solves the
> difficult parts of getting the taxonomy given a GI identifier or a
> taxID. It is designed to be able to process a high number of GIs
> very fast and with low memory usage.
>
> An example of usage would be:
>
> use taxbuild;
> #Build the taxonomyDB
> my $taxDB = taxbuild->new(
>     nodes    => $nodes_file_from_taxonomyDB,
>     names    => $names_file_from_taxonomyDB,
>     dict     => $dictFile,
>     save_mem => 1
> );
>
> # Get the taxonomy given a GI identifier
> my @tax = $taxDB->get_taxonomy_from_gi("35961124");
>
> # Get the taxonomy term of a GI identifier at a given level
> my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");
>
> # Get the taxid of a GI identifier
> my $taxid = $taxDB->get_taxid("35961124");
>
> # Get the taxonomy given a taxid
> my @tax = $taxDB->get_taxonomy($taxid);
>
> # Get the taxonomy at a given level given a taxid
> my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");
>
> # Get the level of a given taxonomical name
> my $level = $taxDB->get_level_from_name("Proteobacteria");
>
> The "dict file" is a processed version of the gi_taxid file from
> taxonomyDB. You can get this file by running the tax2bin2.pl script
> also attached:
>
> $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin
> or, if you are working with genes instead of proteins:
> $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin
>
> A possible solution to the original post using this module would be
> something like:
>
> # Initialize the taxonomyDB once.
> my $taxDB = taxbuild->new(
>     nodes    => $nodes_file_from_taxonomyDB,
>     names    => $names_file_from_taxonomyDB,
>     dict     => $dictFile,
>     save_mem => 1
> );
>
> #For each blast result
> #Extract the GI
> my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
> if ($superkingdom eq "Bacteria") {
>     # Do whatever you want
> } elsif ($superkingdom eq "Eukaryota") {
>     # Do whatever you want
> }
>
>
> The module has been tested mainly in Linux systems, but should run
> without problems in Windows and Mac too. If you encounter any
> problem with it don't hesitate to contact me.
>
> Hope this helps,
>
> M;
>
>
>
>
>
> On 01/04/2009, at 19:03, Florent Angly wrote:
>
>> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
>> you won't be able to put its information in a hash (unless you have
>> a lot of memory).
>> Florent
>>
>> Smithies, Russell wrote:
>>> The taxonomy information isn't in the blast output unless you
>>> created custom fasta headers for your blast database.
>>> The easiest way to get the tax_id for your accessions would be to >>> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >>> . >>> If you load that file into a hash, parse the accessions out of the >>> blast hits then lookup the tax_id from that hash, I think it >>> should be fairly fast. >>> Checking which are prokaryotes and which are eukaryotes based on >>> tax_id is a separate problem :-) >>> If you grab the taxdump.tar.gz file from the same site, the >>> nodes.dmp file contained within lists what division each tax_id >>> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) >>> so you can probably work it out from that. >>> >>> It's not a very BioPerly solution but sometimes just looking up >>> the answer from a file/table/hash is the simplest way. >>> Hope this helps, >>> >>> Russell Smithies >>> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >>> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T >>> +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >>> >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>>> To: bioperl-l >>>> Subject: [Bioperl-l] taxonomy ID >>>> >>>> Hi All, >>>> I am writing a script, for one of its part i have to >>>> parse a blast >>>> report (refseq blast) and check how may organisms are eukaryotes >>>> and how >>>> namy of them are prokaryotes. >>>> I am using BIO::DB::taxinomy module: >>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>>> >>>> But for this i need a taxonomyid (like '33090') given in the >>>> example. >>>> So is it possible to get a taxonomyid from refseq balst report? >>>> If not then how i can deal with this problem? >>>> >>>> i would really appreciate if anyone can help me out. 
>>>> >>>> Thanks >>>> Shalabh >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> = >>> = >>> = >>> ==================================================================== >>> Attention: The information contained in this message and/or >>> attachments >>> from AgResearch Limited is intended only for the persons or entities >>> to which it is addressed and may contain confidential and/or >>> privileged >>> material. Any review, retransmission, dissemination or other use >>> of, or >>> taking of any action in reliance upon, this information by persons >>> or >>> entities other than the intended recipients is prohibited by >>> AgResearch >>> Limited. If you have received this message in error, please notify >>> the >>> sender immediately. >>> = >>> = >>> = >>> ==================================================================== >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Fri Apr 10 09:42:15 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 10 Apr 2009 09:42:15 -0400 Subject: [Bioperl-l] Query about Bioperl and Mysql In-Reply-To: <31bb4380903280541r232ebbe4kbb0ccd84f996da1f@mail.gmail.com> References: <31bb4380903280541r232ebbe4kbb0ccd84f996da1f@mail.gmail.com> Message-ID: <264855a00904100642l482deebend6be66b140933c2c@mail.gmail.com> On Sat, Mar 28, 2009 at 8:41 AM, Sanjay Harke wrote: > Dear friends, > > anybody nows about my 
following problem. > > !) I want to use my own database mysql with Bioperl > > kindly guide for it. > You'll want to look at the perl DBI and DBD::mysql modules. Sean From bosborne11 at verizon.net Fri Apr 10 09:55:00 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 10 Apr 2009 09:55:00 -0400 Subject: [Bioperl-l] Access Uniprot detailed information In-Reply-To: <22951210.post@talk.nabble.com> References: <22951210.post@talk.nabble.com> Message-ID: <4C3C5234-31F7-4EEF-BBA0-9B912D21F210@verizon.net> Markus, There is some discussion of the structure of "swiss" format files in the Feature-Annotation HOW TO. Have you taken a look at this? http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Other_Sequence_File_Formats This section does not explain all the fields in each different format, but it shows you code that you can run that will print out all the annotations and features. You're really asking 2 questions, I think. Have you figured out how to retrieve a sequence? See if this helps you: http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database Brian O. On Apr 8, 2009, at 10:07 AM, manni122 wrote: > > Hi there, > maybe I am not able to read careful enough through the Howto section. > But is there a function in BioPerl that retrieves for a given > Uniprot Access > Code or ID from the Uniprot Database some general annotations like > enzymatic > activity or literature references? > I appreciate any help! > -- > View this message in context: http://www.nabble.com/Access-Uniprot-detailed-information-tp22951210p22951210.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Fri Apr 10 10:05:06 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 10 Apr 2009 10:05:06 -0400 Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 In-Reply-To: <22816585.post@talk.nabble.com> References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <22816585.post@talk.nabble.com> Message-ID: Dereje, There's a HOW TO that discusses an approach similar to this (Using local Genbank and Entrez Gene files): http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences But the provided script uses Gene ids, not chromosome names. The more general suggestion would be to look at the module Bio::DB::Fasta. Brian O. On Mar 31, 2009, at 6:59 PM, demis001 wrote: > > Hi , > > I am new to BioPerl and this forum and even do not know how to post > the new > post. I have one question for you guys. > > Is there any BioPerl module that allows me to download sequence > based on > chromosome name, seqStart and SeqEnd given the formatted human genome > database downloaded on my Linux desktop? > > I used to do this using Perl $URI object and it is really slow as the > process depend on the network. To be more specific, I took chrName, > seqStart > and seqEnd and go to Ensembl database to get the sequence one by one > using > Perl $URI object. > > I thought it might be easier if I process locally using indexed > database > using BioPerl module if there is any designed for this purpose. > > Input, millions rows of tab delimited (CSV) file contain > information about > chrName, seqStart, seqEnd. Locally formatted/indexed human genome. > Output > should be the fasta sequence contain the sequence and with the header > contain chr name and location persed > > Sorry if I posted in the wrong section of the forum and happy to > get any > recommendation. 
> Thanks > > Govind Chandra wrote: >> >> Hi, >> >> The code below >> >> >> ====== code begins ======= >> #use strict; >> use Bio::SeqIO; >> >> $infile='NC_000913.gbk'; >> my $seqio=Bio::SeqIO->new(-file => $infile); >> my $seqobj=$seqio->next_seq(); >> my @features=$seqobj->all_SeqFeatures(); >> my $count=0; >> foreach my $feature (@features) { >> unless($feature->primary_tag() eq 'CDS') {next;} >> print($feature->start()," ", $feature->end(), " >> ",$feature->strand(),"\n"); >> $ac=$feature->annotation(); >> $temp1=$ac->get_Annotations("locus_tag"); >> @temp2=$ac->get_Annotations(); >> print("$temp1 $temp2[0] @temp2\n"); >> if($count++ > 5) {last;} >> } >> >> print(ref($ac),"\n"); >> exit; >> >> ======= code ends ======== >> >> produces the output >> >> ========== output begins ======== >> >> 190 255 1 >> 0 >> 337 2799 1 >> 0 >> 2801 3733 1 >> 0 >> 3734 5020 1 >> 0 >> 5234 5530 1 >> 0 >> 5683 6459 -1 >> 0 >> 6529 7959 -1 >> 0 >> Bio::Annotation::Collection >> >> =========== output ends ========== >> >> $ac is-a Bio::Annotation::Collection but does not actually contain >> any >> annotation from the feature. Is this how it should be? I cannot >> figure >> out what is wrong with the script. Earlier I used to use has_tag(), >> get_tag_values() etc. but the documentation says these are >> deprecated. >> >> Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of >> uname >> -a is >> >> Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 >> x86_64 x86_64 x86_64 GNU/Linux >> >> Thanks in advance for any help. >> >> Govind >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/Re%3A-Bioperl-l-Digest%2C-Vol-71%2C-Issue-15-tp22744119p22816585.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org Fri Apr 10 11:51:45 2009
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 10 Apr 2009 08:51:45 -0700
Subject: [Bioperl-l] taxonomy ID
In-Reply-To:
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com>
 <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz>
 <49D39E60.1020103@gmail.com>
Message-ID: <6B951DED-0632-451C-86A4-2A215B1CAE6C@bioperl.org>

The only difference to the DB::Taxonomy module I can see is we don't
specifically have the dictionary part -- for gi -> taxid, but I just do
a local DBHash index of that when I need it.

-jason

On Apr 10, 2009, at 6:32 AM, Chris Fields wrote:

> I don't know if this has been pointed out, but Bio::DB::Taxonomy is
> also capable of indexing and using the NCBI tax flat files.
>
> use Bio::DB::Taxonomy;
>
> my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
>                                 -nodesfile => $nodesfile,
>                                 -namesfile => $namefile);
>
> # use other Bio::DB::Taxonomy methods
>
> chris
>
> On Apr 1, 2009, at 4:56 PM, Miguel Pignatelli wrote:
>
>> You may find the attached Perl module useful. It solves the
>> difficult parts of getting the taxonomy given a GI identifier or a
>> taxID. It is designed to be able to process a high number of GIs
>> very fast and with low memory usage.
>>
>> An example of usage would be:
>>
>> use taxbuild;
>> #Build the taxonomyDB
>> my $taxDB = taxbuild->new(
>>     nodes => $nodes_file_from_taxonomyDB,
>>     names => $names_file_from_taxonomyDB,
>>     dict => $dictFile,
>>     save_mem => 1
>> );
>>
>> # Get the taxonomy given a GI identifier
>> my @tax = $taxDB->get_taxonomy_from_gi("35961124");
>>
>> # Get the taxonomy term of a GI identifier at a given level
>> my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");
>>
>> # Get the taxid of a GI identifier
>> my $taxid = $taxDB->get_taxid("35961124");
>>
>> # Get the taxonomy given a taxid
>> my @tax = $taxDB->get_taxonomy($taxid);
>>
>> # Get the taxonomy at a given level given a taxid
>> my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");
>>
>> # Get the level of a given taxonomical name
>> my $level = $taxDB->get_level_from_name("Proteobacteria");
>>
>> The "dict file" is a processed version of the gi_taxid file from
>> taxonomyDB. You can get this file by running the tax2bin2.pl script
>> also attached:
>>
>> $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin
>> or, if you are working with genes instead of proteins:
>> $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin
>>
>> A possible solution to the original post using this module would be
>> something like:
>>
>> # Initialize the taxonomyDB once.
>> my $taxDB = taxbuild->new(
>>     nodes => $nodes_file_from_taxonomyDB,
>>     names => $names_file_from_taxonomyDB,
>>     dict => $dictFile,
>>     save_mem => 1
>> );
>>
>> #For each blast result
>> #Extract the GI
>> my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
>> if ($superkingdom eq "Bacteria") {
>>     # Do whatever you want
>> } elsif ($superkingdom eq "Eukaryota") {
>>     # Do whatever you want
>> }
>>
>>
>> The module has been tested mainly in Linux systems, but should run
>> without problems in Windows and Mac too. If you encounter any
>> problem with it don't hesitate to contact me.
>>
>> Hope this helps,
>>
>> M;
>>
>>
>>
>>
>>
>> On 01/04/2009, at 19:03, Florent Angly wrote:
>>
>>> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
>>> you won't be able to put its information in a hash (unless you
>>> have a lot of memory).
>>> Florent
>>>
>>> Smithies, Russell wrote:
>>>> The taxonomy information isn't in the blast output unless you
>>>> created custom fasta headers for your blast database.
>>>> The easiest way to get the tax_id for your accessions would be to
>>>> download the gi->tax_id list from
>>>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz .
>>>> If you load that file into a hash, parse the accessions out of
>>>> the blast hits, then look up the tax_id from that hash, I think it
>>>> should be fairly fast.
>>>> Checking which are prokaryotes and which are eukaryotes based on
>>>> tax_id is a separate problem :-)
>>>> If you grab the taxdump.tar.gz file from the same site, the
>>>> nodes.dmp file contained within lists what division each tax_id
>>>> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants,
>>>> etc) so you can probably work it out from that.
>>>>
>>>> It's not a very BioPerly solution but sometimes just looking up
>>>> the answer from a file/table/hash is the simplest way.
>>>> Hope this helps,
>>>>
>>>> Russell Smithies
>>>> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz
>>>> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T
>>>> +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalabh sharma
>>>>> Sent: Wednesday, 1 April 2009 7:43 a.m.
>>>>> To: bioperl-l
>>>>> Subject: [Bioperl-l] taxonomy ID
>>>>>
>>>>> Hi All,
>>>>> I am writing a script; for one part of it I have to parse a blast
>>>>> report (refseq blast) and check how many organisms are eukaryotes
>>>>> and how many of them are prokaryotes.
>>>>> I am using the Bio::DB::Taxonomy module:
>>>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>>>>>
>>>>> But for this I need a taxonomy ID (like '33090') as given in the
>>>>> example.
>>>>> So is it possible to get a taxonomy ID from a refseq blast report?
>>>>> If not, then how can I deal with this problem?
>>>>> >>>>> i would really appreciate if anyone can help me out. >>>>> >>>>> Thanks >>>>> Shalabh >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> = >>>> = >>>> = >>>> = >>>> =================================================================== >>>> Attention: The information contained in this message and/or >>>> attachments >>>> from AgResearch Limited is intended only for the persons or >>>> entities >>>> to which it is addressed and may contain confidential and/or >>>> privileged >>>> material. Any review, retransmission, dissemination or other use >>>> of, or >>>> taking of any action in reliance upon, this information by >>>> persons or >>>> entities other than the intended recipients is prohibited by >>>> AgResearch >>>> Limited. If you have received this message in error, please >>>> notify the >>>> sender immediately. >>>> = >>>> = >>>> = >>>> = >>>> =================================================================== >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From SMarkel at accelrys.com Fri Apr 10 12:01:25 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 10 Apr 2009 12:01:25 -0400 Subject: [Bioperl-l] 
Bio::Tools::Run::StandAloneNCBIBlast - blastpgp In-Reply-To: References: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74729E04A77@exch1-hi.accelrys.net> Estelle, Are you using the most recent version of Bio::Tools::Run::StandAloneNCBIBlast? The available blastpgp parameters are our @BLASTPGP_PARAMS = qw(A B C E F G H I J K L M N O P Q R S T U W X Y Z a b c e f h j k l m q s t u v y z); See line 94. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, 09 April 2009 9:34 PM > To: Estelle Proux > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneNCBIBlast - blastpgp > > Estelle, > > Always direct your questions to the bioperl mail list (I'm cc'ing them > now). I'm not sure about using that option, maybe someone else can > answer? > > chris > > On Apr 7, 2009, at 5:08 AM, Estelle Proux wrote: > > > Dear Mr Fields, > > > > I would like to use the module Bio::Tools::Run::StandAloneNCBIBlast > > to run > > blastpgp. > > However, the -C option (save a checkpoint in ASN.x) seems not > > available in > > this module (options are -j, -h, -c, -B, and -Q). Is there another > > way to > > save the checkpoint? > > > > I thank you by advance (and apologize for my English). 
> >
> > Estelle
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jarodpardon at yahoo.com.cn Sat Apr 11 09:50:20 2009
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Sat, 11 Apr 2009 21:50:20 +0800 (CST)
Subject: [Bioperl-l] how to suppress Bioperl exceptions
Message-ID: <936515.8386.qm@web15007.mail.cnb.yahoo.com>

Hi, all,
I use Bio::SeqIO driver to parse data files. The input data is somewhat
buggy, and some of entries are not strict in format. The parser will throw
exceptions and halt when meeting these bad entries. However, I want to just
skip these entries, not stop there. So how to suppress exceptions?
Thanks.

Jarod

From maj at fortinbras.us Sat Apr 11 11:32:16 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 11 Apr 2009 11:32:16 -0400
Subject: [Bioperl-l] how to suppress Bioperl exceptions
Message-ID:

missed the list.

----- Original Message -----
From: "Mark A. Jensen"
To: "?? ??"
Sent: Saturday, April 11, 2009 10:52 AM
Subject: Re: [Bioperl-l] how to suppress Bioperl exceptions

> Hey Jarod-
> You can try setting the verbosity of the object negative, as
>
> $seqio->verbose(-1);
>
> I've found, though, that the warning messages still come through
> sometimes. I've gotten control of these using the Error package:
>
> use Error qw(:try);
>
> try {
>     $seqio = Bio::SeqIO->new(-file => 'my.fas');
> }
> catch Error with {
>     my $e = shift;
>     # $e->text will contain the message
> };
>
> Note the lack of ; after the try block, and the
> presence thereof after the catch block.
>
> cheers -Mark
> ----- Original Message -----
> From: "?? ??"
> To: > Sent: Saturday, April 11, 2009 9:50 AM > Subject: [Bioperl-l] how to suppress Bioperl exceptions > > >> >> Hi, all, >> I use Bio::SeqIO driver to parse data files. The input data is somewhat >> buggy, and some of entries are not strict in format. The parser will throw >> exceptions and halt when meeting these bad entries. However, I want to just >> skip these entries, not stop there. So how to suppress exceptions? >> Thanks. >> >> Jarod >> >> >> >> ___________________________________________________________ >> ?????????????????????????????????? >> http://card.mail.cn.yahoo.com/ >> >> > > > -------------------------------------------------------------------------------- > > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Sat Apr 11 11:56:35 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 11 Apr 2009 11:56:35 -0400 Subject: [Bioperl-l] how to suppress Bioperl exceptions In-Reply-To: <936515.8386.qm@web15007.mail.cnb.yahoo.com> References: <936515.8386.qm@web15007.mail.cnb.yahoo.com> Message-ID: Hi Jarod, in addition to Mark's response, what you say in your message would mean that corruption is in specific entries of a file and you want to skip those, rather than entire files. If this is true, then you'd have to put the $seq=$seqio->next_seq() call into the try {} block as that'd be the one that would raise the exception. The SeqIO parsers don't generally guarantee though that they will gracefully recover from a parsing error and advance to the next record; I think the genbank parser will do that, but you will definitely want to check that. -hilmar On Apr 11, 2009, at 9:50 AM, ?? ?? wrote: > > Hi, all, > I use Bio::SeqIO driver to parse data files. The input data is > somewhat buggy, and some of entries are not strict in format. The > parser will throw exceptions and halt when meeting these bad > entries. 
However, I want to just skip these entries, not stop there. > So how to suppress exceptions? > Thanks. > > Jarod > > > > ___________________________________________________________ > ?????????????????????????????????? > http://card.mail.cn.yahoo.com/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From oleksii.nikolaienko at gmail.com Sun Apr 12 07:10:47 2009 From: oleksii.nikolaienko at gmail.com (Oleksii Nikolaienko) Date: Sun, 12 Apr 2009 14:10:47 +0300 Subject: [Bioperl-l] GSoC proposal Message-ID: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> Hi all! My name is Oleksii, I`m PhD student and I`d like to receive your comments on my proposal for Google summer of code. It`s called "bioperl-live::Bio::Restriction::* - implementing missing features" and I`m going to: 1) add support for reading and writing different file formats for module Bio::Restriction::IO 2) add support for multicut/multisite enzymes 3) add information on recommended buffers, restriction efficiency, sensitivity to methylation, etc and corresponding new methods 4) update documentation for Bio::Restriction::* modules Thanks in advance for your suggestions. notch From roy.chaudhuri at gmail.com Tue Apr 14 10:54:21 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 14 Apr 2009 15:54:21 +0100 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error Message-ID: <49E4A39D.2020909@gmail.com> Hi Mike. I did get that problem solved in the end, thanks to lots of help from Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped cc-ing the mailing list at some point. 
The last archived message in the thread
(http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had the
correct solution - the code change was incorporated into the bioperl-ext
CVS, and is in the latest version that you can get from SVN (see
http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the
problem you must be experiencing a different issue.

You should also bear in mind the message Chris Fields sent to the list a
few days ago, and have a look at using BioLib instead:

> Just to note, we're not actively supporting much of the bioperl-ext
> code, in favor of the BioLib initiative:
>
> http://biolib.open-bio.org/wiki/Main_Page
>
> If you do use bioperl-ext I suggest only using the latest code from
> svn (and that in combination with bioperl-live).
>
> chris

Hope this helps.
Roy.

Michael Stubbington wrote:
> Dear Dr. Chaudhuri,
>
> I am currently trying to write a bioperl script that parses .abi
> sequence files. I am having exactly the same problem as you did when
> you posted this enquiry to the bioperl mailing list
> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
> wondering if you ever solved the problem and, if so, can you remember
> what you did? I'd be very grateful for any help you can provide. I
> can't find this problem mentioned anywhere else online.
>
> Thank you for your time.
>
>
>
> Mike

--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.

From cjfields at illinois.edu Tue Apr 14 11:20:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 14 Apr 2009 10:20:00 -0500
Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error
In-Reply-To: <49E4A39D.2020909@gmail.com>
References: <49E4A39D.2020909@gmail.com>
Message-ID:

For ABI files you'll need an older version of io_lib that supports ABI
or the io_lib that comes with the full staden package. Recent versions
of io_lib don't have ABI support built-in anymore.
chris

On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote:

> Hi Mike.
>
> I did get that problem solved in the end, thanks to lots of help
> from Aaron Mackey. Looking at the bioperl-l archives it seems like
> we stopped cc-ing the mailing list at some point. The last archived
> message in the thread
> (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html)
> had the correct solution - the code change was incorporated into
> the bioperl-ext CVS, and is in the latest version that you can get
> from SVN (see http://www.bioperl.org/wiki/Ext_package). If that
> doesn't solve the problem you must be experiencing a different issue.
>
> You should also bear in mind the message Chris Fields sent to the
> list a few days ago, and have a look at using BioLib instead:
>
>> Just to note, we're not actively supporting much of the bioperl-ext
>> code, in favor of the BioLib initiative:
>> http://biolib.open-bio.org/wiki/Main_Page
>> If you do use bioperl-ext I suggest only using the latest code
>> from svn (and that in combination with bioperl-live).
> >
>> chris
>
> Hope this helps.
> Roy.
>
>
>
> Michael Stubbington wrote:
>> Dear Dr. Chaudhuri,
>> I am currently trying to write a bioperl script that parses .abi
>> sequence files. I am having exactly the same problem as you did when
>> you posted this enquiry to the bioperl mailing list
>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html.
>> I was wondering if you ever solved the problem and, if so, can
>> you remember
>> what you did? I'd be very grateful for any help you can provide. I
>> can't find this problem mentioned anywhere else online.
>> Thank you for your time.
>> Mike
>
> --
> Dr. Roy Chaudhuri
> Department of Veterinary Medicine
> University of Cambridge, U.K.
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Apr 14 14:21:43 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 14 Apr 2009 13:21:43 -0500 Subject: [Bioperl-l] GSoC proposal In-Reply-To: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> References: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> Message-ID: On Apr 12, 2009, at 6:10 AM, Oleksii Nikolaienko wrote: > Hi all! > My name is Oleksii, I`m PhD student and I`d like to receive your > comments on > my proposal for Google summer of code. It`s called > "bioperl-live::Bio::Restriction::* - implementing missing features" > and I`m > going to: > > 1) add support for reading and writing different file formats for > module Bio::Restriction::IO You should specify which formats you intend on working with. It's known that several formats don't carry all data, for instance prototype information, vendors, etc. so that should be documented for end-users. You should probably suggest a workaround for getting at missing data (i.e. a format that carries all info, retrieving prototype data separately, etc). > 2) add support for multicut/multisite enzymes Agreed, though you should be more specific on how you intend to implement this. From the Bio::Restriction::Enzyme documentation the sequence site is supposed to be a Bio::PrimarySeq (though I would probably change that internally so it creates these on the fly from the stored string). Multicut/multisite implies list context return, so it may become an API issue (and using wantarray as a workaround is fraught with problematic API traps that I suggest avoiding if at all possible). 
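
To make the list-context concern above concrete, here is a minimal sketch
of the trap. The package My::Enzyme, its methods site() and sites(), and
the two recognition sequences are hypothetical names made up for
illustration; this is not the Bio::Restriction::Enzyme API and not a
proposed implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a multisite enzyme object (illustration only).
package My::Enzyme;

sub new {
    my ($class, %args) = @_;
    my $self = { sites => [ @{ $args{sites} || [] } ] };
    return bless $self, $class;
}

# Context-sensitive accessor: all sites in list context,
# only the first site in scalar context.
sub site {
    my $self = shift;
    return wantarray ? @{ $self->{sites} } : $self->{sites}[0];
}

# Explicit alternative: always returns the full list, whatever the context.
sub sites {
    my $self = shift;
    return @{ $self->{sites} };
}

package main;

my $enz = My::Enzyme->new(sites => [ 'GGATCC', 'GAATTC' ]);

my $first = $enz->site;    # scalar context: first site only
my @all   = $enz->site;    # list context: every site

# The trap: an argument list imposes list context, so a call that was
# meant to supply one value silently expands into two.
my $args = [ -site => $enz->site, -name => 'demo' ];

print "scalar context: $first\n";
print "list context:   @all\n";
print "argument list has ", scalar(@$args), " elements\n";
```

A separate, explicitly named list accessor (sites() in the sketch) avoids
the ambiguity because the caller never has to reason about context.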
> 3) add information on recommended buffers, restriction > efficiency, > sensitivity to methylation, etc and corresponding new methods Much of this should probably be outlined in the corresponding interface class prior to implementation. > 4) update documentation for Bio::Restriction::* modules Yes, completely agree. This should be bumped closer to the top of the priority list (and outlined in the interface classes). > Thanks in advance for your suggestions. > > notch > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l chris From j_martin at lbl.gov Wed Apr 15 02:50:37 2009 From: j_martin at lbl.gov (Joel Martin) Date: Tue, 14 Apr 2009 23:50:37 -0700 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: References: <49E4A39D.2020909@gmail.com> Message-ID: <20090415065037.GB1175@eniac.jgi-psf.org> Hello, Do you know where it says io_lib will stop supporting ABI? We use the latest ( 1.11.6 ) for this so I know it does read them and I just checked with one fresh off a sequencer. But I couldn't find an active forum for staden. Thanks, Joel On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: > For ABI files you'll need an older version of io_lib that supports ABI or > the io_lib that comes with the full staden package. Recent versions of > io_lib don't have ABI support built-in anymore. > > chris > > On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: > >> Hi Mike. >> >> I did get that problem solved in the end, thanks to lots of help from >> Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped >> cc-ing the mailing list at some point. 
>> The last archived message in the
>> thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had
>> the correct solution - the code change was incorporated into the
>> bioperl-ext CVS, and is in the latest version that you can get from SVN
>> (see http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the
>> problem you must be experiencing a different issue.
>>
>> You should also bear in mind the message Chris Fields sent to the list a
>> few days ago, and have a look at using BioLib instead:
>>
>>> Just to note, we're not actively supporting much of the bioperl-ext
>>> code, in favor of the BioLib initiative:
>>> http://biolib.open-bio.org/wiki/Main_Page
>>> If you do use bioperl-ext I suggest only using the latest code from svn
>>> (and that in combination with bioperl-live).
>> >
>>> chris
>>
>> Hope this helps.
>> Roy.
>>
>>
>>
>> Michael Stubbington wrote:
>>> Dear Dr. Chaudhuri,
>>> I am currently trying to write a bioperl script that parses .abi sequence
>>> files. I am having exactly the same problem as you did when
>>> you posted this enquiry to the bioperl mailing list
>>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
>>> wondering if you ever solved the problem and, if so, can you remember
>>> what you did? I'd be very grateful for any help you can provide. I
>>> can't find this problem mentioned anywhere else online.
>>> Thank you for your time.
>>> Mike
>>
>> --
>> Dr. Roy Chaudhuri
>> Department of Veterinary Medicine
>> University of Cambridge, U.K.
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Apr 15 08:26:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 15 Apr 2009 07:26:15 -0500 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <20090415065037.GB1175@eniac.jgi-psf.org> References: <49E4A39D.2020909@gmail.com> <20090415065037.GB1175@eniac.jgi-psf.org> Message-ID: <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> Joel, They haven't stopped supporting it. IIRC the separate io_lib distribution no longer has the ABI headers, but the io_lib with the full staden package does (a little confusing, yes). I have 1.11.6 and ABI-related tests for bioperl and bioperl-ext don't pass, but compiling with an earlier version does work. It may be as simple as including the header files from an old version, but I haven't tried that. chris On Apr 15, 2009, at 1:50 AM, Joel Martin wrote: > Hello, > Do you know where it says io_lib will stop supporting ABI? We use > the latest ( 1.11.6 ) for this so I know it does read them and I just > checked with one fresh off a sequencer. But I couldn't find an active > forum for staden. > > Thanks, > Joel > > On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: >> For ABI files you'll need an older version of io_lib that supports >> ABI or >> the io_lib that comes with the full staden package. Recent >> versions of >> io_lib don't have ABI support built-in anymore. >> >> chris >> >> On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: >> >>> Hi Mike. >>> >>> I did get that problem solved in the end, thanks to lots of help >>> from >>> Aaron Mackey. 
>>> Looking at the bioperl-l archives it seems like we stopped
>>> cc-ing the mailing list at some point. The last archived message in the
>>> thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had
>>> the correct solution - the code change was incorporated into the
>>> bioperl-ext CVS, and is in the latest version that you can get from SVN
>>> (see http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the
>>> problem you must be experiencing a different issue.
>>>
>>> You should also bear in mind the message Chris Fields sent to the list a
>>> few days ago, and have a look at using BioLib instead:
>>>
>>>> Just to note, we're not actively supporting much of the bioperl-ext
>>>> code, in favor of the BioLib initiative:
>>>> http://biolib.open-bio.org/wiki/Main_Page
>>>> If you do use bioperl-ext I suggest only using the latest code from svn
>>>> (and that in combination with bioperl-live).
>>>>
>>>> chris
>>>
>>> Hope this helps.
>>> Roy.
>>>
>>>
>>>
>>> Michael Stubbington wrote:
>>>> Dear Dr. Chaudhuri,
>>>> I am currently trying to write a bioperl script that parses .abi sequence
>>>> files. I am having exactly the same problem as you did when
>>>> you posted this enquiry to the bioperl mailing list
>>>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
>>>> wondering if you ever solved the problem and, if so, can you remember
>>>> what you did? I'd be very grateful for any help you can provide. I
>>>> can't find this problem mentioned anywhere else online.
>>>> Thank you for your time.
>>>> Mike
>>>
>>> --
>>> Dr. Roy Chaudhuri
>>> Department of Veterinary Medicine
>>> University of Cambridge, U.K.
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Michael.Stubbington at hpa.org.uk Wed Apr 15 03:43:39 2009 From: Michael.Stubbington at hpa.org.uk (Michael Stubbington) Date: Wed, 15 Apr 2009 08:43:39 +0100 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: References: <49E4A39D.2020909@gmail.com> Message-ID: <335635A922FA2B43B35B6ADD7929CC590171550C@porhpaexc001.HPA.org.uk> Thanks a lot for your help. I finally solved the problem with a combination of: 1) Checking out the latest bioperl-ext from svn. 2) A fresh install of an earlier version of io_lib (8.12) 3) Changing to "config.h" in os.h Everything seems to be working now. Best wishes, Mike -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: 14 April 2009 16:20 To: Roy Chaudhuri Cc: Michael Stubbington; bioperl-l at bioperl.org Subject: Re: [Bioperl-l] Bio::SeqIO::staden::read make test error For ABI files you'll need an older version of io_lib that supports ABI or the io_lib that comes with the full staden package. Recent versions of io_lib don't have ABI support built-in anymore. chris On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: > Hi Mike. > > I did get that problem solved in the end, thanks to lots of help > from Aaron Mackey. Looking at the bioperl-l archives it seems like > we stopped cc-ing the mailing list at some point. 
The last archived > message in the thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html > ) had the correct solution - the code change was incorporated into > the bioperl-ext CVS, and is in the latest version that you can get > from SVN (see http://www.bioperl.org/wiki/Ext_package). If that > doesn't solve the problem you must be experiencing a different issue. > > You should also bear in mind the message Chris Fields sent to the > list a few days ago, and have a look at using BioLib instead: > >> Just to note, we're not actively supporting much of the bioperl- >> ext code, in favor of the BioLib initiative: >> http://biolib.open-bio.org/wiki/Main_Page >> If you do use bioperl-ext I suggest only using the latest code >> from svn (and that in combination with bioperl-live). > > >> chris > > Hope this helps. > Roy. > > > > Michael Stubbington wrote: >> Dear Dr. Chaudhuri, >> I am currently trying to write a bioperl script that parses .abi >> sequence files. I am having exactly the same problem as you did when >> you posted this enquiry to the bioperl mailing list http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html >> . I was wondering if you ever solved the problem and, if so, can >> you remember >> what you did? I'd be very grateful for any help you can provide. I >> can't find this problem mentioned anywhere else online. >> Thank you for your time. >> Mike > > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------- ************************************************************************** The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). 
It may not be disclosed to any other person without the express authority of the HPA, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses, but please re-sweep any attachments before opening or saving. HTTP://www.HPA.org.uk
**************************************************************************

From cjfields1 at gmail.com Mon Apr 20 12:12:10 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Mon, 20 Apr 2009 11:12:10 -0500
Subject: [Bioperl-l] BioPerl 1.6.1 slate
Message-ID: <58CCB0F1-9BC8-4437-8870-3D6CAA7BB1ED@gmail.com>

All,

Just to note, the bioperl 1.6.1 release will probably be delayed until mid-May (just been too busy to work on it, end-of-semester crunch and all). I'll probably release an alpha prior to that (maybe first week of May) for testing some bug fixes across platforms.

cheers!

chris

From nagel at moldiag.de Tue Apr 21 10:31:29 2009
From: nagel at moldiag.de (Mato Nagel)
Date: Tue, 21 Apr 2009 16:31:29 +0200
Subject: [Bioperl-l] Exact codon numbering
Message-ID: <49EDD8C1.7000101@moldiag.de>

Dear colleagues,

I spent this evening browsing all your information but didn't succeed in finding a module that translates feature data (CDS and mRNA) into codon numbering. I developed a routine that creates the following structure from an NCBI xml-file:

$exonstructure =[
    splice_variant_1->[
        exon_1->{
            mRNA_from ->'1',
            mRNA_to->'something',
            cDNA_from->'something',
            cDNA_to->'something',
            CDS_from->'something',
            CDS_to->'something',
        }
        exon_2->{...}
        ...
    ]
    splice_variant_2 [...
    ]
]

I wonder if it is worth publishing this routine in BioPerl.

Looking forward to receiving an answer.

Sincerely Yours
Mato Nagel

From dan.bolser at gmail.com Wed Apr 22 06:49:42 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 22 Apr 2009 11:49:42 +0100
Subject: [Bioperl-l] Creating a fastq format file?
Message-ID: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Creating a fastq format file from fasta and 'fasta quality file'? Hi, I'm sure this is easy, but I'm still not able to 'think bioperl'... I have a 'fasta quality file' and a fasta file, and I would like to output a fastq file. I followed the discussion on the previous thread here: http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html With the conclusion seeming to be 'just do it'. Could someone point me at a way to do this, or was that suggestion an error? i.e. the poster was working out a way to create a fastq the only way possible... I get the feeling that this should be a one-liner, but perhaps the above thread was demonstrating the code I need to copy. Thanks for any suggestions, Dan. From drummike at gmail.com Wed Apr 22 08:28:08 2009 From: drummike at gmail.com (Mike Williams) Date: Wed, 22 Apr 2009 08:28:08 -0400 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Message-ID: On Wed, Apr 22, 2009 at 6:49 AM, Dan Bolser wrote: > Creating a fastq format file from fasta and 'fasta quality file'? > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? Hi there. You should take a look at the documentation for formatdb, that will get you there. http://www.ncbi.nlm.nih.gov/BLAST/docs/formatdb.html Mike From dan.bolser at gmail.com Wed Apr 22 09:10:14 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 22 Apr 2009 14:10:14 +0100 Subject: [Bioperl-l] Creating a fastq format file? 
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
Message-ID: <2c8757af0904220610m7ef63a63m8590956d32d57d17@mail.gmail.com>

2009/4/22 Mike Williams :
> On Wed, Apr 22, 2009 at 6:49 AM, Dan Bolser wrote:
>
>> Creating a fastq format file from fasta and 'fasta quality file'?
>>
>> I have a 'fasta quality file' and a fasta file, and I would like to
>> output a fastq file. I followed the discussion on the previous thread
>> here:
>>
>> With the conclusion seeming to be 'just do it'. Could someone point me
>> at a way to do this, or was that suggestion an error?
>
>
> Hi there. You should take a look at the documentation for formatdb, that
> will get you there.
>
> http://www.ncbi.nlm.nih.gov/BLAST/docs/formatdb.html

Really? I don't find the word fastq anywhere in that file...

I know the fastq format isn't that complex, but why write my own custom conversion utility if one already exists right? Bioperl is so good at converting between other formats, I just assumed there should be a couple of lines to get this done.

Cheers,
Dan.

--
Talk live to HOT bioperl developers in your area NOW!!
irc://irc.freenode.net/#bioperl

> Mike
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From dan.bolser at gmail.com Wed Apr 22 09:32:15 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 22 Apr 2009 14:32:15 +0100
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
Message-ID: <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>

In the Bio::SeqIO::fastq page:

http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq

I read:

"This object can transform Bio::Seq and Bio::Seq::Quality objects to and from fastq flat file databases."
I'm not sure how to code the link between the fastq IO object and the qual object that I have created using the code from the previous thread... Any suggestions? What am I missing? 2009/4/22 Dan Bolser : > Creating a fastq format file from fasta and 'fasta quality file'? > > > Hi, > > I'm sure this is easy, but I'm still not able to 'think bioperl'... > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html > > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? i.e. the poster > was working out a way to create a fastq the only way possible... > > I get the feeling that this should be a one-liner, but perhaps the > above thread was demonstrating the code I need to copy. > > > Thanks for any suggestions, > > Dan. > From dan.bolser at gmail.com Wed Apr 22 09:36:03 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 22 Apr 2009 14:36:03 +0100 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <892884AD17FA42DA96BA586AEAE2170E@NewLife> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <892884AD17FA42DA96BA586AEAE2170E@NewLife> Message-ID: <2c8757af0904220636q6ad96152p63405e03bbe85e6f@mail.gmail.com> Cheers Mark - I was having difficulty understanding that module... I should read more and post less ;-) I got it figured out now... 
Here is my working code, based on the example kindly posted by Phillip San Miguel:

#!/usr/bin/perl -w

use warnings;
use strict;

use Bio::SeqIO;
use Bio::Seq::Quality;

## With a single argument, assume the quality file is "<fasta file>.qual"
my ($seq_infile, $qual_infile) = (scalar @ARGV == 1)
    ? ($ARGV[0], "$ARGV[0].qual")
    : @ARGV;

## Create input objects for both a seq (fasta) and qual file
my $in_seq_obj = Bio::SeqIO->new(
    -file   => $seq_infile,
    -format => 'fasta',
);

my $in_qual_obj = Bio::SeqIO->new(
    -file   => $qual_infile,
    -format => 'qual',
);

## No -file is given, so the fastq records are written to STDOUT
my $out_fastq_obj = Bio::SeqIO->new(
    -format => 'fastq'
);

while (1){
    ## create objects for both a seq and its associated qual
    my $seq_obj  = $in_seq_obj->next_seq || last;
    my $qual_obj = $in_qual_obj->next_seq;

    ## use seq and qual object methods to feed info into a new BSQ object
    my $bsq_obj = Bio::Seq::Quality->new(
        -id   => $seq_obj->display_id,   # carry the read name over
        -seq  => $seq_obj->seq(),
        -qual => $qual_obj->qual(),
    );

    $out_fastq_obj->write_fastq($bsq_obj);
}

2009/4/22 Mark A. Jensen :
> Dan- There is a fastq module under Bio::SeqIO. Do something like
>
>         use Bio::Seq::Quality;
>         use Bio::SeqIO;
>
>         # from Bio::Seq::Quality synopsis...
>         my $qual = '0 1 2 3 4 5 6 7 8 9 11 12';
>         my $trace = '0 5 10 15 20 25 30 35 40 45 50 55';
>
>         my $seq = Bio::Seq::Quality->new
>             ( -qual => $qual,
>               -trace_indices => $trace,
>               -seq => 'atcgatcgatcg',
>               -id  => 'human_id',
>               -accession_number => 'S000012',
>               -verbose => -1   # to silence deprecated methods
>         );
>
>         # typical Bio::SeqIO call
>         my $seqio = Bio::SeqIO->new( -file => ">your_file", -format=>'fastq');
>         $seqio->write_seq($seq);
>
> Mark
> ----- Original Message ----- From: "Dan Bolser"
> To:
> Sent: Wednesday, April 22, 2009 6:49 AM
> Subject: [Bioperl-l] Creating a fastq format file?
>
>
>> Creating a fastq format file from fasta and 'fasta quality file'?
>>
>>
>> Hi,
>>
>> I'm sure this is easy, but I'm still not able to 'think bioperl'...
>>
>> I have a 'fasta quality file' and a fasta file, and I would like to
>> output a fastq file. I followed the discussion on the previous thread
>> here:
>>
>> http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html
>>
>>
>> With the conclusion seeming to be 'just do it'. Could someone point me
>> at a way to do this, or was that suggestion an error? i.e. the poster
>> was working out a way to create a fastq the only way possible...
>>
>> I get the feeling that this should be a one-liner, but perhaps the
>> above thread was demonstrating the code I need to copy.
>>
>>
>> Thanks for any suggestions,
>>
>> Dan.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>

From maj at fortinbras.us Wed Apr 22 09:33:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 22 Apr 2009 09:33:08 -0400
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
Message-ID: <892884AD17FA42DA96BA586AEAE2170E@NewLife>

Dan- There is a fastq module under Bio::SeqIO. Do something like

        use Bio::Seq::Quality;
        use Bio::SeqIO;

        # from Bio::Seq::Quality synopsis...
        my $qual = '0 1 2 3 4 5 6 7 8 9 11 12';
        my $trace = '0 5 10 15 20 25 30 35 40 45 50 55';

        my $seq = Bio::Seq::Quality->new
            ( -qual => $qual,
              -trace_indices => $trace,
              -seq => 'atcgatcgatcg',
              -id  => 'human_id',
              -accession_number => 'S000012',
              -verbose => -1   # to silence deprecated methods
        );

        # typical Bio::SeqIO call
        my $seqio = Bio::SeqIO->new( -file => ">your_file", -format=>'fastq');
        $seqio->write_seq($seq);

Mark
----- Original Message ----- From: "Dan Bolser" To: Sent: Wednesday, April 22, 2009 6:49 AM Subject: [Bioperl-l] Creating a fastq format file?

> Creating a fastq format file from fasta and 'fasta quality file'?
> > > Hi, > > I'm sure this is easy, but I'm still not able to 'think bioperl'... > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html > > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? i.e. the poster > was working out a way to create a fastq the only way possible... > > I get the feeling that this should be a one-liner, but perhaps the > above thread was demonstrating the code I need to copy. > > > Thanks for any suggestions, > > Dan. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From mmuratet at hudsonalpha.org Wed Apr 22 10:03:57 2009 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Wed, 22 Apr 2009 09:03:57 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> Message-ID: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > In the Bio::SeqIO::fastq page: > > http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq > > > I read: > > "This object can transform Bio::Seq and Bio::Seq::Quality objects to > and from fastq flat file databases." > > I'm not sure how to code the link between the fastq IO object and the > qual object that I have created using the code from the previous > thread... > > Any suggestions? What am I missing? 
Howdy This might be a good place to ask the question: having looked at the fastq.pm page, is the fastq format defined (only) by a "@'" followed by a sequence line and a "+" header followed by a quality line and the two headers have to agree? Now that Illumina is using phred scaling, are 'Sanger' and 'Illumina' versions the same? Thanks Mike > > > > 2009/4/22 Dan Bolser : >> Creating a fastq format file from fasta and 'fasta quality file'? >> >> >> Hi, >> >> I'm sure this is easy, but I'm still not able to 'think bioperl'... >> >> I have a 'fasta quality file' and a fasta file, and I would like to >> output a fastq file. I followed the discussion on the previous thread >> here: >> >> http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html >> >> >> With the conclusion seeming to be 'just do it'. Could someone point >> me >> at a way to do this, or was that suggestion an error? i.e. the poster >> was working out a way to create a fastq the only way possible... >> >> I get the feeling that this should be a one-liner, but perhaps the >> above thread was demonstrating the code I need to copy. >> >> >> Thanks for any suggestions, >> >> Dan. >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Apr 22 09:38:53 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 22 Apr 2009 09:38:53 -0400 Subject: [Bioperl-l] Can I load ontologies into BioSQL? In-Reply-To: References: Message-ID: Hi Carlos, I am moving your inquiry to the BioPerl list, as the tool is a part of Bioperl-db and uses BioPerl for parsing the ontologies. In your case, the goflat parser in BioPerl seems to balk at the second one of the input files. It may be that the input file is (was?) corrupted, that does happen every once in a while. More likely though is that the goflat parser hasn't kept up with some format changes. 
Have you tried using the obo format version instead? -hilmar On Apr 20, 2009, at 11:44 AM, Carlos A. Canchaya wrote: > Hi guys > > I'm working with biosql and I try to figure out how to load > ontologies into biosql. > > I've tried > > load_ontology.pl --driver mysql --dbuser carlos --dbpass xxx -- > host localhost --dbname biosql --namespace "Gene Ontology" --format > goflat --fmtargs "-defs_file,GO.defs" function.ontology > process.ontology component.ontology > > as in the script info but I have an error, > > > ------------------- WARNING --------------------- > MSG: DBLink exists in the dblink of _default > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: format error (file process.ontology) offending line: > -negative regulation of angiogenesis ; GO:0016525 ; synonym:down > regulation of angiogenesis ; synonym:down\-regulation of > angiogenesis ; synonym:downregulation of angiogenesis ; > synonym:inhibition of angiogenesis % negative regulation of > developmental process ; GO:0051093 % regulation of angiogenesis ; GO: > 0045765 > > STACK Bio::OntologyIO::dagflat::_parse_flat_file /usr/local/share/ > perl/5.10.0/Bio/OntologyIO/dagflat.pm:627 > STACK Bio::OntologyIO::dagflat::parse /usr/local/share/perl/5.10.0/ > Bio/OntologyIO/dagflat.pm:284 > STACK Bio::OntologyIO::dagflat::next_ontology /usr/local/share/perl/ > 5.10.0/Bio/OntologyIO/dagflat.pm:317 > STACK toplevel /usr/local/share/biosql/bioperl-db/scripts/biosql/ > load_ontology.pl:604 > ------------------------------------- > > Any suggestion? 
> > Cheers, > > Carlos > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Wed Apr 22 10:50:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 22 Apr 2009 09:50:47 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: On Apr 22, 2009, at 9:03 AM, Michael Muratet wrote: > > On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > >> In the Bio::SeqIO::fastq page: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq >> >> >> I read: >> >> "This object can transform Bio::Seq and Bio::Seq::Quality objects to >> and from fastq flat file databases." >> >> I'm not sure how to code the link between the fastq IO object and the >> qual object that I have created using the code from the previous >> thread... >> >> Any suggestions? What am I missing? > > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed > by a sequence line and a "+" header followed by a quality line and > the two headers have to agree? Now that Illumina is using phred > scaling, are 'Sanger' and 'Illumina' versions the same? > > Thanks > > Mike I think that's how it is defined, but I remember a while ago finding a formal definition of the format was a bit difficult. 
Looks like that has been rectified: http://maq.sourceforge.net/fastq.shtml If the parser doesn't read Illumina FASTQ format feel free to post a bug report with some example data. I'm sure this will be needed functionality in the future (and it shouldn't be too hard to add in). chris From hans-rudolf.hotz at fmi.ch Wed Apr 22 10:58:21 2009 From: hans-rudolf.hotz at fmi.ch (Hotz, Hans-Rudolf) Date: Wed, 22 Apr 2009 16:58:21 +0200 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed > by a sequence line and a "+" header followed by a quality line and the > two headers have to agree? Now that Illumina is using phred scaling, > are 'Sanger' and 'Illumina' versions the same? No, see: http://maq.sourceforge.net/fastq.shtml Regards, Hans > > Thanks > > Mike From j_martin at lbl.gov Wed Apr 22 11:58:15 2009 From: j_martin at lbl.gov (Joel Martin) Date: Wed, 22 Apr 2009 08:58:15 -0700 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: <20090422155815.GA14402@eniac.jgi-psf.org> On Wed, Apr 22, 2009 at 09:03:57AM -0500, Michael Muratet wrote: > > On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > >> In the Bio::SeqIO::fastq page: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq >> >> >> I read: >> >> "This object can transform Bio::Seq and Bio::Seq::Quality objects to >> and from fastq flat file databases." 
>> >> I'm not sure how to code the link between the fastq IO object and the >> qual object that I have created using the code from the previous >> thread... >> >> Any suggestions? What am I missing? > > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed by a > sequence line and a "+" header followed by a quality line and the two > headers have to agree? Now that Illumina is using phred scaling, are > 'Sanger' and 'Illumina' versions the same? > > Thanks > > Mike No they aren't the same, Illumina still encodes the ascii as value + 64 and Sanger as value + 33. Joel From j_martin at lbl.gov Thu Apr 23 05:32:08 2009 From: j_martin at lbl.gov (Joel Martin) Date: Thu, 23 Apr 2009 02:32:08 -0700 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> References: <49E4A39D.2020909@gmail.com> <20090415065037.GB1175@eniac.jgi-psf.org> <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> Message-ID: <20090423093208.GB22615@eniac.jgi-psf.org> Hello, Maybe they put the headers back in the separate distribution, they seem to be there now. ls -l io_lib-1.11.6/io_lib/abi.h 4 -rw-r--r-- 1 me mypeeps 793 Dec 10 06:54 io_lib-1.11.6/io_lib/abi.h And I can get the ABI-tests to pass with the bioperl-ext on linux, though it takes some odd contortions of the Makefile to get it to compile here. [snip] # Expected: (Can't write valid ctf files until we have a trace object) t/staden_read....ok All tests successful. Files=1, Tests=94, 1 wallclock secs ( 0.95 cusr + 0.06 csys = 1.01 CPU) I might find time to take a shot at getting it to compile cleanerly for linux and solaris, unless you think that's pointless as the BioLib conversion might happen before summer? Joel On Wed, Apr 15, 2009 at 07:26:15AM -0500, Chris Fields wrote: > Joel, > > They haven't stopped supporting it. 
IIRC the separate io_lib distribution > no longer has the ABI headers, but the io_lib with the full staden package > does (a little confusing, yes). I have 1.11.6 and ABI-related tests for > bioperl and bioperl-ext don't pass, but compiling with an earlier version > does work. It may be as simple as including the header files from an old > version, but I haven't tried that. > > chris > > On Apr 15, 2009, at 1:50 AM, Joel Martin wrote: > >> Hello, >> Do you know where it says io_lib will stop supporting ABI? We use >> the latest ( 1.11.6 ) for this so I know it does read them and I just >> checked with one fresh off a sequencer. But I couldn't find an active >> forum for staden. >> >> Thanks, >> Joel >> >> On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: >>> For ABI files you'll need an older version of io_lib that supports ABI or >>> the io_lib that comes with the full staden package. Recent versions of >>> io_lib don't have ABI support built-in anymore. >>> >>> chris >>> >>> On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: >>> >>>> Hi Mike. >>>> >>>> I did get that problem solved in the end, thanks to lots of help from >>>> Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped >>>> cc-ing the mailing list at some point. The last archived message in the >>>> thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had >>>> the correct solution - the code change was incorporated into the >>>> bioperl-ext CVS, and is in the latest version that you can get from SVN >>>> (see http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the >>>> problem you must be experiencing a different issue. 
>>>>
>>>> You should also bear in mind the message Chris Fields sent to the list a
>>>> few days ago, and have a look at using BioLib instead:
>>>>
>>>>> Just to note, we're not actively supporting much of the bioperl-ext
>>>>> code, in favor of the BioLib initiative:
>>>>> http://biolib.open-bio.org/wiki/Main_Page
>>>>> If you do use bioperl-ext I suggest only using the latest code from
>>>>> svn
>>>>> (and that in combination with bioperl-live).
>>>>>
>>>>> chris
>>>>
>>>> Hope this helps.
>>>> Roy.
>>>>
>>>>
>>>>
>>>> Michael Stubbington wrote:
>>>>> Dear Dr. Chaudhuri,
>>>>> I am currently trying to write a bioperl script that parses .abi
>>>>> sequence
>>>>> files. I am having exactly the same problem as you did when
>>>>> you posted this enquiry to the bioperl mailing list
>>>>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
>>>>> wondering if you ever solved the problem and, if so, can you remember
>>>>> what you did? I'd be very grateful for any help you can provide. I
>>>>> can't find this problem mentioned anywhere else online.
>>>>> Thank you for your time.
>>>>> Mike
>>>>
>>>> --
>>>> Dr. Roy Chaudhuri
>>>> Department of Veterinary Medicine
>>>> University of Cambridge, U.K.
>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Thu Apr 23 11:45:34 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 23 Apr 2009 08:45:34 -0700 Subject: [Bioperl-l] Request concerning BioPerl In-Reply-To: <49F0300C.2060700@moldiag.de> References: <49F0300C.2060700@moldiag.de> Message-ID: Mato- Please ask on the mailing list - there is documention in the perldoc for starters and the rest depends on how you are querying for accessions or using Entrez queries. -jason On Apr 23, 2009, at 2:08 AM, Mato Nagel wrote: > Dear colleagues, > where are the options documented? > > $gb = Bio::DB::GenBank->new(@options) > > Sincerely Yours > Mato Nagel Jason Stajich jason at bioperl.org From dan.bolser at gmail.com Fri Apr 24 11:24:17 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 24 Apr 2009 16:24:17 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? Message-ID: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> Hi all, I couldn't find out how to get the 'clear range' from a Bio::Seq::Quality object... Am I looking in the wrong place, or should this method be a part of the Bio::Seq::Quality class? In the latter case I'm on my way to an implementation, but I am not good at navigating the bioperl docs, so I thought I should ask before I take the time to finish that off. Cheers, Dan. 
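A minimal sketch of one possible reading of 'clear range', assuming it means the longest contiguous stretch of bases whose quality values are all at or above a cutoff. The helper name longest_clear_range, the default cutoff and the example values are hypothetical; this is not an existing Bio::Seq::Quality method, and it works on a plain array of quality values so that it runs without BioPerl installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return the 1-based start and
# end of the longest run of quality values >= $min, or an empty list if no
# value reaches the cutoff. Ties are resolved in favour of the earliest run.
sub longest_clear_range {
    my ($qual, $min) = @_;
    $min = 13 unless defined $min;    # assumed default cutoff

    my ($best_start, $best_len) = (undef, 0);
    my $run_start;                    # 0-based start of the current run

    for my $i (0 .. $#$qual) {
        if ($qual->[$i] >= $min) {
            # Open a new run if we are not already inside one
            $run_start = $i unless defined $run_start;
            my $len = $i - $run_start + 1;
            ($best_start, $best_len) = ($run_start, $len) if $len > $best_len;
        }
        else {
            # A low-quality base closes the current run
            undef $run_start;
        }
    }

    return unless $best_len;
    return ($best_start + 1, $best_start + $best_len);
}

my @qual = (5, 20, 30, 8, 25, 25, 40, 2);
my ($start, $end) = longest_clear_range(\@qual, 20);
print "clear range: $start..$end\n";
```

The quality values of a Bio::Seq::Quality object, as returned by its qual method, could probably be passed in the same way; treat that as an untested assumption.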
From dan.bolser at gmail.com Fri Apr 24 12:20:23 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Fri, 24 Apr 2009 17:20:23 +0100
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
Message-ID: <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>

It's a bit rough and ready, but it does what I need...


=head2 get_clear_range

 Title    : get_clear_range
 Usage    : $subobj = $obj->get_clear_range();
            $subobj = $obj->get_clear_range(20);
 Function : Get the clear range using the given quality score as a
            cutoff or a default value of 13.

 Returns  : a new Bio::Seq::Quality object
 Args     : a minimum quality value, optional, default = 13

=cut

sub get_clear_range
{
    my $self = shift;
    my $qual = $self->qual;
    my $minQual = shift || 13;

    my (@ranges, $rangeFlag);

    for(my $i=0; $i<@$qual; $i++){
        ## Are we currently within a clear range or not?
        if(defined($rangeFlag)){
            ## Did we just leave the clear range?
            if($qual->[$i]<$minQual){
                ## Log the range
                push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
                ## and reset the range flag.
                $rangeFlag = undef;
            }
            ## else nothing changes
        }
        else{
            ## Did we just enter a clear range?
            if($qual->[$i]>=$minQual){
                ## Better set the range flag!
                $rangeFlag = $i;
            }
            ## else nothing changes
        }
    }
    ## Did we exit the last clear range?
    if(defined($rangeFlag)){
        my $i = scalar(@$qual);
        ## Log the range
        push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
    }

    unless(@ranges){
        die "There is no clear range... I don't know what to do here!\n";
    }

    print "there are ", scalar(@ranges), " clear ranges\n";

    my $sum; map {$sum += $_->[2]} @ranges;

    print "of ", scalar(@$qual), " bases, there are $sum with ".
        "quality scores above the given threshold\n";

    ## Ranges are sorted longest first, so only the longest one is returned
    for (sort {$b->[2] <=> $a->[2]} @ranges){
        if($_->[2]/$sum < 0.5){
            warn "not so much a clear range as a clear chunk...\n";
        }
        print $_->[2], "\t", $_->[2]/$sum, "\n";

        return Bio::Seq::QualityDB->new(
            -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
            -qual => $self->subqual($_->[0]+1, $_->[1]+1)
        );
    }
}


Note, for testing I made a package called Bio/Seq/QualityDB.pm (which is a copy of Bio/Seq/Quality.pm that just has the above method added). That is why the 'new Bio::Seq::Quality object' is actually a Bio::Seq::QualityDB object, but other than that it should slot right in (apart from all the debugging output that I spit out).

Cheers,
Dan.

2009/4/24 Dan Bolser :
> Hi all,
>
> I couldn't find out how to get the 'clear range' from a
> Bio::Seq::Quality object... Am I looking in the wrong place, or should
> this method be a part of the Bio::Seq::Quality class?
>
> In the latter case I'm on my way to an implementation, but I am not
> good at navigating the bioperl docs, so I thought I should ask before
> I take the time to finish that off.
>
>
> Cheers,
> Dan.
>

From cjfields at illinois.edu Fri Apr 24 14:56:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 24 Apr 2009 13:56:34 -0500
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
Message-ID: <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>

You could submit this as a diff against Bio::Seq::Quality to bugzilla. If possible, tests don't hurt either!

chris

On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:

> It's a bit rough and ready, but it does what I need...
>
>
>
>
> =head2 get_clear_range
>
>  Title    : get_clear_range
>  Usage    : $subobj = $obj->get_clear_range();
>             $subobj = $obj->get_clear_range(20);
>  Function : Get the clear range using the given quality score as a
>             cutoff or a default value of 13.
>
>  Returns  : a new Bio::Seq::Quality object
>  Args     : a minimum quality value, optional, default = 13
>
> =cut
>
> sub get_clear_range
> {
>     my $self = shift;
>     my $qual = $self->qual;
>     my $minQual = shift || 13;
>
>     my (@ranges, $rangeFlag);
>
>     for(my $i=0; $i<@$qual; $i++){
>         ## Are we currently within a clear range or not?
>         if(defined($rangeFlag)){
>             ## Did we just leave the clear range?
>             if($qual->[$i]<$minQual){
>                 ## Log the range
>                 push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>                 ## and reset the range flag.
>                 $rangeFlag = undef;
>             }
>             ## else nothing changes
>         }
>         else{
>             ## Did we just enter a clear range?
>             if($qual->[$i]>=$minQual){
>                 ## Better set the range flag!
>                 $rangeFlag = $i;
>             }
>             ## else nothing changes
>         }
>     }
>     ## Did we exit the last clear range?
>     if(defined($rangeFlag)){
>         my $i = scalar(@$qual);
>         ## Log the range
>         push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>     }
>
>     unless(@ranges){
>         die "There is no clear range... I don't know what to do here!\n";
>     }
>
>     print "there are ", scalar(@ranges), " clear ranges\n";
>
>     my $sum; map {$sum += $_->[2]} @ranges;
>
>     print "of ", scalar(@$qual), " bases, there are $sum with ".
>         "quality scores above the given threshold\n";
>
>     for (sort {$b->[2] <=> $a->[2]} @ranges){
>         if($_->[2]/$sum < 0.5){
>             warn "not so much a clear range as a clear chunk...\n";
>         }
>         print $_->[2], "\t", $_->[2]/$sum, "\n";
>
>         return Bio::Seq::QualityDB->new(
>             -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
>             -qual => $self->subqual($_->[0]+1, $_->[1]+1)
>         );
>     }
> }
>
>
>
>
> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
> is a copy of Bio/Seq/Quality.pm that just has the above method added).
> That is why the 'new Bio::Seq::Quality object' is actually a > Bio::Seq::QualityDB object, but other than that it should slot right > in (apart from all the debugging output that I spit out). > > > Cheers, > Dan. > > > 2009/4/24 Dan Bolser : >> Hi all, >> >> I couldn't find out how to get the 'clear range' from a >> Bio::Seq::Quality object... Am I looking in the wrong place, or >> should >> this method be a part of the Bio::Seq::Quality class? >> >> In the latter case I'm on my way to an implementation, but I am not >> good at navigating the bioperl docs, so I thought I should ask before >> I take the time to finish that off. >> >> >> Cheers, >> Dan. >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Fri Apr 24 15:39:53 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 24 Apr 2009 12:39:53 -0700 Subject: [Bioperl-l] cvs server still up? Message-ID: <49F21589.6060707@cornell.edu> The old bioperl CVS repository is still up: cvs -d :pserver:cvs:cvs\@cvs.bioperl.org:/home/repository/bioperl export -rHEAD bioperl-live I had an old script that was cvs exporting a copy of bioperl, and it has been fetching really old copies for a while now. Maybe somebody might want to deactivate that? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Fri Apr 24 16:29:22 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 24 Apr 2009 15:29:22 -0500 Subject: [Bioperl-l] cvs server still up? In-Reply-To: <49F21589.6060707@cornell.edu> References: <49F21589.6060707@cornell.edu> Message-ID: <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu> Not sure what the plans were for the CVS server beyond having it available for all older bioperl releases (pre-1.6). 
Everything has been moved into the svn server though, so really the cvs server is redundant. Shutting it down might serve the purpose of alerting users to the fact that we no longer use it! Thinking some more about it: it might be present simply b/c other open- bio projects are still using cvs. I can't recall if biopython switched over or not... chris On Apr 24, 2009, at 2:39 PM, Robert Buels wrote: > The old bioperl CVS repository is still up: > cvs -d :pserver:cvs:cvs\@cvs.bioperl.org:/home/repository/bioperl > export -rHEAD bioperl-live > > I had an old script that was cvs exporting a copy of bioperl, and it > has been fetching really old copies for a while now. > > Maybe somebody might want to deactivate that? > > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Fri Apr 24 17:03:27 2009 From: jay at jays.net (Jay Hannah) Date: Fri, 24 Apr 2009 16:03:27 -0500 Subject: [Bioperl-l] cvs server still up? In-Reply-To: <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu> References: <49F21589.6060707@cornell.edu> <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu> Message-ID: <49F2291F.7020704@jays.net> Chris Fields wrote: > I can't recall if biopython switched over or not... http://github.com/biopython "Official git mirror of the Biopython CVS repository" Ponder, j From cjfields at illinois.edu Fri Apr 24 18:50:12 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 24 Apr 2009 17:50:12 -0500 Subject: [Bioperl-l] cvs server still up? 
In-Reply-To: <49F2291F.7020704@jays.net> References: <49F21589.6060707@cornell.edu> <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu> <49F2291F.7020704@jays.net> Message-ID: <9AC3AF4D-E9FF-4593-A53A-B59438EC2BA4@illinois.edu> Which makes me wonder, is the CVS version actually updated with git commits (and vice versa) or is git the only thing being used? It is listed as a 'mirror', so I'm assuming they somehow sync to/from CVS (ugh). chris On Apr 24, 2009, at 4:03 PM, Jay Hannah wrote: > Chris Fields wrote: >> I can't recall if biopython switched over or not... > > http://github.com/biopython > "Official git mirror of the Biopython CVS repository" > > Ponder, > > j > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From torsten.seemann at infotech.monash.edu.au Sun Apr 26 01:50:14 2009 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sun, 26 Apr 2009 15:50:14 +1000 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <20090422155815.GA14402@eniac.jgi-psf.org> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org> Message-ID: > > This might be a good place to ask the question: having looked at the > > fastq.pm page, is the fastq format defined (only) by a "@'" followed by > a > > sequence line and a "+" header followed by a quality line and the two > > headers have to agree? Now that Illumina is using phred scaling, are > > 'Sanger' and 'Illumina' versions the same? > > No they aren't the same, Illumina still encodes the ascii as value + 64 > and Sanger as value + 33. > Illumina have now CHANGED how they calculate the quality value however in the last month or so... 
Their Q range used to be -5..40 mapped to ASCII 64+, but now they produce
Q >= 0 and it is unclear if they start at 69 or 64 now...

I have tried to summarise this in a central place:

http://en.wikipedia.org/wiki/FASTQ_format

Corrections welcome!

--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA

From heikki.lehvaslaiho at gmail.com  Mon Apr 27 01:42:03 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 07:42:03 +0200
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: 
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
	<2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
	<1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
	<20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID: 

> I have tried to summarise this in a central place:
> http://en.wikipedia.org/wiki/FASTQ_format

Torsten,

Thanks for putting this together. Very helpful.

Do you have a plan of action? Let me propose one for BioPerl. It is
based on the following assumptions:

1. There is a multitude of different ways of coding quality values out there.
2. Bio::Seq::Quality is agnostic of any quality value range rules.
3. The emerging open standard is the Sanger fastq specification.
4. Open source programs use the Sanger fastq specs.

From these it follows that:

1. BioPerl should support the Sanger fastq standard.

1.1. It already does, and there are other SeqIO modules for dealing
with other non-fastq formats.

2. BioPerl should offer simple ways of converting between quality range rules.

2.1. Have a generic method accessible from Bio::Seq::Quality with
preset versions of the method for converting between known variants
(Sanger fastq and the two Illumina versions).

For example:

range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value)
  throw if $value < $from_lower or $value > $from_upper
  return $newvalue

range_convert_illumina2fastq(), range_convert_fastq2illumina(),
range_convert_fastq2phred(), range_convert_phred2fastq()....

(assuming that illumina 1.3 eq phred)

2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina
qualities into Sanger fastq on the fly.

2.2.1. Bio::SeqIO::Fastq::next_seq should either detect the quality
value range of the incoming stream automatically or be given a keyword
parameter indicating the range.

2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it detects
a quality value out of range.

2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it
detects a quality value out of range.

2.2.4. It would be useful, but not absolutely necessary, for
Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina
ranges.

What do you think?

    -Heikki

2009/4/26 Torsten Seemann :
>> > This might be a good place to ask the question: having looked at the
>> > fastq.pm page, is the fastq format defined (only) by a "@'" followed by
>> a
>> > sequence line and a "+" header followed by a quality line and the two
>> > headers have to agree? Now that Illumina is using phred scaling, are
>> > 'Sanger' and 'Illumina' versions the same?
>>
>> No they aren't the same, Illumina still encodes the ascii as value + 64
>> and Sanger as value + 33.
>>
>
> Illumina have now CHANGED how they calculate the quality value however in
> the last month or so... Their Q range used to be -5..40 mapped to ASCII 64+,
> but now they produce Q >= 0 and it is unclear if they start at 69 or 64
> now...
>
> I have tried to summarise this in a central place:
>
> http://en.wikipedia.org/wiki/FASTQ_format
>
> Corrections welcome!
>
>
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
> University, AUSTRALIA
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
    -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From heikki.lehvaslaiho at gmail.com  Mon Apr 27 02:42:08 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 08:42:08 +0200
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
	<2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
	<90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
Message-ID: 

Dan,

It looks like your method does two different things:

1. Returns the longest subsequence above the threshold
2. Analyses the sequence for the number of ranges the current
threshold creates.

Why not separate these functions?

Let's add a method that sets the threshold and stores it internally as
$self->_threshold. Setting it to a new value should trigger emptying
all the caches (see below).

Let's have two more public methods:

1. get_clean_range() - optional argument 'threshold'

It returns the longest clean subseq.

2. count_clean_ranges() - again optional argument 'threshold'

This returns the number of ranges detected.

Both methods first call the public method threshold() if the argument
has been given, and then an internal method _find_clean_ranges(). That
method calculates all the ranges and stores them internally (as
$self->_clean_ranges->[...]). The number of ranges is also stored
(e.g. $self->_number_of_ranges). These internal values form the cache
that needs to be emptied whenever any of the critical values of the
object changes: threshold, quality or seq. Create an internal method
$self->_clear_cache that does that.

Now the new quality object does not get created until you call
get_clean_range(), which accesses the cached values (or creates them if
they are not there).

This design means there is no extra penalty for adding more
methods that act on cached values. For example, it might be sensible
at some point to look at all the ranges that are longer
than some length. Then you could write in your program:


$qual->threshold(10);
if ($qual->count_clean_ranges == 1) {
    my $newqual = $qual->get_clean_range();
    # do your analysis
} elsif ($qual->count_clean_ranges == 0) {
    # do some reporting and logging
} else {  # more than one range
    my @quals = $qual->get_all_clean_ranges($min_length);
    # do some more work and possibly select the best one(s)
}



Yours,

   -Heikki

2009/4/24 Chris Fields :
> You could submit this as a diff against Bio::Seq::Quality to bugzilla.  If
> possible, tests don't hurt either!
>
> chris

-- 
    -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From dan.bolser at gmail.com  Mon Apr 27 04:31:39 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 27 Apr 2009 09:31:39 +0100
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: 
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
	<2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
	<90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
Message-ID: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>

Hi Heikki,

Thanks very much for the advice on how to better implement the clear
range method within the Bio::Seq::Quality object. I can understand the
logic of what you have written, and it all sounds reasonable. The only
problem is that I am very inexperienced with object-oriented Perl
(my 'one man' projects to date have never really required me to think
beyond scripts, and it's been years since I actually tried to code
objects in Perl).

To be specific, when you say, "Lets add a method that sets the
threshold and stores it internally as $self->_threshold", ignoring any
other functionality, what would that method look like? In particular,
how would $self->_threshold be implemented?
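
A minimal sketch of what such a threshold accessor could look like, for
illustration only. It assumes the object is a blessed hash, that the
threshold is kept under a '_threshold' key, and that the default cutoff
is 13 as in the get_clear_range code posted earlier in the thread. The
package name My::QualitySketch and the -qual / -threshold constructor
arguments are hypothetical stand-ins, not the real Bio::Seq::Quality
internals:

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in class; NOT the real Bio::Seq::Quality.
    package My::QualitySketch;

    sub new {
        my ($class, %args) = @_;
        my $self = bless { _qual => $args{-qual} || [] }, $class;
        $self->threshold($args{-threshold}) if defined $args{-threshold};
        return $self;
    }

    # Combined getter/setter. Setting a different value empties the cache.
    sub threshold {
        my ($self, $value) = @_;
        if (defined $value) {
            $self->_clear_cache
                if !defined $self->{_threshold}
                || $self->{_threshold} != $value;
            $self->{_threshold} = $value;
        }
        return $self->{_threshold};
    }

    # Forget any previously computed ranges.
    sub _clear_cache {
        my $self = shift;
        delete $self->{_clean_ranges};
        return;
    }

    # Compute (or reuse) the list of [start, end, length] ranges, 0-based.
    sub _find_clean_ranges {
        my $self = shift;
        return $self->{_clean_ranges} if $self->{_clean_ranges};

        # Assumed default cutoff of 13 when no threshold has been set.
        my $min  = defined $self->{_threshold} ? $self->{_threshold} : 13;
        my $qual = $self->{_qual};
        my (@ranges, $start);
        for my $i (0 .. $#$qual) {
            if ($qual->[$i] >= $min) {
                $start = $i unless defined $start;    # entering a range
            }
            elsif (defined $start) {                  # just left a range
                push @ranges, [$start, $i - 1, $i - $start];
                undef $start;
            }
        }
        # Close a range that runs to the end of the read.
        push @ranges, [$start, $#$qual, scalar(@$qual) - $start]
            if defined $start;

        return $self->{_clean_ranges} = \@ranges;
    }

    sub count_clean_ranges {
        my $self = shift;
        return scalar @{ $self->_find_clean_ranges };
    }
}

my $sketch = My::QualitySketch->new(-qual => [5, 20, 20, 5, 20, 20, 20, 5]);
print $sketch->count_clean_ranges, "\n";    # two ranges at the default cutoff
```

The point of routing every change through threshold() is that the cached
ranges can never be out of step with the cutoff that produced them; a
real implementation would also need to clear the cache when the quality
values or the sequence change.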
I think once I see that detail, I can go ahead and try to code what you suggested. Similarly (Chris), where would I put the tests / how would they be implemented? Thanks again for the feedback. All the best, Dan. 2009/4/27 Heikki Lehvaslaiho : > Dan, > > It looks like your method does two different things: > > 1. Returns the longest subsequence above the threshold > 2. Analyses the the sequence for the number of ranges the current > threshold creates. > > Why not separate these functions? > > Lets add a method that sets the threshold and stores it internally as > $self->_threshold. Setting it to a new values should trigger emptying > all the caches (see below.) > > Lets have two more public methods: > > 1. get_clean_range() - optional argument 'threshold' > > It returns the longest clean subseq. > > 2. count_clean_ranges() -again optional argument 'threshold' > > This returns the number of ranges detected. > > Both methods call first the public method threshold if the argument > has been given and then an internal method ?_find_clean_ranges(). That > method calculates all the ranges and stores them internally ?(as > $self->_clean_ranges-> [...]). The number of ranges is also stored > (e.g. $self->_number_of ranges).These internal values form ?the cache > that needs to be emptied whenever any of the critical values of the > object changes: threshold, quality or seq. Create an internal method > $self->_clear_cache, that does that. > > Now the quality new object does not get created until you call > get_clean_range() which accesses the cached values (or creates them if > they are not there). > > This design allows you to have no extra penalty for adding more > methods that act on cached values. For example, it might be sensible > thing to do ?at some point to look at all the ranges that are longer > than some length. 
Then you could write in your program: > > > $qual->threshold(10); > if ($qual->count_clean_ranges = 1) { > ?my $newqual = $qual->get_clean_range() > ?# do your analysis > } elsif ($qual->count_clean_ranges = 0) { > ? # do some reporting and logging > } else { ?# more than one ranges > ? my @quals = $qual->get_all_clean_ranges($min_lenght); > ? # do some more work and possibly select the best one(s) > } > > > > Yours, > > ? -Heikki > > 2009/4/24 Chris Fields : >> You could submit this as a diff against Bio::Seq::Quality to bugzilla. ?If >> possible, tests don't hurt either! >> >> chris >> >> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote: >> >>> Its a bit rough and ready, but it does what I need... >>> >>> >>> >>> >>> =head2 get_clear_range >>> >>> Title ? ?: get_clear_range >>> >>> Title ? ?: subqual >>> Usage ? ?: $subobj = $obj->get_clear_range(); >>> ? ? ? ? ? $subobj = $obj->get_clear_range(20); >>> Function : Get the clear range using the given quality score as a >>> ? ? ? ? ? cutoff or a default value of 13. >>> >>> Returns ?: a new Bio::Seq::Quality object >>> Args ? ? : a minimum quality value, optional, devault = 13 >>> >>> =cut >>> >>> sub get_clear_range >>> { >>> ? my $self = shift; >>> ? my $qual = $self->qual; >>> ? my $minQual = shift || 13; >>> >>> ? my (@ranges, $rangeFlag); >>> >>> ? for(my $i=0; $i<@$qual; $i++){ >>> ? ? ? ?## Are we currently within a clear range or not? >>> ? ? ? ?if(defined($rangeFlag)){ >>> ? ? ? ? ? ?## Did we just leave the clear range? >>> ? ? ? ? ? ?if($qual->[$i]<$minQual){ >>> ? ? ? ? ? ? ? ?## Log the range >>> ? ? ? ? ? ? ? ?push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>> ? ? ? ? ? ? ? ?## and reset the range flag. >>> ? ? ? ? ? ? ? ?$rangeFlag = undef; >>> ? ? ? ? ? ?} >>> ? ? ? ? ? ?## else nothing changes >>> ? ? ? ?} >>> ? ? ? ?else{ >>> ? ? ? ? ? ?## Did we just enter a clear range? >>> ? ? ? ? ? ?if($qual->[$i]>=$minQual){ >>> ? ? ? ? ? ? ? ?## Better set the range flag! >>> ? ? ? ? ? ? ? 
?$rangeFlag = $i; >>> ? ? ? ? ? ?} >>> ? ? ? ? ? ?## else nothing changes >>> ? ? ? ?} >>> ? } >>> ? ## Did we exit the last clear range? >>> ? if(defined($rangeFlag)){ >>> ? ? ? ?my $i = scalar(@$qual); >>> ? ? ? ?## Log the range >>> ? ? ? ?push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>> ? } >>> >>> ? unless(@ranges){ >>> ? ? ? ?die "There is no clear range... I don't know what to do here!\n"; >>> ? } >>> >>> ? print "there are ", scalar(@ranges), " clear ranges\n"; >>> >>> ? my $sum; map {$sum += $_->[2]} @ranges; >>> >>> ? print "of ", scalar(@$qual), " bases, there are $sum with ". >>> ? ? ? ?"quality scores above the given threshold\n"; >>> >>> ? for (sort {$b->[2] <=> $a->[2]} @ranges){ >>> ? ? ? ?if($_->[2]/$sum < 0.5){ >>> ? ? ? ? ? ?warn "not so much a clear range as a clear chunk...\n"; >>> ? ? ? ?} >>> ? ? ? ?print $_->[2], "\t", $_->[2]/$sum, "\n"; >>> >>> ? ? ? ?return Bio::Seq::QualityDB->new( -seq => $self->subseq( ?$_->[0]+1, >>> $_->[1]+1), >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -qual => $self->subqual($_->[0]+1, >>> $_->[1]+1) >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ); >>> ? } >>> } >>> >>> >>> >>> >>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which >>> is a copy of Bio/Seq/Quality.pm that just has the above method added). >>> That is why the 'new Bio::Seq::Quality object' is actually a >>> Bio::Seq::QualityDB object, but other than that it should slot right >>> in (apart from all the debugging output that I spit out). >>> >>> >>> Cheers, >>> Dan. >>> >>> >>> 2009/4/24 Dan Bolser : >>>> >>>> Hi all, >>>> >>>> I couldn't find out how to get the 'clear range' from a >>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should >>>> this method be a part of the Bio::Seq::Quality class? >>>> >>>> In the latter case I'm on my way to an implementation, but I am not >>>> good at navigating the bioperl docs, so I thought I should ask before >>>> I take the time to finish that off. 
>>>> >>>> >>>> Cheers, >>>> Dan. >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ? ?-Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > From heikki.lehvaslaiho at gmail.com Mon Apr 27 05:38:40 2009 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 27 Apr 2009 11:38:40 +0200 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org> Message-ID: I convinced at least myself to the degree that I wrote the range_convert() method - with plenty of tests. I mention this now so that no-one else need to start thinking through all the edge values. :) I'll contribute it to the code base once there is a consensus of best way forward. -Heikki 2009/4/27 Heikki Lehvaslaiho : >> I have tried to summarise this in a central place: >> http://en.wikipedia.org/wiki/FASTQ_format > > Torsten, > > Thanks for putting this together. Very helpful. > > Do you have a plan of action? ?Let me propose one for BioPerl. It > based on following assumptions: > > 1. There is multitude of different ways of coding quality values out there. > 2. Bio::Seq::Quality is agnostic of any quality value range rules > 3. The emerging open standard is the Sanger fastq specification > 4. Open source programs use the Sanger fastq specs > > > From these it follows that: > > > 1. BioPerl should support Sanger fastq standard > > 1.1. 
it already does and there are other SeqIO modules for dealing > with other non-fastq formats. > > 2. BioPerl should offer simple ways of converting between quality range rules > > 2.1. Have a generic method accessible from Bio::Seq::Quality with > preset versions of the method for converting between known variants > (Sanger fastq and the two Illumina versions) > > For example: > > range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value) > ?throw if $value < $from_lower or $value > $from_upper > ?return $newvalue > > range_convert_illumina2fastq(), range_convert_fastq2illumina(), > range_convert_fastq2phred(), ?range_convert_phred2fastq().... > > (assuming that illumina 1.3 eq phred) > > 2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina > qualities into Sanger fastq on the fly > > 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the incoming stream of > quality value range either automatically or be given a keyword > parameter indicating the range. > > 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it detects > a quality value out of range. > > 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it > detects a quality value out of range. > > 2.2.4. It would be useful but not absolutely necessary for > Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina > ranges > > > What do you think? > > ? ?-Heikki > > 2009/4/26 Torsten Seemann : >>> > This might be a good place to ask the question: having looked at the >>> > fastq.pm page, is the fastq format defined (only) by a "@'" followed by >>> a >>> > sequence line and a "+" header followed by a quality line and the two >>> > headers have to agree? Now that Illumina is using phred scaling, are >>> > 'Sanger' and 'Illumina' versions the same? >>> >>> No they aren't the same, Illumina still encodes the ascii as value + 64 >>> and Sanger as value + 33. >>> >> >> Illumina have now CHANGED how they calculate the quality value however in >> the last month or so... 
Their Q range used to be -5..40 mapped to ASCII 64+, >> but now they produce Q >= 0 and it is unclear if they start at 69 or 64 >> now... >> >> I have tried to summarise this in a central place: >> >> http://en.wikipedia.org/wiki/FASTQ_format >> >> Corrections welcome! >> >> >> --Torsten Seemann >> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >> University, AUSTRALIA >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ? ?-Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa From heikki.lehvaslaiho at gmail.com Mon Apr 27 05:41:52 2009 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 27 Apr 2009 11:41:52 +0200 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID: Dan, I'll take your code and put it into bioperl-live rewritten the way I suggested and add few tests. That should get you started, -Heikki 2009/4/27 Dan Bolser : > Hi Heikki, > > Thanks very much for the advice on how to better implement the clear > range method within the Bio::Seq::Quality object. I can understand the > logic of what you have written, and it all sounds reasonable. 
> [...]
-- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa
From cjfields at illinois.edu Mon Apr 27 09:10:04 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 27 Apr 2009 08:10:04 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org> Message-ID: This is going within Bio::Seq::Quality, correct?
Does Bio::Seq::Quality have a method that indicates what format the quality scores are actually in (sanger/illumina/illumina1.3/phred/foo)? The reason I worry about this is quality scores appear inseparable from their quality format (ranges vary in length, for instance). For instance, if I picked a Bio::Seq::Quality out of the blue, could I tell which quality format it originated from w/o guessing, and similarly could I accurately convert it to another qual format? To me it seems we need something in Bio::Seq::Quality akin to the alphabet() method used for sequence data. chris On Apr 27, 2009, at 4:38 AM, Heikki Lehvaslaiho wrote: > I convinced at least myself to the degree that I wrote the > range_convert() method - with plenty of tests. I mention this now so > that no-one else need to start thinking through all the edge values. > :) > > I'll contribute it to the code base once there is a consensus of best > way forward. > > -Heikki > > 2009/4/27 Heikki Lehvaslaiho : >>> I have tried to summarise this in a central place: >>> http://en.wikipedia.org/wiki/FASTQ_format >> >> Torsten, >> >> Thanks for putting this together. Very helpful. >> >> Do you have a plan of action? Let me propose one for BioPerl. It >> based on following assumptions: >> >> 1. There is multitude of different ways of coding quality values >> out there. >> 2. Bio::Seq::Quality is agnostic of any quality value range rules >> 3. The emerging open standard is the Sanger fastq specification >> 4. Open source programs use the Sanger fastq specs >> >> >> From these it follows that: >> >> >> 1. BioPerl should support Sanger fastq standard >> >> 1.1. it already does and there are other SeqIO modules for dealing >> with other non-fastq formats. >> >> 2. BioPerl should offer simple ways of converting between quality >> range rules >> >> 2.1. 
Have a generic method accessible from Bio::Seq::Quality with >> preset versions of the method for converting between known variants >> (Sanger fastq and the two Illumina versions) >> >> For example: >> >> range_convert ($from_lower, $from_upper, $to_lower, $to_upper, >> $value) >> throw if $value < $from_lower or $value > $from_upper >> return $newvalue >> >> range_convert_illumina2fastq(), range_convert_fastq2illumina(), >> range_convert_fastq2phred(), range_convert_phred2fastq().... >> >> (assuming that illumina 1.3 eq phred) >> >> 2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina >> qualities into Sanger fastq on the fly >> >> 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the incoming stream >> of >> quality value range either automatically or be given a keyword >> parameter indicating the range. >> >> 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it >> detects >> a quality value out of range. >> >> 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it >> detects a quality value out of range. >> >> 2.2.4. It would be useful but not absolutely necessary for >> Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina >> ranges >> >> >> What do you think? >> >> -Heikki >> >> 2009/4/26 Torsten Seemann : >>>>> This might be a good place to ask the question: having looked at >>>>> the >>>>> fastq.pm page, is the fastq format defined (only) by a "@'" >>>>> followed by >>>> a >>>>> sequence line and a "+" header followed by a quality line and >>>>> the two >>>>> headers have to agree? Now that Illumina is using phred scaling, >>>>> are >>>>> 'Sanger' and 'Illumina' versions the same? >>>> >>>> No they aren't the same, Illumina still encodes the ascii as >>>> value + 64 >>>> and Sanger as value + 33. >>>> >>> >>> Illumina have now CHANGED how they calculate the quality value >>> however in >>> the last month or so... 
Their Q range used to be -5..40 mapped to >>> ASCII 64+, >>> but now they produce Q >= 0 and it is unclear if they start at 69 >>> or 64 >>> now... >>> >>> I have tried to summarise this in a central place: >>> >>> http://en.wikipedia.org/wiki/FASTQ_format >>> >>> Corrections welcome! >>> >>> >>> --Torsten Seemann >>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>> University, AUSTRALIA >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> -- >> -Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> > > > > -- > -Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From markus.liebscher at gmx.de Mon Apr 27 09:51:09 2009 From: markus.liebscher at gmx.de (manni122) Date: Mon, 27 Apr 2009 06:51:09 -0700 (PDT) Subject: [Bioperl-l] Re moteblast using Swissprot Message-ID: <23256705.post@talk.nabble.com> Hi, I want to retrieve the sequence identifier from the remoteblast interface (Bio::Tools::Run::RemoteBlast). With this ID I want to look up annotations stored in the Bio::DB::SwissProt. I am using the example code from the RemoteBlast documentation. If I am using a known sequence as input I get "Can't call method "next_hsp" on an undefined value "? This happens only with swissprot as database - the nr database works fine. The accession code from nr is not accepted from the Bio::DB::SwissProt. Is there something wrong with the database? 
Here is the code I am using:

    my $v = 1;
    my @params = ('-prog'   => 'blastp',
                  '-data'   => 'nr',
                  '-expect' => '1e-10' ); #swissprot is not working
    $Bio::Tools::Run::RemoteBlast::HEADER{'MATRIX_NAME'} = 'BLOSUM62';
    my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
    $v = 1;
    my $r = $factory->submit_blast($proteinaa);
    print STDERR "Need BLAST Analysis, waiting..." if( $v > 0 );
    while ( my @rids = $factory->each_rid ) {
        foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if( $rc < 0 ) {
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ( $v > 0 );
                sleep 5;
            } else {
                $factory->remove_rid($rid);
                $result = $rc->next_result;
                $hit = $result->next_hit;
                $hsp = $hit->next_hsp;
                $idneu = $hit->accession;
            }
        }
    }

-- View this message in context: http://www.nabble.com/Remoteblast-using-Swissprot-tp23256705p23256705.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
From heikki.lehvaslaiho at gmail.com Mon Apr 27 11:44:40 2009 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 27 Apr 2009 17:44:40 +0200 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID:

Dan,

Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live.

Test and extend,

  -Heikki

2009/4/27 Heikki Lehvaslaiho :
> Dan,
>
> I'll take your code and put it into bioperl-live rewritten the way I
> suggested and add few tests.
>
> That should get you started,
>
>   -Heikki
>
> 2009/4/27 Dan Bolser :
>> Hi Heikki,
>>
>> Thanks very much for the advice on how to better implement the clear
>> range method within the Bio::Seq::Quality object. I can understand the
>> logic of what you have written, and it all sounds reasonable.
>> [...]
-- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa
From heikki.lehvaslaiho at gmail.com Mon Apr 27 11:53:12 2009 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 27 Apr 2009 17:53:12 +0200 Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org> Message-ID:

2009/4/27 Chris Fields :
> This is going within Bio::Seq::Quality, correct?

Yes.

> Does Bio::Seq::Quality have a method that indicates what format the
> quality scores are actually in
> (sanger/illumina/illumina1.3/phred/foo)? The reason I worry about this is
> quality scores appear inseparable from their quality format (ranges vary in
> length, for instance).

No method.

> For instance, if I picked a Bio::Seq::Quality out of the blue, could I tell
> which quality format it originated from w/o guessing, and similarly could I
> accurately convert it to another qual format? To me it seems we need
> something in Bio::Seq::Quality akin to the alphabet() method used for
> sequence data.

The text formats encode the quality values in different ways, but they
are all stored as integer arrays in the object. Converting between
them is relatively easy.

You are right: quality_format() or even plain format() is needed. The
SeqIO methods creating the objects should be setting it. Warnings for
unset format values should be added to appropriate places.

  -Heikki

> chris
>
> On Apr 27, 2009, at 4:38 AM, Heikki Lehvaslaiho wrote:
>
>> I convinced at least myself to the degree that I wrote the
>> range_convert() method - with plenty of tests. I mention this now so
>> that no-one else need to start thinking through all the edge values.
>> :)
>>
>> I'll contribute it to the code base once there is a consensus of best
>> way forward.
>>
>>   -Heikki
>>
>> 2009/4/27 Heikki Lehvaslaiho :
>>>>
>>>> I have tried to summarise this in a central place:
>>>> http://en.wikipedia.org/wiki/FASTQ_format
>>>
>>> Torsten,
>>>
>>> Thanks for putting this together. Very helpful.
>>>
>>> Do you have a plan of action? Let me propose one for BioPerl.
>>> [...] >> >> -- >> 
-Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa From cjfields at illinois.edu Mon Apr 27 12:11:12 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 27 Apr 2009 11:11:12 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org> Message-ID: On Apr 27, 2009, at 10:53 AM, Heikki Lehvaslaiho wrote: > 2009/4/27 Chris Fields : >> This is going within Bio::Seq::Quality, correct? > > Yes. > > Does Bio::Seq::Quality >> have a method that indicates what format the quality scores are >> actually in >> (sanger/illumina/illumina1.3/phred/foo)? The reason I worry about >> this is >> quality scores appear inseparable from their quality format (ranges >> vary in >> length, for instance). > > No method. > >> For instance, if I picked a Bio::Seq::Quality out of the blue, >> could I tell >> which quality format it originated from w/o guessing, and similarly >> could I >> accurately convert it to another qual format? To me it seems we need >> something in Bio::Seq::Quality akin to the alphabet() method used for >> sequence data. > > The text formats encode the quality values in different ways, but they > are all stored as integer arrays in the object. Converting between > them is relatively easy. > > You are right: quality_format() or even plain format() is needed. The > SeqIO methods creating the objects should be setting it. 
Warnings for > unset format values should be added to appropriate places. > > -Heikki

Agreed, and any conversion methods could default to using a set
quality_format()/format() for conversions to/from ASCII (might serve as a
good verification point as well).

chris

From maj at fortinbras.us Mon Apr 27 11:51:39 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 27 Apr 2009 11:51:39 -0400 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com><2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com><90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID:

Dan - congrats on your first contribution!
Mark

----- Original Message ----- From: "Dan Bolser" To: "Heikki Lehvaslaiho" Cc: "Chris Fields" ; Sent: Monday, April 27, 2009 4:31 AM Subject: Re: [Bioperl-l] Clear range from Bio::Seq::Quality?

Hi Heikki,

Thanks very much for the advice on how to better implement the clear
range method within the Bio::Seq::Quality object. I can understand the
logic of what you have written, and it all sounds reasonable. The only
problem is that I am very inexperienced with working on object-oriented
Perl (my 'one man' projects to date have never really required me to
think beyond scripts, and it's been years since I actually tried to code
objects in Perl).

To be specific, when you say, "Lets add a method that sets the
threshold and stores it internally as $self->_threshold", ignoring any
other functionality, what would that method look like? In particular,
how would $self->_threshold be implemented?

I think once I see that detail, I can go ahead and try to code what
you suggested.

Similarly (Chris), where would I put the tests / how would they be
implemented?

Thanks again for the feedback.

All the best,
Dan.
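For readers following the thread, here is a minimal sketch of the kind of accessor being asked about. It assumes the object is a blessed hash and that the value is kept under the key '_threshold'. The package name My::QualSketch and every method body below are hypothetical stand-ins written for illustration; this is not the code that went into bioperl-live. To keep the example self-contained, ranges are returned as [start, end, length] triples (0-based, inclusive) rather than as new Bio::Seq::Quality objects. The default cutoff of 13 is taken from Dan's posted method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class for illustration only; this is NOT the
# Bio::Seq::Quality code from bioperl-live.
package My::QualSketch;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        _qual      => $args{-qual} || [],   # array ref of integer scores
        _threshold => 13,                   # default cutoff, as in Dan's method
    }, $class;
    $self->threshold($args{-threshold}) if defined $args{-threshold};
    return $self;
}

# Combined getter/setter. The value lives in the object's hash under the
# key '_threshold'; setting a different value empties the cached ranges.
sub threshold {
    my ($self, $value) = @_;
    if (defined $value) {
        die "threshold must be a non-negative integer\n"
            unless $value =~ /^\d+$/;
        $self->_clear_cache if $value != $self->{_threshold};
        $self->{_threshold} = $value;
    }
    return $self->{_threshold};
}

# Forget everything derived from the threshold or the quality values.
sub _clear_cache {
    my $self = shift;
    delete $self->{_clean_ranges};
    return;
}

# Compute [start, end, length] triples once and cache them.
sub _find_clean_ranges {
    my $self = shift;
    return $self->{_clean_ranges} if $self->{_clean_ranges};
    my $qual = $self->{_qual};
    my (@ranges, $start);
    for my $i (0 .. $#$qual) {
        if ($qual->[$i] >= $self->{_threshold}) {
            $start = $i unless defined $start;       # entering a clean range
        }
        elsif (defined $start) {
            push @ranges, [$start, $i - 1, $i - $start];  # just left one
            undef $start;
        }
    }
    # A clean range may run to the very end of the read.
    push @ranges, [$start, $#$qual, @$qual - $start] if defined $start;
    return $self->{_clean_ranges} = \@ranges;
}

sub count_clean_ranges {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    return scalar @{ $self->_find_clean_ranges };
}

# Longest range as [start, end, length], or undef if there is none.
sub get_clean_range {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    my ($longest) = sort { $b->[2] <=> $a->[2] } @{ $self->_find_clean_ranges };
    return $longest;
}

package main;

my $q = My::QualSketch->new(-qual => [5, 20, 25, 30, 8, 15, 16, 3]);
$q->threshold(10);
printf "%d clean ranges\n", $q->count_clean_ranges;
my $best = $q->get_clean_range;
printf "longest: %d..%d (length %d)\n", @$best;
```

With the scores above and a threshold of 10 this prints "2 clean ranges" and "longest: 1..3 (length 3)". The point of routing every change through threshold() is that the cache can never go stale: any method that reads the ranges goes through _find_clean_ranges(), which recomputes only when the cache has been emptied.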
2009/4/27 Heikki Lehvaslaiho :
> Dan,
>
> It looks like your method does two different things:
>
> 1. Returns the longest subsequence above the threshold
> 2. Analyses the sequence for the number of ranges the current
> threshold creates.
>
> Why not separate these functions?
>
> Let's add a method that sets the threshold and stores it internally as
> $self->_threshold. Setting it to a new value should trigger emptying
> all the caches (see below).
>
> Let's have two more public methods:
>
> 1. get_clean_range() - optional argument 'threshold'
>
> It returns the longest clean subseq.
>
> 2. count_clean_ranges() - again optional argument 'threshold'
>
> This returns the number of ranges detected.
>
> Both methods first call the public method threshold() if the argument
> has been given and then an internal method _find_clean_ranges(). That
> method calculates all the ranges and stores them internally (as
> $self->_clean_ranges->[...]). The number of ranges is also stored
> (e.g. $self->_number_of_ranges). These internal values form the cache
> that needs to be emptied whenever any of the critical values of the
> object changes: threshold, quality or seq. Create an internal method
> $self->_clear_cache that does that.
>
> Now the new quality object does not get created until you call
> get_clean_range(), which accesses the cached values (or creates them if
> they are not there).
>
> This design allows you to have no extra penalty for adding more
> methods that act on cached values. For example, it might be a sensible
> thing to do at some point to look at all the ranges that are longer
> than some length. Then you could write in your program:
>
> $qual->threshold(10);
> if ($qual->count_clean_ranges == 1) {
>     my $newqual = $qual->get_clean_range();
>     # do your analysis
> } elsif ($qual->count_clean_ranges == 0) {
>     # do some reporting and logging
> } else { # more than one range
>     my @quals = $qual->get_all_clean_ranges($min_length);
>     # do some more work and possibly select the best one(s)
> }
>
> Yours,
>
> -Heikki
>
> 2009/4/24 Chris Fields :
>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If
>> possible, tests don't hurt either!
>>
>> chris
>>
>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>
>>> It's a bit rough and ready, but it does what I need...
>>>
>>> =head2 get_clear_range
>>>
>>> Title    : get_clear_range
>>> Usage    : $subobj = $obj->get_clear_range();
>>>            $subobj = $obj->get_clear_range(20);
>>> Function : Get the clear range using the given quality score as a
>>>            cutoff or a default value of 13.
>>>
>>> Returns  : a new Bio::Seq::Quality object
>>> Args     : a minimum quality value, optional, default = 13
>>>
>>> =cut
>>>
>>> sub get_clear_range
>>> {
>>>     my $self = shift;
>>>     my $qual = $self->qual;
>>>     my $minQual = shift || 13;
>>>
>>>     my (@ranges, $rangeFlag);
>>>
>>>     for(my $i=0; $i<@$qual; $i++){
>>>         ## Are we currently within a clear range or not?
>>>         if(defined($rangeFlag)){
>>>             ## Did we just leave the clear range?
>>>             if($qual->[$i]<$minQual){
>>>                 ## Log the range
>>>                 push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>                 ## and reset the range flag.
>>>                 $rangeFlag = undef;
>>>             }
>>>             ## else nothing changes
>>>         }
>>>         else{
>>>             ## Did we just enter a clear range?
>>>             if($qual->[$i]>=$minQual){
>>>                 ## Better set the range flag!
>>>                 $rangeFlag = $i;
>>>             }
>>>             ## else nothing changes
>>>         }
>>>     }
>>>     ## Did we exit the last clear range?
>>>     if(defined($rangeFlag)){
>>>         my $i = scalar(@$qual);
>>>         ## Log the range
>>>         push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>     }
>>>
>>>     unless(@ranges){
>>>         die "There is no clear range... I don't know what to do here!\n";
>>>     }
>>>
>>>     print "there are ", scalar(@ranges), " clear ranges\n";
>>>
>>>     my $sum; map {$sum += $_->[2]} @ranges;
>>>
>>>     print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>         "quality scores above the given threshold\n";
>>>
>>>     for (sort {$b->[2] <=> $a->[2]} @ranges){
>>>         if($_->[2]/$sum < 0.5){
>>>             warn "not so much a clear range as a clear chunk...\n";
>>>         }
>>>         print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>
>>>         return Bio::Seq::QualityDB->new( -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
>>>                                          -qual => $self->subqual($_->[0]+1, $_->[1]+1)
>>>                                        );
>>>     }
>>> }
>>>
>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>>> That is why the 'new Bio::Seq::Quality object' is actually a
>>> Bio::Seq::QualityDB object, but other than that it should slot right
>>> in (apart from all the debugging output that I spit out).
>>>
>>> Cheers,
>>> Dan.
>>>
>>> 2009/4/24 Dan Bolser :
>>>>
>>>> Hi all,
>>>>
>>>> I couldn't find out how to get the 'clear range' from a
>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>>> this method be a part of the Bio::Seq::Quality class?
>>>>
>>>> In the latter case I'm on my way to an implementation, but I am not
>>>> good at navigating the bioperl docs, so I thought I should ask before
>>>> I take the time to finish that off.
>>>>
>>>> Cheers,
>>>> Dan.
>>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > -Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From kaboroev at sfu.ca Mon Apr 27 15:04:05 2009 From: kaboroev at sfu.ca (Keith Anthony Boroevich) Date: Mon, 27 Apr 2009 12:04:05 -0700 Subject: [Bioperl-l] Bio::Graphics Sub Feature Title Message-ID: <49F601A5.8090205@sfu.ca> Hi, I was wondering if it is possible to set a different "-title" for each of the subfeatures in a track the same way one can set a different "-bgcolor" using a subroutine. I noticed that the -title subroutine is only called once per Feature and is passed a "Bio::SeqFeature::Generic" class whereas the -bgcolor subroutine is called once per Sub Feature and is passed the "Bio::SeqFeature::Generic"s which I created. Is there any way for the -title subroutine to be called each Sub Feature or is this not implemented? Keith From dan.bolser at gmail.com Tue Apr 28 01:46:05 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 28 Apr 2009 06:46:05 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID: <2c8757af0904272246q56e19a2dr542b29f2378d0a48@mail.gmail.com> 2009/4/27 Mark A. Jensen : > Dan - congrats on your first contribution! 
Mark I don't really feel like I can take much credit! Thanks Heikki! I'll look at what you did and see what I can add. Its a really good feeling to contribute to BioPerl (even if I didn't really do much!)... Now... where do I collect my cheque? ;-) Seriously though, thanks all for helping to put this together, and thanks for maintaining BioPerl and keeping it relevant as the field changes. All the best, Dan. > ----- Original Message ----- From: "Dan Bolser" > To: "Heikki Lehvaslaiho" > Cc: "Chris Fields" ; > Sent: Monday, April 27, 2009 4:31 AM > Subject: Re: [Bioperl-l] Clear range from Bio::Seq::Quality? > > > Hi Heikki, > > Thanks very much for the advice on how to better implement the clear > range method within the Bio::Seq::Quality object. I can understand the > logic of what you have written, and it all sounds reasonable. The only > problem is that I am very inexperienced with working on object > oriented Perl (my 'one man' projects to date have never really > required me to think beyond scripts, and its been years since I > actually tried to code objects in Perl). > > To be specific, when you say, "Lets add a method that sets the > threshold and stores it internally as $self->_threshold", ignoring any > other functionality, what would that method look like? in particular, > how would $self->_threshold be implemented? > > I think once I see that detail, I can go ahead and try to code what > you suggested. > > > Similarly (Chris), where would I put the tests / how would they be > implemented? > > > Thanks again for the feedback. > > All the best, > Dan. > > > > 2009/4/27 Heikki Lehvaslaiho : >> >> Dan, >> >> It looks like your method does two different things: >> >> 1. Returns the longest subsequence above the threshold >> 2. Analyses the the sequence for the number of ranges the current >> threshold creates. >> >> Why not separate these functions? >> >> Lets add a method that sets the threshold and stores it internally as >> $self->_threshold. 
Setting it to a new value should trigger emptying
>> all the caches (see below).
>>
>> Let's have two more public methods:
>>
>> 1. get_clean_range() - optional argument 'threshold'
>>
>> It returns the longest clean subseq.
>>
>> 2. count_clean_ranges() - again optional argument 'threshold'
>>
>> This returns the number of ranges detected.
>>
>> Both methods first call the public method threshold() if the argument
>> has been given and then an internal method _find_clean_ranges(). That
>> method calculates all the ranges and stores them internally (as
>> $self->_clean_ranges->[...]). The number of ranges is also stored
>> (e.g. $self->_number_of_ranges). These internal values form the cache
>> that needs to be emptied whenever any of the critical values of the
>> object changes: threshold, quality or seq. Create an internal method
>> $self->_clear_cache that does that.
>>
>> Now the new quality object does not get created until you call
>> get_clean_range(), which accesses the cached values (or creates them if
>> they are not there).
>>
>> This design means there is no extra penalty for adding more
>> methods that act on cached values. For example, it might be a sensible
>> thing to do at some point to look at all the ranges that are longer
>> than some length. Then you could write in your program:
>>
>>
>> $qual->threshold(10);
>> if ($qual->count_clean_ranges == 1) {
>>     my $newqual = $qual->get_clean_range();
>>     # do your analysis
>> } elsif ($qual->count_clean_ranges == 0) {
>>     # do some reporting and logging
>> } else { # more than one range
>>     my @quals = $qual->get_all_clean_ranges($min_length);
>>     # do some more work and possibly select the best one(s)
>> }
>>
>>
>>
>> Yours,
>>
>> -Heikki
>>
>> 2009/4/24 Chris Fields :
>>>
>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If
>>> possible, tests don't hurt either!
>>> >>> chris >>> >>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote: >>> >>>> Its a bit rough and ready, but it does what I need... >>>> >>>> >>>> >>>> >>>> =head2 get_clear_range >>>> >>>> Title : get_clear_range >>>> >>>> Title : subqual >>>> Usage : $subobj = $obj->get_clear_range(); >>>> $subobj = $obj->get_clear_range(20); >>>> Function : Get the clear range using the given quality score as a >>>> cutoff or a default value of 13. >>>> >>>> Returns : a new Bio::Seq::Quality object >>>> Args : a minimum quality value, optional, devault = 13 >>>> >>>> =cut >>>> >>>> sub get_clear_range >>>> { >>>> my $self = shift; >>>> my $qual = $self->qual; >>>> my $minQual = shift || 13; >>>> >>>> my (@ranges, $rangeFlag); >>>> >>>> for(my $i=0; $i<@$qual; $i++){ >>>> ## Are we currently within a clear range or not? >>>> if(defined($rangeFlag)){ >>>> ## Did we just leave the clear range? >>>> if($qual->[$i]<$minQual){ >>>> ## Log the range >>>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>>> ## and reset the range flag. >>>> $rangeFlag = undef; >>>> } >>>> ## else nothing changes >>>> } >>>> else{ >>>> ## Did we just enter a clear range? >>>> if($qual->[$i]>=$minQual){ >>>> ## Better set the range flag! >>>> $rangeFlag = $i; >>>> } >>>> ## else nothing changes >>>> } >>>> } >>>> ## Did we exit the last clear range? >>>> if(defined($rangeFlag)){ >>>> my $i = scalar(@$qual); >>>> ## Log the range >>>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>>> } >>>> >>>> unless(@ranges){ >>>> die "There is no clear range... I don't know what to do here!\n"; >>>> } >>>> >>>> print "there are ", scalar(@ranges), " clear ranges\n"; >>>> >>>> my $sum; map {$sum += $_->[2]} @ranges; >>>> >>>> print "of ", scalar(@$qual), " bases, there are $sum with ". 
>>>> "quality scores above the given threshold\n"; >>>> >>>> for (sort {$b->[2] <=> $a->[2]} @ranges){ >>>> if($_->[2]/$sum < 0.5){ >>>> warn "not so much a clear range as a clear chunk...\n"; >>>> } >>>> print $_->[2], "\t", $_->[2]/$sum, "\n"; >>>> >>>> return Bio::Seq::QualityDB->new( -seq => $self->subseq( $_->[0]+1, >>>> $_->[1]+1), >>>> -qual => $self->subqual($_->[0]+1, >>>> $_->[1]+1) >>>> ); >>>> } >>>> } >>>> >>>> >>>> >>>> >>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which >>>> is a copy of Bio/Seq/Quality.pm that just has the above method added). >>>> That is why the 'new Bio::Seq::Quality object' is actually a >>>> Bio::Seq::QualityDB object, but other than that it should slot right >>>> in (apart from all the debugging output that I spit out). >>>> >>>> >>>> Cheers, >>>> Dan. >>>> >>>> >>>> 2009/4/24 Dan Bolser : >>>>> >>>>> Hi all, >>>>> >>>>> I couldn't find out how to get the 'clear range' from a >>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should >>>>> this method be a part of the Bio::Seq::Quality class? >>>>> >>>>> In the latter case I'm on my way to an implementation, but I am not >>>>> good at navigating the bioperl docs, so I thought I should ask before >>>>> I take the time to finish that off. >>>>> >>>>> >>>>> Cheers, >>>>> Dan. 
>>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> -- >> -Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From brianli.cas at gmail.com Tue Apr 28 23:14:23 2009 From: brianli.cas at gmail.com (brian li) Date: Wed, 29 Apr 2009 11:14:23 +0800 Subject: [Bioperl-l] Parse problem of a big EMBL entry Message-ID: Hi everyone, Here is greeting from Brian. I have just began to use bioperl 1.6.0 to collect certain data lines from EMBL files. There's a problem when I try to get an entry that includes over 1 million lines. A call of Bio::SeqIO::embl->next_seq would just cause the parser script to exit. I have read Bio/SeqIO/embl.pm and I think one possible way to solve the problem may be to give my script more memory to store the entry data. The machine I am using has 32GB memory, and that shall be enough for any entry. So I am wondering whether there is any way to set the size of the memory available to a perl script. Others ways to deal with the problem are also welcome. Appreciate your help. Brian From jason at bioperl.org Wed Apr 29 01:10:27 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 28 Apr 2009 22:10:27 -0700 Subject: [Bioperl-l] Parse problem of a big EMBL entry In-Reply-To: References: Message-ID: <2154C145-1A66-4EEB-B99E-FBE8215539F5@bioperl.org> Brian - Without memory leaks it should only take up as much memory as the current sequence you have parsed. 
If you mean you have a sequence record with > 1M lines, I'm not sure how much memory that would take up; it depends on whether this is lots of features or what. There are ways to tell BioPerl to throw away things you don't want to parse out of the record. See http://bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder

Perl will use as much memory as is available on your machine. Have you monitored the memory use of the running perl process to ensure it is reaching the 32GB limit and that this is in fact what is killing the program?

-jason

On Apr 28, 2009, at 8:14 PM, brian li wrote:

> Hi everyone,
>
> Here is greeting from Brian.
>
> I have just began to use bioperl 1.6.0 to collect certain data
> lines from EMBL files.
>
> There's a problem when I try to get an entry that includes over 1
> million lines. A call of Bio::SeqIO::embl->next_seq would just cause
> the parser script to exit. I have read Bio/SeqIO/embl.pm and I think
> one possible way to solve the problem may be to give my script more
> memory to store the entry data. The machine I am using has 32GB
> memory, and that shall be enough for any entry.
>
> So I am wondering whether there is any way to set the size of the
> memory available to a perl script. Others ways to deal with the
> problem are also welcome.
>
> Appreciate your help.
> > Brian > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From paola.bisignano at gmail.com Wed Apr 29 10:08:57 2009 From: paola.bisignano at gmail.com (Paola Bisignano) Date: Wed, 29 Apr 2009 16:08:57 +0200 Subject: [Bioperl-l] parsing /www.ebi.ac.uk/pdbsum/ Message-ID: Hi, thanks for accepting me in the mailing list, I'm Paola and I work in the institute of cancer in Genoa, Italy, as a bioinformatic...I'm biologist, quite new in perl...(2 months) and never used bioperl...because I prefer learning a little perl before, but now parsing, parsing, and parsing bioinformatic web sites....I need Bioperl :-) I visited www.bioperl.org and read tutorials, I read about a lot of moduls used to parse different web site. I need to parse one in particular EMBL-EBI http://www.ebi.ac.uk/pdbsum/ that is different from EMBL because there are also other information protein-ligand interaction....I never used bioperl moduls...and parsed by myself...but If the receptor has more ligands...it is more difficult to parse...to choose which ligands I need because there are "false" ligands as ions or glycerol that I don't need but I don't know the synthax of this source...for everything can be seen as a ligand....so I want to know if there are moduls that I can use to do my analysis...if anyone can help me...is very wellcome... Thanks From jason at bioperl.org Wed Apr 29 12:41:02 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 29 Apr 2009 09:41:02 -0700 Subject: [Bioperl-l] Fwd: Parse problem of a big EMBL entry References: Message-ID: Brian - please always CC the mailing list on replies. Not sure what is causing the seg fault so I can't really help here - if you want to file it as a bug at the bugzilla with instructions on how to reproduce it will hopefully get looked at. 
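If only a handful of line types are needed from such a record, one possible workaround while the crash is unexplained is to skip the object parser and stream the flat file directly. What follows is a minimal sketch in plain Perl, not BioPerl code: next_embl_lines is a hypothetical helper, the record is toy data, and the only assumptions are the usual EMBL flat-file conventions (two-letter line code, data starting at column 6, records terminated by "//"). Whether it avoids the segmentation fault in this particular case is untested.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read one EMBL record from a
# filehandle, keeping only lines whose two-letter code was asked for.
# Returns a hash ref (code => array ref of line contents), or undef once
# no further record is found.
sub next_embl_lines {
    my ($fh, @codes) = @_;
    my %wanted = map { $_ => 1 } @codes;
    my (%kept, $seen);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;           # ignore blank lines
        $seen = 1;
        return \%kept if $line =~ m{^//};    # "//" terminates a record
        my $code = substr($line, 0, 2);
        next unless $wanted{$code};
        # Data starts at column 6; a bare code line contributes ''.
        push @{ $kept{$code} }, length($line) > 5 ? substr($line, 5) : '';
    }
    return $seen ? \%kept : undef;           # unterminated last record
}

# Toy record for illustration only (not a real database entry).
my $embl = join "\n",
    'ID   TOY0001; SV 1; linear; genomic DNA; STD; UNC; 10 BP.',
    'DE   Toy entry for illustration',
    'FT   source          1..10',
    '//', '';

open(my $fh, '<', \$embl) or die "cannot open in-memory file: $!";
while (my $rec = next_embl_lines($fh, qw(ID DE))) {
    for my $code (sort keys %$rec) {
        print "$code\t$_\n" for @{ $rec->{$code} };
    }
}
close $fh;
```

Memory use stays proportional to the lines that are kept rather than to the size of the record, which is the point of the exercise; the SeqBuilder route remains the better choice when real sequence objects are needed.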
-jason

Begin forwarded message:

> From: brian li
> Date: April 29, 2009 1:23:32 AM PDT
> To: Jason Stajich
> Subject: Re: [Bioperl-l] Parse problem of a big EMBL entry
>
> Hi Jason,
>
>> Without memory leaks it should only take up as much memory as the current
>> sequence you have parsed. If you mean you have a sequence record with > 1M
>> lines I'm not sure how much memory that would take up, depends on if this is
>> lots of feature or what.
>
> Lots of features.
>
>> There are ways to tell BioPerl to throw away
>> things you don't want to parse out from the record. See
>> http://bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
>
> Thanks. I think this would help.
>
>> Perl will use as much memory as is available on your machine. Have you
>> monitored the memory use of the perl running to insure it is reaching the
>> 32Gb limit and that is in fact what is killing the program?
>
> I monitored the memory usage in my last run. The size of free
> memory didn't change a lot, and remained around 20GB (buffer
> size added). I made the wrong assumption. Thanks again for your hint.
>
> BTW: The message I get when I parse the big million-line entry is
> "Segmentation fault". I'm not familiar with this and am trying to get a clue.
>
> Brian

Jason Stajich
jason at bioperl.org

From razi.khaja at gmail.com Wed Apr 29 15:08:14 2009
From: razi.khaja at gmail.com (Razi Khaja)
Date: Wed, 29 Apr 2009 15:08:14 -0400
Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence
In-Reply-To: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com>
References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com>
Message-ID: <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>

Hello,

I am generating BLAST alignments using the BLAST URL API from NCBI.

I want to parse details from BLAST reports whenever there are "Features in/flanking this part of subject sequence". A portion of the BLAST report showing "Features flanking ..."
is pasted below.

I am using Bio::SearchIO to parse details. The relevant part of the script is below.

The problem I am having is that for some reason the first occurrence of a "Feature flanking this part of a subject sequence" is skipped. I am only able to parse/print all occurrences of a "Feature in/flanking this part of a subject sequence" from the second occurrence to the last occurrence.

I believe the code responsible for parsing this information is in Bio/SearchIO/blast.pm, starting on line 760.
I have tried fixing the code in Bio/SearchIO/blast.pm myself but was not able to correct the problem.
Would it be possible for someone to fix the code in the Bio/SearchIO/blast.pm module, or help me fix the code so that the first occurrence is not skipped?

Thanks,
Razi

===== The part of the script that is relevant to parsing "Features in/flanking..." ====
my $bio_searchio_in = Bio::SearchIO->new(
    -file   => 'blast_result.txt',
    -format => 'blast'
);

my $i = 1;
while( my $result = $bio_searchio_in->next_result() ){
    while( my $hit = $result->next_hit() ){
        while( my $hsp = $hit->next_hsp() ){
            my $hsp_features = $hsp->hit_features();
            if( $hsp_features ) {
                print "HSP FEATURE $i\t$hsp_features\n";
                $i++;
            }
        }
    }
}

===== A portion of a BLAST report with "Features flanking ..." =====
...
...
 Score = 54.7 bits (29),  Expect = 0.003
 Identities = 29/29 (100%), Gaps = 0/29 (0%)
 Strand=Plus/Minus

Query  6556     CCTGGGTGACAGAGTGAGACTCCATCTCA  6584
                |||||||||||||||||||||||||||||
Sbjct  6953042  CCTGGGTGACAGAGTGAGACTCCATCTCA  6953014

>gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic contig
Length=237250

 Features flanking this part of subject sequence:
   16338 bp at 5' side: PRAME family member 8
   11926 bp at 3' side: PRAME family member 9

 Score = 7286 bits (3945),
Expect = 0.0
 Identities = 5437/6145 (88%), Gaps = 152/6145 (2%)
 Strand=Plus/Plus

Query  23225  GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG  23284
              |||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||||||
Sbjct  86128  GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG  86187

Query  23285  GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA  23344
              ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| |||||
Sbjct  86188  GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA  86247
...
...

From cjfields at illinois.edu Wed Apr 29 15:41:54 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 29 Apr 2009 14:41:54 -0500
Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence
In-Reply-To: <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>
References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com> <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>
Message-ID: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu>

I'm assuming this is from an older bioperl; this data should be accessible via $hsp->hit_features in the latest code from svn (and I believe in bioperl 1.6.0 on CPAN).

chris

On Apr 29, 2009, at 2:08 PM, Razi Khaja wrote:

> Hello,
>
> I am generating BLAST alignments using the BLAST URL API from NCBI.
>
> I want to parse details from BLAST reports whenever there are
> "Features in/flanking this part of subject sequence". A portion of
> the BLAST report showing "Features flanking ..." is pasted below.
>
> I am using Bio::SearchIO to parse details. The relevant part of the
> script is below.
>
> The problem I am having is that for some reason the first occurrence
> of a "Feature flanking this part of a subject sequence" is skipped.
> I am only able to parse/print all occurrences of a "Feature
> in/flanking this part of a subject sequence" from the second
> occurrence to the last occurrence.
> > I believe the code responsible for parsing this information is in > Bio/SearchIO/blast.pm, starting on line 760. > I have tried fixing the code in Bio/SearchIO/blast.pm myself but was > not able to correct the problem. > Would it be possible for someone to fix the code in the > Bio/SearchIO/blast.pm module, or help me fix the code so that the > first occurrence is not skipped? > > Thanks, > Razi > ===== The part of the script that is relevant to parsing "Features > in/flanking..." ==== > my $bio_searchio_in = Bio::SearchIO->new( > -file => 'blast_result.txt', > -format => 'blast' > ); > > my $i = 1; > while( my $result = $bio_searchio_in->next_result() ){ > while( my $hit = $result->next_hit() ){ > while( my $hsp = $hit->next_hsp() ){ > my $hsp_features = $hsp->hit_features(); > if( $hsp_features ) { > print "HSP FEATURE $i\t$hsp_features\n"; > $i++; > } > } > } > } > > ===== A portion of a BLAST report with "Features flanking ..." ===== > ... > ... > Score = 54.7 bits (29), Expect = 0.003 > Identities = 29/29 (100%), Gaps = 0/29 (0%) > Strand=Plus/Minus > > Query 6556 CCTGGGTGACAGAGTGAGACTCCATCTCA 6584 > ||||||||||||||||||||||||||||| > Sbjct 6953042 CCTGGGTGACAGAGTGAGACTCCATCTCA 6953014 > > >> gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 >> genomic contig > Length=237250 > > Features flanking this part of subject sequence: > 16338 bp at 5' side: PRAME family member 8 > 11926 bp at 3' side: PRAME family member 9 > > Score = 7286 bits (3945), Expect = 0.0 > Identities = 5437/6145 (88%), Gaps = 152/6145 (2%) > Strand=Plus/Plus > > Query 23225 > GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG > 23284 > |||||||||||||||||||||||||||||||| |||||| ||||||||||| > |||||||| > Sbjct 86128 > GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG > 86187 > > Query 23285 > GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA > 23344 > ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| > ||||| > Sbjct 86188 > 
GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA > 86247 > ... > ... > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjm at berkeleybop.org Wed Apr 29 16:58:15 2009 From: cjm at berkeleybop.org (Chris Mungall) Date: Wed, 29 Apr 2009 13:58:15 -0700 Subject: [Bioperl-l] Can I load ontologies into BioSQL? In-Reply-To: References: Message-ID: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org> The .ontology files have been deprecated by GO. Use the .obo files instead. It appears the bioperl parser for the .ontology files isn't able to deal with the new relations in GO. I suggest that the bioperl .ontology parser is deprecated too On Apr 22, 2009, at 6:38 AM, Hilmar Lapp wrote: > Hi Carlos, > > I am moving your inquiry to the BioPerl list, as the tool is a part > of Bioperl-db and uses BioPerl for parsing the ontologies. > > In your case, the goflat parser in BioPerl seems to balk at the > second one of the input files. It may be that the input file is > (was?) corrupted, that does happen every once in a while. More > likely though is that the goflat parser hasn't kept up with some > format changes. Have you tried using the obo format version instead? > > -hilmar > > On Apr 20, 2009, at 11:44 AM, Carlos A. Canchaya wrote: > >> Hi guys >> >> I'm working with biosql and I try to figure out how to load >> ontologies into biosql. 
>> >> I've tried >> >> load_ontology.pl --driver mysql --dbuser carlos --dbpass xxx -- >> host localhost --dbname biosql --namespace "Gene Ontology" --format >> goflat --fmtargs "-defs_file,GO.defs" function.ontology >> process.ontology component.ontology >> >> as in the script info but I have an error, >> >> >> ------------------- WARNING --------------------- >> MSG: DBLink exists in the dblink of _default >> --------------------------------------------------- >> >> ------------- EXCEPTION ------------- >> MSG: format error (file process.ontology) offending line: >> -negative regulation of angiogenesis ; GO:0016525 ; synonym:down >> regulation of angiogenesis ; synonym:down\-regulation of >> angiogenesis ; synonym:downregulation of angiogenesis ; >> synonym:inhibition of angiogenesis % negative regulation of >> developmental process ; GO:0051093 % regulation of angiogenesis ; >> GO:0045765 >> >> STACK Bio::OntologyIO::dagflat::_parse_flat_file /usr/local/share/ >> perl/5.10.0/Bio/OntologyIO/dagflat.pm:627 >> STACK Bio::OntologyIO::dagflat::parse /usr/local/share/perl/5.10.0/ >> Bio/OntologyIO/dagflat.pm:284 >> STACK Bio::OntologyIO::dagflat::next_ontology /usr/local/share/perl/ >> 5.10.0/Bio/OntologyIO/dagflat.pm:317 >> STACK toplevel /usr/local/share/biosql/bioperl-db/scripts/biosql/ >> load_ontology.pl:604 >> ------------------------------------- >> >> Any suggestion? 
>> >> Cheers, >> >> Carlos >> >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Wed Apr 29 19:48:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 29 Apr 2009 19:48:10 -0400 Subject: [Bioperl-l] SearchIO: Features in/flanking this part of asubject sequence In-Reply-To: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com><62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com> <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> Message-ID: <7A9746282BA343F78423D12DB1578509@NewLife> also check out http://www.bioperl.org/wiki/Parsing_BLAST_HSPs MAJ ----- Original Message ----- From: "Chris Fields" To: "Razi Khaja" Cc: Sent: Wednesday, April 29, 2009 3:41 PM Subject: Re: [Bioperl-l] SearchIO: Features in/flanking this part of asubject sequence > I'm assuming this is from an older bioperl; this data should be accessible > via $hsp->hit_features in the latest code fromo svn (and I believe in bioperl > 1.6.0 in CPAN). > > chris > > On Apr 29, 2009, at 2:08 PM, Razi Khaja wrote: > >> Hello, >> >> I am generating BLAST alignments using the BLAST URL API from NCBI. >> >> I want to parse details from BLAST reports whenever there are >> "Features in/flanking this part of subject sequence". A portion of >> the BLAST report showing "Features flanking ..." is pasted below. >> >> I am using Bio::SearchIO to parse details. The relevant part of the >> script is below. 
>> >> The problem I am having is that for some reason the first occurrence >> of a "Feature flanking this part of a subject sequence" is skipped. >> I am only able to parse/print all occurrences of a "Feature >> in/flanking this part of a subject sequence" from the second >> occurrence to the last occurrence. >> >> I believe the code responsible for parsing this information is in >> Bio/SearchIO/blast.pm, starting on line 760. >> I have tried fixing the code in Bio/SearchIO/blast.pm myself but was >> not able to correct the problem. >> Would it be possible for someone to fix the code in the >> Bio/SearchIO/blast.pm module, or help me fix the code so that the >> first occurrence is not skipped? >> >> Thanks, >> Razi > > > >> ===== The part of the script that is relevant to parsing "Features >> in/flanking..." ==== >> my $bio_searchio_in = Bio::SearchIO->new( >> -file => 'blast_result.txt', >> -format => 'blast' >> ); >> >> my $i = 1; >> while( my $result = $bio_searchio_in->next_result() ){ >> while( my $hit = $result->next_hit() ){ >> while( my $hsp = $hit->next_hsp() ){ >> my $hsp_features = $hsp->hit_features(); >> if( $hsp_features ) { >> print "HSP FEATURE $i\t$hsp_features\n"; >> $i++; >> } >> } >> } >> } >> >> ===== A portion of a BLAST report with "Features flanking ..." ===== >> ... >> ... 
>> Score = 54.7 bits (29), Expect = 0.003 >> Identities = 29/29 (100%), Gaps = 0/29 (0%) >> Strand=Plus/Minus >> >> Query 6556 CCTGGGTGACAGAGTGAGACTCCATCTCA 6584 >> ||||||||||||||||||||||||||||| >> Sbjct 6953042 CCTGGGTGACAGAGTGAGACTCCATCTCA 6953014 >> >> >>> gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic >>> contig >> Length=237250 >> >> Features flanking this part of subject sequence: >> 16338 bp at 5' side: PRAME family member 8 >> 11926 bp at 3' side: PRAME family member 9 >> >> Score = 7286 bits (3945), Expect = 0.0 >> Identities = 5437/6145 (88%), Gaps = 152/6145 (2%) >> Strand=Plus/Plus >> >> Query 23225 GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG >> 23284 >> |||||||||||||||||||||||||||||||| |||||| ||||||||||| |||||||| >> Sbjct 86128 GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG >> 86187 >> >> Query 23285 GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA >> 23344 >> ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| ||||| >> Sbjct 86188 GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA >> 86247 >> ... >> ... 
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From Russell.Smithies at agresearch.co.nz Wed Apr 29 20:31:06 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 30 Apr 2009 12:31:06 +1200
Subject: [Bioperl-l] waaaay off topic question
In-Reply-To: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org>
References: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493C84151@exchsth.agresearch.co.nz>

I have a question that's nothing to do with BioPerl or Perl, but hope there's a chance that some of you clever people may be doing the same thing as me :-)

I've been asked to write some VB scripts to control Applied Biosystems "Analyst QS" and "BioAnalyst" applications for analyzing mass-spec data.
There's limited documentation (10yr out of date) with some example code (that doesn't compile) so I'm not getting as far along as I'd like.
Has anyone worked with this stuff before?

Any assistance greatly appreciated !!!

Thanx,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From razi.khaja at gmail.com Wed Apr 29 23:57:17 2009 From: razi.khaja at gmail.com (Razi Khaja) Date: Wed, 29 Apr 2009 23:57:17 -0400 Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence In-Reply-To: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com> <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com> <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> Message-ID: <62e9dabc0904292057y6b725e0yc3b0a85c661c44f8@mail.gmail.com> Hello Chris, I am using bioperl 1.6.0. It may be a few weeks before I can upgrade to bioperl-live from svn, and so it may be a few weeks before I can return to my question. When I do upgrade, I will report back to this thread if I still encounter problems. Razi On Wed, Apr 29, 2009 at 3:41 PM, Chris Fields wrote: > I'm assuming this is from an older bioperl; this data should be accessible > via $hsp->hit_features in the latest code fromo svn (and I believe in > bioperl 1.6.0 in CPAN). > > chris > > > From jonathanmflowers at gmail.com Thu Apr 30 12:40:42 2009 From: jonathanmflowers at gmail.com (Jon Flowers) Date: Thu, 30 Apr 2009 09:40:42 -0700 (PDT) Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem Message-ID: <23319982.post@talk.nabble.com> Dear colleagues, I have set up a mySQL database and loaded a GFF3 and fasta file using Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be working normally except when I attempt to create a Bio::DB::SeqFeature::Segment object. 

The following works as expected:

my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:foo',
                                         -user    => 'myuser',
                                         -pass    => 'mypassword',
                                         -write   => '1');

my @features = $db->features(-seq_id => 'chr1',
                             -start  => 1,
                             -end    => 10000,
                             -types  => ['gene']);

However, when I try to create a segment object using either of the two
following method calls I get an error:

my $segment = $db->segment('chr1',1=>10000);

my $segment = $db->segment( -seq_id => 'chr1', -start => '1', -end => '10000');

-------------------------------- EXCEPTION ------------------------------------

MSG: segment() called in a scalar context but multiple features match.
Either call in a list context or narrow your search using the -types or
-class arguments

STACK Bio::DB::SeqFeature::Store::segment /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178
STACK toplevel trial.pl:42
-------------------------------------------------------

Calling in list context (which is not defined in the documentation)
produces an array of 22 identical scalars = 'chr1:1..10000'.

Any ideas?

Thanks

Jonathan

--
View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jonathanmflowers at gmail.com Thu Apr 30 12:52:24 2009
From: jonathanmflowers at gmail.com (Jon Flowers)
Date: Thu, 30 Apr 2009 09:52:24 -0700 (PDT)
Subject: [Bioperl-l] use CLUSTALW on Windows?
In-Reply-To: <23264714.post@talk.nabble.com>
References: <23264714.post@talk.nabble.com>
Message-ID: <23320232.post@talk.nabble.com>

Hi,

There is no means to do this in bioperl, but it is simple to make a
system call and execute an MSA program such as MUSCLE to align
fasta-formatted sequences using something like...

qx(muscle -in $infilename -out $outfilename)

Jonathan

laxmanb wrote:
>
> I need to create a multiple sequence alignment of some sequences using
> CLUSTALW or any other Multiple sequence alignment program. However, I've
> learnt that this functionality used to be UNIX/Linux only. However, the
> documentation is also very old, so I'd like to know if any CLUSTAL/ any
> other MSA programs can be run using BioPerl on Windows.
>
> Thank you for your time :)
>

--
View this message in context: http://www.nabble.com/use-CLUSTALW-on-Windows--tp23264714p23320232.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at illinois.edu Thu Apr 30 13:04:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 30 Apr 2009 12:04:46 -0500
Subject: [Bioperl-l] use CLUSTALW on Windows?
In-Reply-To: <23320232.post@talk.nabble.com>
References: <23264714.post@talk.nabble.com> <23320232.post@talk.nabble.com>
Message-ID: <92920FDD-7CB2-4331-9860-87304E16C948@illinois.edu>

I don't recall this being a UNIX-only issue, though admittedly it's been
years since I've tried running the bioperl-run modules on WinXP. I do
recall getting BLAST, EMBOSS and others to work though; I don't see why
ClustalW would be much different.

Have you actually tested this out and found a problem? Have you tried
cygwin?

chris

From jason at bioperl.org Thu Apr 30 13:29:29 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 30 Apr 2009 10:29:29 -0700
Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem
In-Reply-To: <23319982.post@talk.nabble.com>
References: <23319982.post@talk.nabble.com>
Message-ID: <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org>

One would have to see some of your GFF to know better. It sounds like
you have chr1 defined in multiple places.

Did you use the bp_seqfeature_load script to load the data in one go -
it should catch it if you have non-unique IDs.

-jason

Jason Stajich
jason at bioperl.org

From jason at bioperl.org Thu Apr 30 13:31:19 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 30 Apr 2009 10:31:19 -0700
Subject: [Bioperl-l] use CLUSTALW on Windows?
In-Reply-To: <23320232.post@talk.nabble.com>
References: <23264714.post@talk.nabble.com> <23320232.post@talk.nabble.com>
Message-ID: <734F5ADF-77F5-4AA5-A676-79B42B3C54CB@bioperl.org>

the bioperl-run module of Bio::Tools::Run::Alignment::Clustalw or MUSCLE
ones don't work then? They do the cmdline work for you.

Jason Stajich
jason at bioperl.org

From Kevin.M.Brown at asu.edu Thu Apr 30 15:27:15 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 30 Apr 2009 12:27:15 -0700
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
Message-ID: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>

So, I'm parsing Genbank sequences to pull out the various exons. I found
the way to get the NCBI Exon number from each feature, but am confused
about one of the methods. When I do annotation->as_text I'm expecting to
get back 1 from the feature, but instead get back Value: 1 ??!? Why is
the value from the NCBI file getting that text tagged onto it?

http://www.ncbi.nlm.nih.gov/nuccore/73622129
     exon            1..774
                     /gene="BOLA2"
                     /gene_synonym="BOLA2A; My016"
                     /inference="alignment:Splign"
                     /number=1

print +($f->annotation->get_Annotations('number'))[0]->as_text;
Value: 1

From SMarkel at accelrys.com Thu Apr 30 15:56:40 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 30 Apr 2009 15:56:40 -0400
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net>

Kevin,

I believe the extra text was added for readability when printing to the
console. In our code we just add the following post-processing step.

  (my $text = $annotation->as_text()) =~ s/(Comment|Value): //;

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

From Kevin.M.Brown at asu.edu Thu Apr 30 16:01:03 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 30 Apr 2009 13:01:03 -0700
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net>
References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
 <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net>
Message-ID: <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu>

That's nice in some regards, but makes it hard to use the function in
code without having to always process the result, which seems to be
counter to what one would expect.

E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence".

Is there a better way to get the number directly without having to strip
off the text that never existed in the first place?

From jonathanmflowers at gmail.com Thu Apr 30 16:22:23 2009
From: jonathanmflowers at gmail.com (Jon Flowers)
Date: Thu, 30 Apr 2009 13:22:23 -0700 (PDT)
Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem
In-Reply-To: <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org>
References: <23319982.post@talk.nabble.com>
 <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org>
Message-ID: <23322607.post@talk.nabble.com>

Jason,

I used the Bio::DB::SeqFeature::Store::GFF3Loader rather than the
bp_seqfeature_load.pl script. You were right, however. It looks like I
had populated the MySQL database with multiple fasta files. I cleared
the database, ran the GFF3Loader twice (once for the fasta, once for the
GFF3). Segment objects appear to be working fine now.

THANKS!

Jonathan

--
View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23322607.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jason at bioperl.org Thu Apr 30 16:24:25 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 30 Apr 2009 13:24:25 -0700
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
 <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net>
 <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu>
Message-ID: <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org>

Seems like you just want $annotation->value ?

=head2 as_text

 Title   : as_text
 Usage   : my $text = $obj->as_text
 Function: return the string "Value: $v" where $v is the value
 Returns : string
 Args    : none

=cut

=head2 display_text

 Title   : display_text
 Usage   : my $str = $ann->display_text();
 Function: returns a string. Unlike as_text(), this method returns a string
           formatted as would be expected for the specific implementation.

           One can pass a callback as an argument which allows custom text
           generation; the callback is passed the current instance and any
           text returned
 Example :
 Returns : a string
 Args    : [optional] callback

=cut

=head2 value

 Title   : value
 Usage   : $obj->value($newval)
 Function: Get/Set the value for simplevalue
 Returns : value of value
 Args    : newvalue (optional)

=cut

Jason Stajich
jason at bioperl.org

From Kevin.M.Brown at asu.edu Thu Apr 30 16:45:29 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 30 Apr 2009 13:45:29 -0700
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
In-Reply-To: <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org>
References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
 <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net>
 <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu>
 <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu>

OK. Can't see that method in the Deobfuscator which might explain why I
didn't know about it.

http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3AAnnotation%3A%3ACollection&sort_order=by+method&search_string=Bio%3A%3AAnnotation%3A%3ACollection

From Russell.Smithies at agresearch.co.nz Thu Apr 30 17:28:39 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 1 May 2009 09:28:39 +1200
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
 <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net>
 <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu>
 <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org>
 <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493C843A2@exchsth.agresearch.co.nz>

It's buried in Bio::Annotation::SimpleValue I think

http://bioperl.org/cgi-bin/deob_interface.cgi?Search=&module=&sort_order=by+method&search_string=Bio%3A%3AAnnotation%3A%3ASimpleValue&Filter=Submit+Query

From Kevin.M.Brown at asu.edu Thu Apr 30 17:56:16 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 30 Apr 2009 14:56:16 -0700
Subject: [Bioperl-l] Other object oddities
Message-ID: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu>

So, I'm using quite a bit of bioperl code in my own stuff and have been
seeing some oddities with the naming of methods.
A good example would be in the Bio::Seq and Bio::SeqFeature::Generic. Both have a method called "seq" but in the latter case it returns an object (and expects an object when doing a Set) and in the former it returns a string and expects a string when doing a Set. This makes for a bit of brain freeze on my part when the return from another object might be a Bio::Seq or Bio::SeqFeature::Generic and now calling the ->seq returns different things. Guess I'm just curious if anyone has done an audit of the methods of the various objects and their return types to see how consistent they are across even a subsection of the codebase? From maj at fortinbras.us Wed Apr 1 01:28:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 1 Apr 2009 01:28:24 -0400 Subject: [Bioperl-l] #bioperl bot talk Message-ID: <2589D1BF1EA24C119C06982EB70F490C@NewLife> Hi All, Some cool stuff going on on the IRC node (freenode.net/#bioperl). Andrew Stewart has been prototyping an irc bot with Bioperl functionality built-in. The possibilities for improving support and logging our increasing irc traffic are terrifying. I've set up a wiki page (http://www.bioperl.org/wiki/Bots) under the new IRC category for discussions. Please feel free to contribute use cases, ideas, praise and blame. cheers, Mark From johann.pellet at inserm.fr Wed Apr 1 06:14:25 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Wed, 1 Apr 2009 12:14:25 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> Message-ID: Hi all, With the latest version of BioPerl and BioSQL, I have tried to insert entry from a GenBank file, which I have downloaded from the NCBI website (648 937 records) After successfully loading ncbi_taxonomy i am getting following error message while loading sequences into database. 
perl load_seqdatabase.pl gb_03-2009 -format genbank -driver Pg -dbname biosql --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Human papillomavirus type 2c' (I was supplied 'Human papillomavirus - 2 | Alphapapillomavirus | Pa pillomaviridae') the script is not stopped until this entry: S67864 --------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed, values were ("1","19)","1","3") FKs (41914,) ERROR: invalid input syntax for integer: "19)" --------------------------------------------------- Could not store S67864: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: error while executing statement in Bio::DB::BioSQL::LocationAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ig nored until end of transaction block STACK: Error::throw STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:357 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:216 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::SeqFeatureAdaptor::store_children /Library/ Perl/5.8.8/Bio/DB/BioSQL/SeqFeatureAdaptor.pm:291 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store 
/Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::SeqAdaptor::store_children /Library/Perl/5.8.8/ Bio/DB/BioSQL/SeqAdaptor.pm:257 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: load_seqdatabase.pl:630 ----------------------------------------------------------- at load_seqdatabase.pl line 643 Any Idea? Thanks in advance Johann From florent.angly at gmail.com Wed Apr 1 13:03:28 2009 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 01 Apr 2009 10:03:28 -0700 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> Message-ID: <49D39E60.1020103@gmail.com> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you won't be able to put its information in a hash (unless you have a lot of memory). Florent Smithies, Russell wrote: > The taxonomy information isn't in the blast output unless you created custom fasta headers for your blast database. > The easiest way to get the tax_id for your accessions would be to download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. > If you load that file into a hash, parse the accessions out of the blast hits then lookup the tax_id from that hash, I think it should be fairly fast. 
> > Checking which are prokaryotes and which are eukaryotes based on tax_id is a separate problem :-) > If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file contained within lists what division each tax_id belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out from that. > > It's not a very BioPerly solution but sometimes just looking up the answer from a file/table/hash is the simplest way. > > Hope this helps, > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >> Sent: Wednesday, 1 April 2009 7:43 a.m. >> To: bioperl-l >> Subject: [Bioperl-l] taxonomy ID >> >> Hi All, >> I am writing a script, for one of its part i have to parse a blast >> report (refseq blast) and check how may organisms are eukaryotes and how >> namy of them are prokaryotes. >> I am using BIO::DB::taxinomy module: >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >> >> But for this i need a taxonomyid (like '33090') given in the example. >> So is it possible to get a taxonomyid from refseq balst report? >> If not then how i can deal with this problem? >> >> i would really appreciate if anyone can help me out. 
>> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From miguel.pignatelli at uv.es Wed Apr 1 13:15:48 2009 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Wed, 1 Apr 2009 19:15:48 +0200 Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles In-Reply-To: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> Message-ID: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> Hi all, I have a list of PUBMED IDs and I am trying to retrieve automatically the *full article* in any format (not just the abstract). Is there any method in bioperl that allows this? any other solution? Currently I am trying to solve this using WWW::Mechanize, but do you know of any other method to do this? 
Any help would be appreciated, Thanks in advance, M; From kanzure at gmail.com Wed Apr 1 14:18:22 2009 From: kanzure at gmail.com (Bryan Bishop) Date: Wed, 1 Apr 2009 13:18:22 -0500 Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles In-Reply-To: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> Message-ID: <55ad6af70904011118q7cbdb05u9c89958de3ccc87e@mail.gmail.com> On Wed, Apr 1, 2009 at 12:15 PM, Miguel Pignatelli wrote: > I have a list of PUBMED IDs and I am trying to retrieve automatically the > *full article* in any format (not just the abstract). Is there any method in > bioperl that allows this? any other solution? > Currently I am trying to solve this using WWW::Mechanize, but do you know of > any other method to do this? You can try pubget.com- it's a web gateway to download pubmedcentral articles. Unfortunately this means it does not have pubmed articles. What I have found with pubmed is that it's mainly a listing of abstracts, and then the various papers may or may not be online in their respective journals on the web somewhere else, and rarely are there any links to the publisher website. So how are you using WWW::Mechanize in this context? Is there some secret to attaining papers that are listed via pubmed? There's no magical links to the publisher websites .. so what's going on? 
- Bryan
http://heybryan.org/
1 512 203 0507

From Russell.Smithies at agresearch.co.nz Wed Apr 1 15:33:35 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 2 Apr 2009 08:33:35 +1300
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <49D39E60.1020103@gmail.com>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com>
 <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz>
 <49D39E60.1020103@gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5615@exchsth.agresearch.co.nz>

There's always more than one way to do it.
I have no trouble loading it into a hash but you could just grep the file:

    my(undef,$tax_id) = split(/\s+/, `grep -w -P "^$accession" gi_taxid_prot.dmp`);

--Russell

> -----Original Message-----
> From: Florent Angly [mailto:florent.angly at gmail.com]
> Sent: Thursday, 2 April 2009 6:03 a.m.
> To: Smithies, Russell
> Cc: 'shalabh sharma'; 'bioperl-l'
> Subject: Re: [Bioperl-l] taxonomy ID
>
> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you
> won't be able to put its information in a hash (unless you have a lot of
> memory).
> Florent
>
> Smithies, Russell wrote:
> > The taxonomy information isn't in the blast output unless you created
> > custom fasta headers for your blast database.
> > The easiest way to get the tax_id for your accessions would be to
> > download the gi->tax_id list from
> > ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz.
> > If you load that file into a hash, parse the accessions out of the
> > blast hits then lookup the tax_id from that hash, I think it should be
> > fairly fast.
> >
> > Checking which are prokaryotes and which are eukaryotes based on tax_id
> > is a separate problem :-)
> > If you grab the taxdump.tar.gz file from the same site, the nodes.dmp
> > file contained within lists what division each tax_id belongs to
> > (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can
> > probably work it out from that.
> > > > It's not a very BioPerly solution but sometimes just looking up the answer > from a file/table/hash is the simplest way. > > > > Hope this helps, > > > > Russell Smithies > > > > Bioinformatics Applications Developer > > T +64 3 489 9085 > > E russell.smithies at agresearch.co.nz > > > > Invermay Research Centre > > Puddle Alley, > > Mosgiel, > > New Zealand > > T +64 3 489 3809 > > F +64 3 489 9174 > > www.agresearch.co.nz > > > > > > > > > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of shalabh sharma > >> Sent: Wednesday, 1 April 2009 7:43 a.m. > >> To: bioperl-l > >> Subject: [Bioperl-l] taxonomy ID > >> > >> Hi All, > >> I am writing a script, for one of its part i have to parse a > blast > >> report (refseq blast) and check how may organisms are eukaryotes and how > >> namy of them are prokaryotes. > >> I am using BIO::DB::taxinomy module: > >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy > >> > >> But for this i need a taxonomyid (like '33090') given in the example. > >> So is it possible to get a taxonomyid from refseq balst report? > >> If not then how i can deal with this problem? > >> > >> i would really appreciate if anyone can help me out. > >> > >> Thanks > >> Shalabh > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. 
Any review, retransmission, dissemination or other use of, or
> > taking of any action in reliance upon, this information by persons or
> > entities other than the intended recipients is prohibited by AgResearch
> > Limited. If you have received this message in error, please notify the
> > sender immediately.
> > =======================================================================
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >

From Russell.Smithies at agresearch.co.nz Wed Apr 1 15:48:02 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 2 Apr 2009 08:48:02 +1300
Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
In-Reply-To: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net>
 <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz>

Not all articles have full-text at Pubmed but if you know the article ID, you can usually get the whole article (if available) like this:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez

or as pdf
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf

I'd just build a URL and use wget.

If you're searching Pubmed directly, use a query like this to ensure you only get articles with links to full text:

cancer AND (free full text[sb])
eg http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free+full+text[sb])


Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Miguel Pignatelli > Sent: Thursday, 2 April 2009 6:16 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles > > Hi all, > > I have a list of PUBMED IDs and I am trying to retrieve automatically > the *full article* in any format (not just the abstract). Is there any > method in bioperl that allows this? any other solution? > Currently I am trying to solve this using WWW::Mechanize, but do you > know of any other method to do this? > > Any help would be appreciated, > > Thanks in advance, > > M; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From miguel.pignatelli at uv.es Wed Apr 1 18:14:13 2009
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Thu, 2 Apr 2009 00:14:13 +0200
Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz>
References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net>
 <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
 <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz>
Message-ID:

Thanks for the response,

I have PMIDs extracted from Genbank flat files. Is there a way to convert PMIDs to PMCIDs?
I found this page:

http://www.ncbi.nlm.nih.gov/sites/pmctopmid

Is it possible to download the underlying conversion table for local use?

Thank you very much in advance,

M;


On 01/04/2009, at 21:48, Smithies, Russell wrote:

> Not all articles have full-text at Pubmed but if you know the
> article ID, you can usually get the whole article (if available)
> like this:
> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez
>
> or as pdf
> http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf
>
> I'd just build a URL and use wget.
>
> If you're searching Pubmed directly, use a query like this to ensure
> you only get articles with links to full text:
>
> cancer AND (free full text[sb])
> eg http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free+full+text[sb])
>
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
>
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz
>
>
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Miguel Pignatelli
>> Sent: Thursday, 2 April 2009 6:16 a.m.
>> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles >> >> Hi all, >> >> I have a list of PUBMED IDs and I am trying to retrieve automatically >> the *full article* in any format (not just the abstract). Is there >> any >> method in bioperl that allows this? any other solution? >> Currently I am trying to solve this using WWW::Mechanize, but do you >> know of any other method to do this? >> >> Any help would be appreciated, >> >> Thanks in advance, >> >> M; >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> = > ====================================================================== > From Russell.Smithies at agresearch.co.nz Wed Apr 1 18:47:30 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 2 Apr 2009 11:47:30 +1300 Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles In-Reply-To: References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5761@exchsth.agresearch.co.nz> Try this: http://www.pubmedcentral.nih.gov/about/ftp.html#Obtaining_DOIs Use ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz to associate PMC articles with a PMC ID, a PubMed ID, and the corresponding DOI. PMC-ids.csv.gz is a comma separated file with the following fields: * Journal Title * ISSN * Electronic ISSN * Publication Year * Volume * Issue * Page * DOI (if available) * PMC ID * PubMed ID (if available) * Manuscript ID (if available) * Release Date (Mmm DD YYYY or live) --Russell > -----Original Message----- > From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > Sent: Thursday, 2 April 2009 11:14 a.m. > To: Smithies, Russell > Cc: 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] Is it possible to retrieve full pubmed articles > > Thanks for the response, > > I have PMIDs extracted from Genbank flat files, is there a way to > convert PMIDs to PMCIDs? > I found this page: > > http://www.ncbi.nlm.nih.gov/sites/pmctopmid > > Is it possible to download the underlying conversion table for local > use? 
>
> Thank you very much in advance,
>
> M;
>
>
> On 01/04/2009, at 21:48, Smithies, Russell wrote:
>
> > Not all articles have full-text at Pubmed but if you know the
> > article ID, you can usually get the whole article (if available)
> > like this:
> > http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez
> >
> > or as pdf
> > http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf
> >
> > I'd just build a URL and use wget.
> >
> > If you're searching Pubmed directly, use a query like this to ensure
> > you only get articles with links to full text:
> >
> > cancer AND (free full text[sb])
> > eg http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free+full+text[sb])
> >
> >
> > Russell Smithies
> >
> > Bioinformatics Applications Developer
> > T +64 3 489 9085
> > E russell.smithies at agresearch.co.nz
> >
> > Invermay Research Centre
> > Puddle Alley,
> > Mosgiel,
> > New Zealand
> > T +64 3 489 3809
> > F +64 3 489 9174
> > www.agresearch.co.nz
> >
> >
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Miguel Pignatelli
> >> Sent: Thursday, 2 April 2009 6:16 a.m.
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
> >>
> >> Hi all,
> >>
> >> I have a list of PUBMED IDs and I am trying to retrieve automatically
> >> the *full article* in any format (not just the abstract). Is there any
> >> method in bioperl that allows this? any other solution?
> >> Currently I am trying to solve this using WWW::Mechanize, but do you
> >> know of any other method to do this?
> >>
> >> Any help would be appreciated,
> >>
> >> Thanks in advance,
> >>
> >> M;
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > =======================================================================
> > Attention: The information contained in this message and/or attachments
> > from AgResearch Limited is intended only for the persons or entities
> > to which it is addressed and may contain confidential and/or privileged
> > material. Any review, retransmission, dissemination or other use of, or
> > taking of any action in reliance upon, this information by persons or
> > entities other than the intended recipients is prohibited by AgResearch
> > Limited. If you have received this message in error, please notify the
> > sender immediately.
> > =======================================================================
> >

From tristan.lefebure at gmail.com Wed Apr 1 23:11:51 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 1 Apr 2009 23:11:51 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
Message-ID: <200904012311.51764.tristan.lefebure@gmail.com>

Hi there,

I'm trying to use the uniq_seq function from the Bio::SimpleAlign module.
Here is the description:

 Title     : uniq_seq
 Usage     : $aln->uniq_seq(): Remove identical sequences in
             in the alignment. Ambiguous base ("N", "n") and
             leading and ending gaps ("-") are NOT counted as
             differences.
 Function  : Make a new alignment of unique sequence types (STs)
 Returns   : 1. a new Bio::SimpleAlign object (all sequences renamed as "ST")
             2. ST of each sequence in STDERR
 Argument  : None

What I'm trying to obtain is the ST composition (i.e. what is supposed to go
to STDERR), but I see nothing...

An example:

--------test.fasta:
>seq1
AAATTTC
>seq2
CAATTTC
>seq3
AAATTTC
-------


----------test.pl:
#! /usr/bin/perl

use strict;
use warnings;
use Bio::AlignIO;
use Bio::SimpleAlign;
use Getopt::Long;

my $in = Bio::AlignIO->new(-file   => 'test.fasta',
                           -format => 'fasta');

my $out = Bio::AlignIO->new(-file   => ">test.out",
                            -format => 'fasta');

while ( my $aln = $in->next_aln() ) {
    my $red_aln = $aln->uniq_seq;
    $out->write_aln($red_aln);
}
-------------

If you run:

./test.pl &> log

you will get nothing written into the log file... (but the test.out is OK)

Am I missing something?
By the way, wouldn't it be more convenient to have the ST composition returned
in an array?

Thanks,

--Tristan
(BioPerl 1.6)

From maj at fortinbras.us Wed Apr 1 23:28:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 1 Apr 2009 23:28:23 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
In-Reply-To: <200904012311.51764.tristan.lefebure@gmail.com>
References: <200904012311.51764.tristan.lefebure@gmail.com>
Message-ID: <29E09DCE622643848EAFA8F1C6210711@NewLife>

Tristan--
Strange: it looks like the prints to stderr have been commented out in the
source (back in revision 10242; 1.6 is rev 15582). The
two statements are easy to find in the SimpleAlign.pm uniq_seq() source; you can
uncomment them to work around this.
You are right, this is rather an unconventional way to specify an output
option-- can Chris comment?
Mark
----- Original Message -----
From: "Tristan Lefebure"
To: "BioPerl List"
Sent: Wednesday, April 01, 2009 11:11 PM
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq


> Hi there,
>
> I'm trying to use the uniq_seq function from the Bio::SimpleAlign module.
> Here is the description:
>
>  Title     : uniq_seq
>  Usage     : $aln->uniq_seq(): Remove identical sequences in
>              in the alignment. Ambiguous base ("N", "n") and
>              leading and ending gaps ("-") are NOT counted as
>              differences.
>  Function  : Make a new alignment of unique sequence types (STs)
>  Returns   : 1. a new Bio::SimpleAlign object (all sequences renamed as "ST")
>              2.
ST of each sequence in STDERR > Argument : None > > What I'm trying to obtain is the ST composition (i.e. what is supposed to go > to STDERR), but I see nothing... > > An example: > > --------test.fasta: >>seq1 > AAATTTC >>seq2 > CAATTTC >>seq3 > AAATTTC > ------- > > > ----------test.pl: > #! /usr/bin/perl > > use strict; > use warnings; > use Bio::AlignIO; > use Bio::SimpleAlign; > use Getopt::Long; > > my $in = Bio::AlignIO->new(-file => 'test.fasta' , > -format => 'fasta'); > > my $out = Bio::AlignIO->new(-file => ">test.out" , > -format => 'fasta'); > > while ( my $aln = $in->next_aln() ) { > my $red_aln = $aln->uniq_seq; > $out->write_aln($red_aln); > } > ------------- > > If you run: > > ./test.pl &> log > > you will get nothing written into the log file... (but the test.out is OK) > > Am I missing something? > By the way, wouldn't it be more convenient to have the ST composition returned > in an array? > > Thanks, > > --Tristan > (BioPerl 1.6) > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From weigangq at gmail.com Wed Apr 1 23:57:16 2009 From: weigangq at gmail.com (Weigang Qiu) Date: Wed, 1 Apr 2009 22:57:16 -0500 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: <29E09DCE622643848EAFA8F1C6210711@NewLife> References: <200904012311.51764.tristan.lefebure@gmail.com> <29E09DCE622643848EAFA8F1C6210711@NewLife> Message-ID: <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> Mark and Tristan, I am the original instigator of the uniq_seq method. The STDERR implementation was used so that STDOUT could be piped. But it did not conform to bioperl convention of using the $self->debug() method. I think that's why these lines were commented out and re-implemented using the $self->debug method. So, turning on the debug option should give the intended ST mapping for each sequence in stderr. 
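
On the question of getting the ST composition back as a data structure rather than reading it from stderr, the grouping idea can be sketched in plain Perl. This is only an illustration under stated assumptions, not the Bio::SimpleAlign code: the st_composition helper is hypothetical, the ST1/ST2 labels are chosen here for the example, and sequences are compared as exact strings, so the special handling of "N" and of leading and ending gaps described in the uniq_seq POD is not reproduced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: group sequence IDs by identical
# sequence string and label the groups ST1, ST2, ... in order of first
# appearance. Exact string comparison only (no "N" or gap handling).
sub st_composition {
    my @records = @_;                  # list of [id, sequence] pairs
    my (%st_of_seq, %members, @order);
    for my $rec (@records) {
        my ($id, $seq) = @$rec;
        my $key = uc $seq;             # treat upper and lower case alike
        unless (exists $st_of_seq{$key}) {
            my $st = 'ST' . (scalar(keys %st_of_seq) + 1);
            $st_of_seq{$key} = $st;
            push @order, $st;
        }
        push @{ $members{ $st_of_seq{$key} } }, $id;
    }
    # One array ref per ST: the label followed by the member IDs.
    return map { [ $_, @{ $members{$_} } ] } @order;
}

# The three sequences from test.fasta in the original post.
my @composition = st_composition(
    [ seq1 => 'AAATTTC' ],
    [ seq2 => 'CAATTTC' ],
    [ seq3 => 'AAATTTC' ],
);

# Print one tab-separated line per ST.
print join("\t", @$_), "\n" for @composition;
```

With the test.fasta sequences this groups seq1 and seq3 together and leaves seq2 on its own, which is the mapping the stderr output is meant to convey.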
weigang On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen wrote: > Tristan-- > Strange: it looks like the prints to stderr have been commented out in the > source (back in revision 10242; 1.6 is rev 15582). The > two statements are easy to find in the SimpleAlign.pm uniq_seq() source; > you can > uncomment them to work around this. > You are right, this is rather an unconventional way to specify an output > option-- can Chris comment? > Mark > ----- Original Message ----- From: "Tristan Lefebure" < > tristan.lefebure at gmail.com> > To: "BioPerl List" > Sent: Wednesday, April 01, 2009 11:11 PM > Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq > > > > Hi there, >> >> I'm trying to use the uniq_seq function from the Bio::SimpleAlign module. >> Here is the description: >> >> Title : uniq_seq >> Usage : $aln->uniq_seq(): Remove identical sequences in >> in the alignment. Ambiguous base ("N", "n") and >> leading and ending gaps ("-") are NOT counted as >> differences. >> Function : Make a new alignment of unique sequence types (STs) >> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >> "ST") >> 2. ST of each sequence in STDERR >> Argument : None >> >> What I'm trying to obtain is the ST composition (i.e. what is supposed to >> go >> to STDERR), but I see nothing... >> >> An example: >> >> --------test.fasta: >> >>> seq1 >>> >> AAATTTC >> >>> seq2 >>> >> CAATTTC >> >>> seq3 >>> >> AAATTTC >> ------- >> >> >> ----------test.pl: >> #! /usr/bin/perl >> >> use strict; >> use warnings; >> use Bio::AlignIO; >> use Bio::SimpleAlign; >> use Getopt::Long; >> >> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >> -format => 'fasta'); >> >> my $out = Bio::AlignIO->new(-file => ">test.out" , >> -format => 'fasta'); >> >> while ( my $aln = $in->next_aln() ) { >> my $red_aln = $aln->uniq_seq; >> $out->write_aln($red_aln); >> } >> ------------- >> >> If you run: >> >> ./test.pl &> log >> >> you will get nothing written into the log file... 
(but the test.out is OK) >> >> Am I missing something? >> By the way, wouldn't it be more convenient to have the ST composition >> returned >> in an array? >> >> Thanks, >> >> --Tristan >> (BioPerl 1.6) >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Weigang Qiu Department of Biological Sciences Hunter College, City University of New York 695 Park Avenue New York, NY 10065 From maj at fortinbras.us Thu Apr 2 00:15:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 2 Apr 2009 00:15:06 -0400 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> References: <200904012311.51764.tristan.lefebure@gmail.com><29E09DCE622643848EAFA8F1C6210711@NewLife> <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> Message-ID: Thanks Weigang-- I didn't look carefully enough-- I'll make a change to the POD. so Tristan, in your code below, add $aln->verbose(1); before you invoke uniq_seq(). The ST's should then be sent to stderr (as "warns"). MAJ ----- Original Message ----- From: "Weigang Qiu" To: "Mark A. Jensen" Cc: "BioPerl List" ; Sent: Wednesday, April 01, 2009 11:57 PM Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq > Mark and Tristan, > > I am the original instigator of the uniq_seq method. The STDERR > implementation was used so that STDOUT could be piped. But it did not > conform to bioperl convention of using the $self->debug() method. I think > that's why these lines were commented out and re-implemented using the > $self->debug method. So, turning on the debug option should give the > intended ST mapping for each sequence in stderr. 
> > weigang > From miguel.pignatelli at uv.es Thu Apr 2 04:17:02 2009 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Thu, 02 Apr 2009 10:17:02 +0200 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <49D39E60.1020103@gmail.com> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com> Message-ID: <49D4747E.4060001@uv.es> You may find the attached Perl module useful. It solves the difficult parts of getting the taxonomy given a GI identifier or a taxID. 
It is designed to process a large number of GIs very fast and with low memory usage.

An example of usage would be:

    use taxbuild;

    # Build the taxonomyDB
    my $taxDB = taxbuild->new(
        nodes    => $nodes_file_from_taxonomyDB,
        names    => $names_file_from_taxonomyDB,
        dict     => $dictFile,
        save_mem => 1
    );

    # Get the taxonomy given a GI identifier
    my @tax = $taxDB->get_taxonomy_from_gi("35961124");

    # Get the taxonomy term of a GI identifier at a given level
    my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124", "family");

    # Get the taxid of a GI identifier
    my $taxid = $taxDB->get_taxid("35961124");

    # Get the taxonomy given a taxid
    my @tax_from_taxid = $taxDB->get_taxonomy($taxid);

    # Get the taxonomy term at a given level given a taxid
    my $term_at_genus = $taxDB->get_term_at_level($taxid, "genus");

    # Get the level of a given taxonomical name
    my $level = $taxDB->get_level_from_name("Proteobacteria");

The "dict file" is a processed version of the gi_taxid file from taxonomyDB. You can get this file by running the tax2bin2.pl script, also attached:

    $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin

or, if you are working with nucleotide sequences instead of proteins:

    $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin

You may consult the documentation of the module for a full description.

A possible solution to the original post using this module would be something like:

    # Initialize the taxonomyDB once.
    my $taxDB = taxbuild->new(
        nodes    => $nodes_file_from_taxonomyDB,
        names    => $names_file_from_taxonomyDB,
        dict     => $dictFile,
        save_mem => 1
    );

    # For each GI in your blast result:
    my $superkingdom = $taxDB->get_term_at_level_from_gi($gi, "superkingdom");
    if ($superkingdom eq "Bacteria") {
        # Do whatever you want
    } elsif ($superkingdom eq "Eukaryota") {
        # Do whatever you want
    }

The module has been tested mainly on Linux systems, but should run without problems on Windows and Mac too. If you encounter any problem while using it, don't hesitate to contact me.
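
For anyone without the attached module, the lookup suggested elsewhere in this thread (use nodes.dmp from taxdump.tar.gz to work out where a tax_id belongs) can be sketched in plain Perl. This is only a sketch under stated assumptions: it does not use the taxbuild API, parse_dmp() and term_at_rank() are hypothetical helper names, and the inline data is a heavily abbreviated toy lineage written in the taxdump field layout (fields separated by tab-pipe-tab), not a real excerpt. With the real files you would read nodes.dmp and names.dmp from disk instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy data in the NCBI taxdump layout: fields separated by "\t|\t",
# each record terminated by "\t|". Abbreviated lineage, for illustration.
my $nodes_dmp = join '', map { join("\t|\t", @$_) . "\t|\n" } (
    [ 1,   1,   'no rank'      ],   # tax_id, parent tax_id, rank
    [ 2,   1,   'superkingdom' ],
    [ 561, 2,   'genus'        ],
    [ 562, 561, 'species'      ],
);
my $names_dmp = join '', map { join("\t|\t", @$_) . "\t|\n" } (
    [ 1,   'root',             '', 'scientific name' ],
    [ 2,   'Bacteria',         '', 'scientific name' ],
    [ 561, 'Escherichia',      '', 'scientific name' ],
    [ 562, 'Escherichia coli', '', 'scientific name' ],
);

# Hypothetical helper: split taxdump text into rows of fields.
sub parse_dmp {
    my ($text) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        $line =~ s/\t\|$//;                        # drop record terminator
        push @rows, [ split /\t\|\t/, $line, -1 ]; # keep empty fields
    }
    return @rows;
}

my (%parent, %rank, %name);
for my $row (parse_dmp($nodes_dmp)) {
    my ($id, $parent_id, $rank) = @$row;
    $parent{$id} = $parent_id;
    $rank{$id}   = $rank;
}
for my $row (parse_dmp($names_dmp)) {
    my ($id, $txt, undef, $class) = @$row;
    $name{$id} = $txt if $class eq 'scientific name';
}

# Hypothetical helper: climb parent pointers from $taxid until a node of
# rank $wanted is found; returns its name, or undef if there is none.
sub term_at_rank {
    my ($taxid, $wanted) = @_;
    while (defined $taxid && exists $parent{$taxid}) {
        return $name{$taxid} if $rank{$taxid} eq $wanted;
        last if $parent{$taxid} == $taxid;   # the root is its own parent
        $taxid = $parent{$taxid};
    }
    return;
}

print term_at_rank(562, 'superkingdom'), "\n";
```

The rank name is passed as a parameter because it is taken from the dump itself; "superkingdom" is simply the name used in this thread.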
Hope this helps, M; Florent Angly wrote: > FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you > won't be able to put its information in a hash (unless you have a lot of > memory). > Florent > > Smithies, Russell wrote: >> The taxonomy information isn't in the blast output unless you created >> custom fasta headers for your blast database. >> The easiest way to get the tax_id for your accessions would be to >> download the gi->tax_id list from >> ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. >> If you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should be >> fairly fast. >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) >> If you grab the taxdump.tar.gz file from the same site, the nodes.dmp >> file contained within lists what division each tax_id belongs to >> (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can >> probably work it out from that. >> >> It's not a very BioPerly solution but sometimes just looking up the >> answer from a file/table/hash is the simplest way. >> Hope this helps, >> >> Russell Smithies >> Bioinformatics Applications Developer T +64 3 489 9085 E >> russell.smithies at agresearch.co.nz >> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 >> 489 3809 F +64 3 489 9174 www.agresearch.co.nz >> >> >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] taxonomy ID >>> >>> Hi All, >>> I am writing a script, for one of its part i have to parse >>> a blast >>> report (refseq blast) and check how may organisms are eukaryotes and how >>> namy of them are prokaryotes. 
>>> I am using BIO::DB::taxinomy module: >>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>> >>> But for this i need a taxonomyid (like '33090') given in the example. >>> So is it possible to get a taxonomyid from refseq balst report? >>> If not then how i can deal with this problem? >>> >>> i would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Thu Apr 2 08:29:47 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Thu, 2 Apr 2009 08:29:47 -0400
Subject: [Bioperl-l] FYI: note on wiki template behavior
Message-ID: <62B28D02BEA44E13BBDB5531FF6D67CF@NewLife>

Wiki-interested folks-

I fixed a "feature" in the HOWTO template-- When the template was used twice in the same line of text, the text following the first instance was rendered as a "code box". This had to do with how the template itself was formatted. If you're interested, please have a look at http://www.bioperl.org/wiki/Template_talk:HOWTO

cheers,
Mark

From tristan.lefebure at gmail.com Thu Apr 2 09:30:51 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 2 Apr 2009 09:30:51 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
In-Reply-To:
References: <200904012311.51764.tristan.lefebure@gmail.com> <29E09DCE622643848EAFA8F1C6210711@NewLife> <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com>
Message-ID:

Thank you both,

To internally store the ST composition, so that I can reuse it in the same script, I made the following modifications to SimpleAlign.pm:

diff /usr/local/share/perl/5.10.0/Bio/SimpleAlign.pm /usr/local/share/perl/5.10.0/Bio/SimpleAlignMod.pm
590a591,592
>     #modified to also returned an array of the ST composition
>     my %st;
651a654
>     push @{$st{$order{$str}}}, $_->id();
655c658
<     return $aln;
---
>     return ($aln, %st);

This is probably not really BioPerl compliant. Being an OBO ignorant, I wonder if we could add this information somewhere, either once in the $aln object, or piece by piece in each Bio::LocatableSeq object?

Thks,

--Tristan

From dereje1227 at yahoo.com Thu Apr 2 09:45:08 2009
From: dereje1227 at yahoo.com (demis001)
Date: Thu, 2 Apr 2009 06:45:08 -0700 (PDT)
Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15
Message-ID: <22816585.post@talk.nabble.com>

Hi,

I am new to BioPerl and to this forum, and do not even know how to start a new post. I have one question for you.

Is there any BioPerl module that lets me extract a sequence by chromosome name, seqStart and seqEnd from a formatted human genome database downloaded to my Linux desktop? I used to do this using a Perl $URI object, and it is really slow because the process depends on the network. To be more specific, I took chrName, seqStart and seqEnd and went to the Ensembl database to get the sequences one by one using a Perl $URI object. I thought it might be easier to process them locally against an indexed database using a BioPerl module, if there is one designed for this purpose.

Input: a tab-delimited (CSV) file with millions of rows containing chrName, seqStart and seqEnd, plus a locally formatted/indexed human genome.
Output should be FASTA sequences, each with a header containing the chromosome name and the parsed location.

Sorry if I posted in the wrong section of the forum; I would be happy to get any recommendation.

Thanks

Govind Chandra wrote:
>
> Hi,
>
> The code below
>
>
> ====== code begins =======
> #use strict;
> use Bio::SeqIO;
>
> $infile='NC_000913.gbk';
> my $seqio=Bio::SeqIO->new(-file => $infile);
> my $seqobj=$seqio->next_seq();
> my @features=$seqobj->all_SeqFeatures();
> my $count=0;
> foreach my $feature (@features) {
>   unless($feature->primary_tag() eq 'CDS') {next;}
>   print($feature->start()," ", $feature->end(), " ",$feature->strand(),"\n");
>   $ac=$feature->annotation();
>   $temp1=$ac->get_Annotations("locus_tag");
>   @temp2=$ac->get_Annotations();
>   print("$temp1 $temp2[0] @temp2\n");
>   if($count++ > 5) {last;}
> }
>
> print(ref($ac),"\n");
> exit;
>
> ======= code ends ========
>
> produces the output
>
> ========== output begins ========
>
> 190 255 1
> 0
> 337 2799 1
> 0
> 2801 3733 1
> 0
> 3734 5020 1
> 0
> 5234 5530 1
> 0
> 5683 6459 -1
> 0
> 6529 7959 -1
> 0
> Bio::Annotation::Collection
>
> =========== output ends ==========
>
> $ac is-a Bio::Annotation::Collection but does not actually contain any
> annotation from the feature. Is this how it should be? I cannot figure
> out what is wrong with the script. Earlier I used to use has_tag(),
> get_tag_values() etc. but the documentation says these are deprecated.
>
> Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of uname
> -a is
>
> Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008
> x86_64 x86_64 x86_64 GNU/Linux
>
> Thanks in advance for any help.
> > Govind > -- View this message in context: http://www.nabble.com/Re%3A-Bioperl-l-Digest%2C-Vol-71%2C-Issue-15-tp22744119p22816585.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Thu Apr 2 09:46:36 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 2 Apr 2009 09:46:36 -0400 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: References: <200904012311.51764.tristan.lefebure@gmail.com><29E09DCE622643848EAFA8F1C6210711@NewLife><7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> Message-ID: Hi Tristan-- I think this is a good thought, Can you register this as an enhancement at http://bugzilla.bioperl.org ? Please go ahead and attach the diff as a patch to the 'bug' report-- thanks for *your* input- cheers, Mark From bix at sendu.me.uk Wed Apr 1 08:00:59 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 01 Apr 2009 13:00:59 +0100 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> Message-ID: <49D3577B.1090409@sendu.me.uk> Smithies, Russell wrote: > The taxonomy information isn't in the blast output unless you created > custom fasta headers for your blast database. The easiest way to get > the tax_id for your accessions would be to download the gi->tax_id > list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. > If you load that file into a hash, parse the accessions out of the > blast hits then lookup the tax_id from that hash, I think it should > be fairly fast. 
>
> Checking which are prokaryotes and which are eukaryotes based on
> tax_id is a separate problem :-) If you grab the taxdump.tar.gz file
> from the same site, the nodes.dmp file contained within lists what
> division each tax_id belongs to (Bacteria, Invertebrates, Mammals,
> Phages, Plants, etc) so you can probably work it out from that.

Check out the synopsis for Bio::Taxon
http://doc.bioperl.org/bioperl-live/Bio/Taxon.html

If the division() function doesn't tell you what you need, you could use get_lineage_nodes() and check the oldest ancestors to see if it's a prokaryote or a eukaryote.

From shalabh.sharma7 at gmail.com Thu Apr 2 15:50:58 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 2 Apr 2009 15:50:58 -0400
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <49D3577B.1090409@sendu.me.uk>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk>
Message-ID: <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com>

Thanks a lot everyone, the information is really useful and it solved my problem.

Thanks
Shalabh

From Russell.Smithies at agresearch.co.nz Thu Apr 2 15:55:06 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 3 Apr 2009 08:55:06 +1300
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>

We're here to help - unless it's to do your homework ;-)

--Russell

From Russell.Smithies at agresearch.co.nz Thu Apr 2 20:46:39 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 3 Apr 2009 13:46:39 +1300
Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz>

I'm re-formatting some blast output into nice html webpages but am finding $self->end_report() and $self->footer() don't seem to be working.
The other methods ($self->start_report, $self->introduction, $self->title) all work fine.
Am I doing something wrong or is there a trick to it?

Here's some test code:
==================================

#!perl -w

use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;
use CGI qw(:standard);


my $in = Bio::SearchIO->new(-format => "blast",-file => shift @ARGV, );

my $index = Bio::SearchIO::Writer::HTMLResultWriter->new();

$index->start_report( \&my_start_report );
$index->title( \&my_title );
$index->footer(\&my_footer);
$index->end_report(\&my_end_report);

my $out = Bio::SearchIO->new(-writer => $index, -file => ">blast.htm");

$out->write_result($in->next_result);


sub my_start_report{
    return h1('this is my header');
}

sub my_title{
    return h1('this is my title');
}

sub my_footer{
    my ($self) = @_;
    return h2('this is a footer');
}

sub my_end_report {
    return h2('this is the end');
}

=================================

Thanx,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz From jason at bioperl.org Thu Apr 2 21:09:20 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 2 Apr 2009 18:09:20 -0700 Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz> Message-ID: <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org> looking at the code - it doesn't seem to accept resetting the default value. sub end_report { return "\n\n"; } sub footer { my ($self) = @_; return "
Produced by Bioperl module ".ref($self)." on $DATE
Revision: $Revision
\n" } So just adjusting it to mirror what is happening for title and the rest would be necessary. -jason On Apr 2, 2009, at 5:46 PM, Smithies, Russell wrote: > I'm re-formatting some blast output into nice html webpages but am > finding $self->end_report() and $self->footer() don't seem to be > working. > The other methods ($self->start_report, $self->introduction, $self- > >title) all work fine. > Am I doing something wrong or is there a trick to it? > > Here's some test code: > ================================== > > #!perl -w > > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > use CGI qw(:standard); > > > my $in = Bio::SearchIO->new(-format => "blast",-file => shift > @ARGV, ); > > my $index = Bio::SearchIO::Writer::HTMLResultWriter->new(); > > $index->start_report( \&my_start_report ); > $index->title( \&my_title ); > $index->footer(\&my_footer); > $index->end_report(\&my_end_report); > > my $out = Bio::SearchIO->new(-writer => $index, -file => > ">blast.htm"); > > $out->write_result($in->next_result); > > > sub my_start_report{ > return h1('this is my header'); > } > > sub my_title{ > return h1('this is my title'); > } > > sub my_footer{ > my ($self) = @_; > return h2('this is a footer'); > } > > sub my_end_report { > return h2('this is the end'); > } > > ================================= > > Thanx, > > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. 
Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From Russell.Smithies at agresearch.co.nz Thu Apr 2 22:16:34 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 3 Apr 2009 15:16:34 +1300 Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ? In-Reply-To: <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz> <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABEE2E@exchsth.agresearch.co.nz> Not wanting to be picky... But $result->database_name (for blast results) returns the description of the database rather than just the name. E.g. "hs.fna (Human mRNA Refseqs)" instead of "hs.fna". I've had a hunt but can't see where the code for getting the database_name is. Any ideas? Thanx, --Russell > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason > Stajich > Sent: Friday, 3 April 2009 2:09 p.m. > To: Smithies, Russell > Cc: 'bioperl-l' > Subject: Re: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ? 
> > looking at the code - it doesn't seem to accept resetting the default > value. > sub end_report { > return "\n\n"; > } > > sub footer { > my ($self) = @_; > return "
Produced by Bioperl module ".ref($self)." on > $DATE
Revision: $Revision
\n" > > } > > So just adjusting it to mirror what is happening for title and the > rest would be necessary. > > -jason > On Apr 2, 2009, at 5:46 PM, Smithies, Russell wrote: > > > I'm re-formatting some blast output into nice html webpages but am > > finding $self->end_report() and $self->footer() don't seem to be > > working. > > The other methods ($self->start_report, $self->introduction, $self- > > >title) all work fine. > > Am I doing something wrong or is there a trick to it? > > > > Here's some test code: > > ================================== > > > > #!perl -w > > > > use Bio::SearchIO; > > use Bio::SearchIO::Writer::HTMLResultWriter; > > use CGI qw(:standard); > > > > > > my $in = Bio::SearchIO->new(-format => "blast",-file => shift > > @ARGV, ); > > > > my $index = Bio::SearchIO::Writer::HTMLResultWriter->new(); > > > > $index->start_report( \&my_start_report ); > > $index->title( \&my_title ); > > $index->footer(\&my_footer); > > $index->end_report(\&my_end_report); > > > > my $out = Bio::SearchIO->new(-writer => $index, -file => > > ">blast.htm"); > > > > $out->write_result($in->next_result); > > > > > > sub my_start_report{ > > return h1('this is my header'); > > } > > > > sub my_title{ > > return h1('this is my title'); > > } > > > > sub my_footer{ > > my ($self) = @_; > > return h2('this is a footer'); > > } > > > > sub my_end_report { > > return h2('this is the end'); > > } > > > > ================================= > > > > Thanx, > > > > > > Russell Smithies > > > > Bioinformatics Applications Developer > > T +64 3 489 9085 > > E russell.smithies at agresearch.co.nz > > > > Invermay Research Centre > > Puddle Alley, > > Mosgiel, > > New Zealand > > T +64 3 489 3809 > > F +64 3 489 9174 > > www.agresearch.co.nz > > > > > > > > = > > ====================================================================== > > Attention: The information contained in this message and/or > > attachments > > from AgResearch Limited is intended only for the persons or 
entities > > to which it is addressed and may contain confidential and/or > > privileged > > material. Any review, retransmission, dissemination or other use of, > > or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by > > AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > = > > ====================================================================== > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > From bernd.web at gmail.com Fri Apr 3 09:47:23 2009 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 3 Apr 2009 15:47:23 +0200 Subject: [Bioperl-l] AlignIO clustal Message-ID: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com> Hi, Using Bioperl 1.5.2 and AlignIO, I now run into an issue with a clustalw alignment. At the moment, I cannot update to a newer version, so I am not sure whether this problem still exists. The problem is that the $aln object does not exist when the last sequence in a block contains gaps only. Has anybody seen this, or does anyone know a fix? Code and example input follow below. 
Regards,
Bernd

use Bio::AlignIO;
my $in = Bio::AlignIO->new(-file => 'test.aln',
                           -format => 'clustalw');

my $out = Bio::AlignIO->new(-file => '>testerr.ALN',
                            -format => 'clustalw');

my $aln = $in->next_aln();
print $aln->length, "\n";

test.aln contains:

CLUSTAL W(1.81) multiple sequence alignment


QUERY/7-143      PETLE-ARINRATNPLNKEL--DWASI
7082547/1-128    ---------ERATNDMLIGP--DWAVN
1_3265048/1-0    ---------------------------
3265047/2-138    QTSLE-ALLLKATNSQNQNI--DTAAV
1_3265047/1-0    ---------------------------

From bernd.web at gmail.com Fri Apr 3 10:11:44 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 3 Apr 2009 16:11:44 +0200
Subject: [Bioperl-l] AlignIO clustal
In-Reply-To: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com>
References: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com>
Message-ID: <716af09c0904030711l8252943hff489ccb9f720920@mail.gmail.com>

Hi,

I noticed this issue is not specific to Clustal; it also occurs for Fasta.
The "problem" arises in a final check, which is only done on the last sequence; it is still present in the current code (webcvs) in the next_aln code.

In fasta.pm:

    # If $end <= 0, we have either reached the end of
    # file in <> or we have encountered some other error
    if ( $end <= 0 ) {
        undef $aln;
        return $aln;
    }

In clustalw.pm:

    # not sure if this should be a default option - or we can pass in
    # an option to do this in the future? --jason stajich
    # $aln->map_chars('\.','-');
    undef $aln if ( !defined $end || $end <= 0 );
    return $aln;

And the last sequence actually got a zero end. This was produced by an $aln->slice in which gap-only sequences are retained. It will also get a "0" in next_aln itself if no coordinates are present.

1_3265047/1-0    ---------------------------

For now, commenting out "undef $aln if ( !defined $end || $end <= 0 );" works.

Regards,
Bernd

On Fri, Apr 3, 2009 at 3:47 PM, Bernd Web wrote: > Hi, > > Using Bioperl 1.5.2 and AlignIO, I now run into an issue with a > clustalw alignment. 
> At the moment, I cannot update to a newer version, so am not sure this > problem still exists. > > The problem is that the $aln object does not exists when the last > sequence in a block contains gaps only. > Anybody has seen this or knows a fix? Code and example input follows below. > > > Regards, > Bernd > > > use Bio::AlignIO; > my $in = Bio::AlignIO->new(-file => 'test.aln', > -format => 'clustalw'); > > my $out = Bio::AlignIO->new(-file => '>testerr.ALN', > -format => 'clustalw'); > > my $aln = $in->next_aln(); > print $aln->length, "\n"; > > test.aln contains: > > CLUSTAL W(1.81) multiple sequence alignment > > > QUERY/7-143 PETLE-ARINRATNPLNKEL--DWASI > 7082547/1-128 ---------ERATNDMLIGP--DWAVN > 1_3265048/1-0 --------------------------- > 3265047/2-138 QTSLE-ALLLKATNSQNQNI--DTAAV > 1_3265047/1-0 --------------------------- > From hlapp at gmx.net Mon Apr 6 11:39:50 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 6 Apr 2009 11:39:50 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> Message-ID: <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> (Removing biosql-l from the cc list as this seems to be a problem with BioPerl.) Hi Johann, I don't know whether anyone has responded to you yet - if not I'm sorry, I've been inundated for the past couple of weeks. On Apr 1, 2009, at 6:14 AM, Johann PELLET wrote: > With the latest version of BioPerl and BioSQL, I have tried to > insert entry from a GenBank file, which I have downloaded from the > NCBI website (648 937 records) Could you be more specific? When you say the latest version of BioPerl, do you mean 1.6.1 or the current svn snapshot of the main trunk? And which Genbank file is it? Is it one with only viruses, i.e., are you specifically interested in the virus sequences that the parser is giving you trouble with? 
> After successfully loading ncbi_taxonomy i am getting following > error message while loading sequences into database. > > perl load_seqdatabase.pl gb_03-2009 -format genbank -driver Pg - > dbname biosql > > > --------------------- WARNING --------------------- > MSG: The supplied lineage does not start near 'Human papillomavirus > type 2c' (I was supplied 'Human papillomavirus - 2 | > Alphapapillomavirus | Papillomaviridae')

This is a problem in the BioPerl genbank parser, or more specifically, in the species parser. I thought this was fixed in 1.6.1, though; are you sure you don't have an older version of BioPerl lying around that could accidentally have been used? That said, it only seems to be a warning; did you check how the record ended up in the database and find it to be incomplete or messed up?

> the script is not stopped until this entry: S67864

This is a later entry, not the same entry that causes the problem above, right?

> --------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed, > values were ("1","19)","1","3") FKs (41914,) > ERROR: invalid input syntax for integer: "19)"

Oops - that's a problem that must originate from the BioPerl feature location parser. The full record is here:

http://www.ncbi.nlm.nih.gov/nuccore/544772

Does anyone see why the location parser should have a problem with the first gene feature? It's nested, and has remote location components, but at first sight nothing jumps out at me as extraordinary. Has someone recently changed the location parsing code? If no-one has an immediate idea what could be at work here, this needs investigating.
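To illustrate why a nested location can end up with an end value like "19)", here is a minimal sketch in plain Perl. It is not the BioPerl location parser and does not use any BioPerl API; split_top_level and parse_location are hypothetical names introduced only for this illustration. It assumes the location string contains no whitespace and only the order, join and complement operators. The point is that the arguments of an operator must be split on top-level commas only, with the parentheses kept balanced, so that no closing parenthesis is left attached to a coordinate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split an operator's argument
# list on commas that are not nested inside parentheses.
sub split_top_level {
    my ($args) = @_;
    my @parts;
    my ( $depth, $current ) = ( 0, '' );
    for my $char ( split //, $args ) {
        if    ( $char eq '(' ) { $depth++ }
        elsif ( $char eq ')' ) { $depth-- }
        if ( $char eq ',' && $depth == 0 ) {
            push @parts, $current;    # a complete top-level argument
            $current = '';
        }
        else {
            $current .= $char;        # nested text is kept intact
        }
    }
    push @parts, $current if length $current;
    return @parts;
}

# Hypothetical recursive parser: operators become nested hashes, simple
# (optionally remote) ranges become flat hashes.
sub parse_location {
    my ($loc) = @_;
    if ( $loc =~ /^(order|join|complement)\((.*)\)$/s ) {
        my ( $op, $inner ) = ( $1, $2 );
        my @parts = map { parse_location($_) } split_top_level($inner);
        return { op => $op, parts => \@parts };
    }
    if ( $loc =~ /^(?:([\w.]+):)?([<>]?\d+)\.\.([<>]?\d+)$/ ) {
        return { seq_id => $1, start => $2, end => $3 };
    }
    die "Unrecognised location: $loc\n";
}

# The gene location from record S67864 quoted in this thread.
my $tree = parse_location('order(S67862.1:72..75,join(S67863.1:1..788,1..19))');
my $last = $tree->{parts}[1]{parts}[1];
print "$tree->{op} / $tree->{parts}[1]{op} / end=$last->{end}\n";   # prints "order / join / end=19"
```

A parser that splits on every comma, or that strips only one level of parentheses, would hand "1..19)" to the range step, which is consistent with the "19)" value in the error above; whether that is what actually happens inside BioPerl is an assumption that would need checking against the real code.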
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From torsten.seemann at infotech.monash.edu.au Mon Apr 6 21:05:25 2009 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 7 Apr 2009 11:05:25 +1000 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> Message-ID: > The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) > Does anyone see why the location parser should have a problem with the first > gene feature? It's nested, and has remote location components, but at first > sight nothing jumps out at me as extraordinary. Has someone recently changed > the location parsing code? If no-one has an immediate idea what could be at > work here, this needs investigating. I'm not sure if Bioperl handles the order() operator? For those unfamiliar with the order() operator: http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 order(location,location, ... location) The elements can be found in the specified order (5' to 3' direction), but nothing is implied about the reasonableness about joining them. --Torsten Seemann --Victorian Bioinformatics Consortium, Dept. 
Microbiology, Monash University, AUSTRALIA From cjfields at illinois.edu Mon Apr 6 23:59:14 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 6 Apr 2009 22:59:14 -0500 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> Message-ID: <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: >> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 > > gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) > >> Does anyone see why the location parser should have a problem with >> the first >> gene feature? It's nested, and has remote location components, but >> at first >> sight nothing jumps out at me as extraordinary. Has someone >> recently changed >> the location parsing code? If no-one has an immediate idea what >> could be at >> work here, this needs investigating. The location parsing code was refactored about 3-4 years ago w/o problems. This'll be the first one to crop up. I'll try taking a look at it. > I'm not sure if Bioperl handles the order() operator? > > For those unfamilair with the order() operator: > > http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 > > order(location,location, ... location) > The elements can be found in the specified order (5' to 3' direction), > but nothing is implied about the reasonableness about joining them. > > > --Torsten Seemann > --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash > University, AUSTRALIA It's interesting that the version from eutils differs significantly in the feature table when retrieving 'gb' or 'gbwithparts'; the latter resolves the location (see below). Regardless, we'll need to make sure this is parseable. .... 
FEATURES Location/Qualifiers source 1..77 /organism="Ovine respiratory syncytial virus" /mol_type="genomic RNA" /db_xref="taxon:28869" gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) /gene="G" gene 55..>77 /gene="fusion glycoprotein F" chris From cjfields at illinois.edu Tue Apr 7 01:32:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 7 Apr 2009 00:32:52 -0500 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> Message-ID: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Fixed in svn now and have added this as a test case (passes all tests in bioperl-live). For some reason this wasn't catching some more complex combinations of operators, mainly those with mixes of order/ join. chris On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: > On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: > >>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 >> >> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >> >>> Does anyone see why the location parser should have a problem with >>> the first >>> gene feature? It's nested, and has remote location components, but >>> at first >>> sight nothing jumps out at me as extraordinary. Has someone >>> recently changed >>> the location parsing code? If no-one has an immediate idea what >>> could be at >>> work here, this needs investigating. > > The location parsing code was refactored above 3-4 years ago w/o > problems. This'll be the first one to crop up. I'll try taking a > look at it. > >> I'm not sure if Bioperl handles the order() operator? >> >> For those unfamilair with the order() operator: >> >> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >> >> order(location,location, ... 
location) >> The elements can be found in the specified order (5' to 3' >> direction), >> but nothing is implied about the reasonableness about joining them. >> >> >> --Torsten Seemann >> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >> University, AUSTRALIA > > It's interesting that the version from eutils differs significantly > in the feature table when retrieving 'gb' or 'gbwithparts', the > latter resolves the location (see below). Regardless we'll need to > make sure this is parseable. > > .... > > FEATURES Location/Qualifiers > source 1..77 > /organism="Ovine respiratory syncytial virus" > /mol_type="genomic RNA" > /db_xref="taxon:28869" > gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) > /gene="G" > gene 55..>77 > /gene="fusion glycoprotein F" > > > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From johann.pellet at inserm.fr Tue Apr 7 04:48:56 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Tue, 7 Apr 2009 10:48:56 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: <73508372-0C43-4693-8135-45C128A25959@inserm.fr> Thanks all, I will update bioperl-live using svn right now, and then restart loading sequences into my biosql database. Hilmar, My GenBank file contains only virus sequences. I downloaded it using eutils (db=nuccore, tool=ebot, rettype=gb ...). Thank you again -- -- Johann Pellet On 7 Apr 2009, at 07:32, Chris Fields wrote: > Fixed in svn now and have added this as a test case (passes all > tests in bioperl-live). 
For some reason this wasn't catching some > more complex combinations of operators, mainly those with mixes of > order/join. > > chris > > On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: > >> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: >> >>>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 >>> >>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>> >>>> Does anyone see why the location parser should have a problem >>>> with the first >>>> gene feature? It's nested, and has remote location components, >>>> but at first >>>> sight nothing jumps out at me as extraordinary. Has someone >>>> recently changed >>>> the location parsing code? If no-one has an immediate idea what >>>> could be at >>>> work here, this needs investigating. >> >> The location parsing code was refactored above 3-4 years ago w/o >> problems. This'll be the first one to crop up. I'll try taking a >> look at it. >> >>> I'm not sure if Bioperl handles the order() operator? >>> >>> For those unfamilair with the order() operator: >>> >>> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >>> >>> order(location,location, ... location) >>> The elements can be found in the specified order (5' to 3' >>> direction), >>> but nothing is implied about the reasonableness about joining them. >>> >>> >>> --Torsten Seemann >>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>> University, AUSTRALIA >> >> It's interesting that the version from eutils differs significantly >> in the feature table when retrieving 'gb' or 'gbwithparts', the >> latter resolves the location (see below). Regardless we'll need to >> make sure this is parseable. >> >> .... 
>> >> FEATURES Location/Qualifiers >> source 1..77 >> /organism="Ovine respiratory syncytial virus" >> /mol_type="genomic RNA" >> /db_xref="taxon:28869" >> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >> /gene="G" >> gene 55..>77 >> /gene="fusion glycoprotein F" >> >> >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Tue Apr 7 13:56:27 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 7 Apr 2009 13:56:27 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: Awesome, thanks Chris! $beer_owed++; -hilmar On Apr 7, 2009, at 1:32 AM, Chris Fields wrote: > Fixed in svn now and have added this as a test case (passes all > tests in bioperl-live). For some reason this wasn't catching some > more complex combinations of operators, mainly those with mixes of > order/join. > > chris > > On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: > >> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: >> >>>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 >>> >>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>> >>>> Does anyone see why the location parser should have a problem >>>> with the first >>>> gene feature? It's nested, and has remote location components, >>>> but at first >>>> sight nothing jumps out at me as extraordinary. Has someone >>>> recently changed >>>> the location parsing code? If no-one has an immediate idea what >>>> could be at >>>> work here, this needs investigating. >> >> The location parsing code was refactored above 3-4 years ago w/o >> problems. 
This'll be the first one to crop up. I'll try taking a >> look at it. >> >>> I'm not sure if Bioperl handles the order() operator? >>> >>> For those unfamilair with the order() operator: >>> >>> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >>> >>> order(location,location, ... location) >>> The elements can be found in the specified order (5' to 3' >>> direction), >>> but nothing is implied about the reasonableness about joining them. >>> >>> >>> --Torsten Seemann >>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>> University, AUSTRALIA >> >> It's interesting that the version from eutils differs significantly >> in the feature table when retrieving 'gb' or 'gbwithparts', the >> latter resolves the location (see below). Regardless we'll need to >> make sure this is parseable. >> >> .... >> >> FEATURES Location/Qualifiers >> source 1..77 >> /organism="Ovine respiratory syncytial virus" >> /mol_type="genomic RNA" >> /db_xref="taxon:28869" >> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >> /gene="G" >> gene 55..>77 >> /gene="fusion glycoprotein F" >> >> >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From juheymann at yahoo.com Tue Apr 7 14:20:04 2009 From: juheymann at yahoo.com (Jurgen Heymann) Date: Tue, 7 Apr 2009 11:20:04 -0700 (PDT) Subject: [Bioperl-l] restriction site map Message-ID: <237420.97841.qm@web54203.mail.re2.yahoo.com> Hi All: I would like to convert a table (restriction enzyme / position where it cuts in gene of interest) into a graphical representation. What avenues exists for that? Would appreciate your comments. 
Thank you, Jurgen From wenzhiwang1983 at yahoo.com.cn Tue Apr 7 21:39:59 2009 From: wenzhiwang1983 at yahoo.com.cn (Wen-Zhi WANG) Date: Wed, 8 Apr 2009 09:39:59 +0800 (CST) Subject: [Bioperl-l] Pasing Affymatrix Microarray output Message-ID: <992233.10677.qm@web15208.mail.cnb.yahoo.com> Dear all, Recently, I have been working on population genomics data output by the Affymetrix microarray system. However, the software provided by Affymetrix only runs on the Windows 386 platform. Is there any application that can be used on Linux? Bio::Affymatrix was not powerful enough to get the detailed information. Thanks a lot. Yours, WWZ ___________________________________________________________________ Wen-Zhi WANG State Key Laboratory of Genetic Resources and Evolution Kunming Institute of Zoology, Chinese Academy of Sciences Kunming, Yunnan 650223 P. R. China Tel: (86) 871-5198993 Fax: (86) 871-5195430 Mobile: 13759114244 E-mail: wenzhiwang1983 at yahoo.com.cn From Russell.Smithies at agresearch.co.nz Tue Apr 7 21:58:54 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 8 Apr 2009 13:58:54 +1200 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <992233.10677.qm@web15208.mail.cnb.yahoo.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABF94C@exchsth.agresearch.co.nz> Have you had a look at Microarray-GeneXplorer? http://search.cpan.org/~sherlock/Microarray-GeneXplorer-0.11/ I haven't used it but I'd expect it to be pretty good, being from Gavin Sherlock :-) --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Wen-Zhi WANG > Sent: Wednesday, 8 April 2009 1:40 p.m. 
> To: BioPerl List > Subject: [Bioperl-l] Pasing Affymatrix Microarray output > > Dear all, > > Recently, I focus on population genomics data outputed by affymatrix > microarray system. However, softwares which designed by affy. inc only run in > Windows 386 platform. Is there any application can used in Linux? > Bio::Affymatrix was not strong enough to get the detailed informaton. > > Thank you a lot. > > Yours, > WWZ > ___________________________________________________________________ > > Wen-Zhi WANG > > State Key Laboratory of Genetic Resources and Evolution > Kunming Institute of Zoology, Chinese Academy of Sciences > Kunming, Yunnan 650223 P. R. China > Tel:??????(86) 871-5198993 > Fax:???? (86) 871-5195430 > Mobile: 13759114244 > E-mail: wenzhiwang1983 at yahoo.com.cn > > > ___________________________________________________________ > ????????????????? > http://card.mail.cn.yahoo.com/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From sdavis2 at mail.nih.gov Tue Apr 7 22:10:17 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 7 Apr 2009 22:10:17 -0400 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <992233.10677.qm@web15208.mail.cnb.yahoo.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> Message-ID: <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG wrote: > Dear all, > > Recently, I focus on population genomics data outputed by affymatrix > microarray system. However, softwares which designed by affy. inc only run > in Windows 386 platform. Is there any application can used in Linux? > Bio::Affymatrix was not strong enough to get the detailed informaton. > You may want to look at a non-bioperl solution such as Bioconductor ( http://bioconductor.org). Sean From sac at bioperl.org Wed Apr 8 01:59:49 2009 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 7 Apr 2009 22:59:49 -0700 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> Message-ID: <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> Check out our Affymetrix Power Tools (APT) package: http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx We distribute binaries for Linux and Mac OSX, as well as source code so you can compile it yourself if you want. Note however that this is written in C++, not Perl. We don't provide SWIG or XS interfaces for direct access via Perl, though this would definitely be doable, if anyone is interested. 
Probably the easiest approach from Perl would be to simply call the appropriate APT executable through the shell as in: system("/path/to/apt --args ..."); The Perl code can parse the output files and take it from there. Steve On Tue, Apr 7, 2009 at 7:10 PM, Sean Davis wrote: > On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG wrote: > >> Dear all, >> >> Recently, I focus on population genomics data outputed by affymatrix >> microarray system. However, softwares which designed by affy. inc only run >> in Windows 386 platform. Is there any application can used in Linux? >> Bio::Affymatrix was not strong enough to get the detailed informaton. >> > > You may want to look at a non-bioperl solution such as Bioconductor ( > http://bioconductor.org). > > Sean > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From markus.liebscher at gmx.de Wed Apr 8 10:07:17 2009 From: markus.liebscher at gmx.de (manni122) Date: Wed, 8 Apr 2009 07:07:17 -0700 (PDT) Subject: [Bioperl-l] Access Uniprot detailed information Message-ID: <22951210.post@talk.nabble.com> Hi there, maybe I am not able to read careful enough through the Howto section. But is there a function in BioPerl that retrieves for a given Uniprot Access Code or ID from the Uniprot Database some general annotations like enzymatic activity or literature references? I appreciate any help! -- View this message in context: http://www.nabble.com/Access-Uniprot-detailed-information-tp22951210p22951210.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From johann.pellet at inserm.fr Wed Apr 8 11:29:29 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Wed, 8 Apr 2009 17:29:29 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: Hie all, I confirm that now it's ok for the LOCUS S67862S3 since Chris update. Thanks again. However I still have Warning message with other entries like: ######################################################################################################################### --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Hantaanvirus CGRn93MP8' (I was supplied 'Hantaan virus | Hantavirus | Bunyaviridae') --------------------------------------------------- --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Hantaanvirus CGRn93P8' (I was supplied 'Hantaan virus | Hantavirus | Bunyaviridae') --------------------------------------------------- ######################################################################################################################### but entries are inserted in the biosql database: ######################################################################################################################### biosql=# select * from bioentry where description like 'Hantaanvirus CGRn93P8%'; bioentry_id | biodatabase_id | taxon_id | name | accession | identifier | division | description | version -------------+----------------+----------+----------+----------- +------------+---------- + ----------------------------------------------------------------------- +--------- 156282 | 84 | 395824 | EF990932 | EF990932 | 156144486 | VRL | Hantaanvirus CGRn93P8 RNA-dependent RNA polymerase gene, partial cds. 
| 1 156288 | 84 | 395824 | EF990918 | EF990918 | 154623008 | VRL | Hantaanvirus CGRn93P8 segment M, complete sequence. | 1 156294 | 84 | 395824 | EF990904 | EF990904 | 154622980 | VRL | Hantaanvirus CGRn93P8 segment S, complete sequence. | 1 (3 rows) ######################################################################################################################### and finally EU608407 and EU608559 made a crash: ######################################################################################################################### --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Fowl adenovirus 8' (I was supplied 'Fowl adenovirus E | Aviadenovirus | Adenoviridae') --------------------------------------------------- --------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- #######...14 times ...############ --------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,LOCUS EU608407 1212 bp DNA linear VRL 20-APR-2008","","","CRC- D35248959C54B9F2","1","1212","") FKs () ERROR: null value in column "location" violates not-null constraint --------------------------------------------------- Could not store EU608559: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key STACK: Error::throw STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 
5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children / Library/Perl/5.8.8/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:230 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::SeqAdaptor::store_children /Library/Perl/5.8.8/ Bio/DB/BioSQL/SeqAdaptor.pm:237 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: load_seqdatabase.pl:630 ----------------------------------------------------------- at load_seqdatabase.pl line 643 ######################################################################################################################### If I check in the biosql database if some part of this records are inserted: ######################################################################################################################### select * from reference where title='Evidence for positive epistasis in HIV-1'; reference_id | dbxref_id | location | title | authors | crc --------------+-----------+-------------------------------------- +------------------------------------------ + ----------------------------------------------------------------------------+ ---------------------- 16443 | 4179 | Science 306 (5701), 1547-1550 (2004) | Evidence for positive epistasis in HIV-1 | Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,J.M. and Petropoulos,C.J. 
| CRC-19E7AA4FB7A5D4AF
(1 row)

select * from dbxref where dbxref_id=4179;
 dbxref_id | dbname | accession | version
-----------+--------+-----------+---------
      4179 | PUBMED | 15567861  |       0

select * from bioentry where accession=15567861;
 bioentry_id | biodatabase_id | taxon_id | name | accession | identifier | division | description | version
-------------+----------------+----------+------+-----------+------------+----------+-------------+---------
(0 rows)
#########################################################################################################################

I don't have records with name='EU608407' or 'EU608559' in the bioentry
table.

Thanks for your help

Johann

--
--
Johann Pellet

On 7 Apr 2009, at 19:56, Hilmar Lapp wrote:

> Awesome, thanks Chris! $beer_owed++;
>
> -hilmar
>
> On Apr 7, 2009, at 1:32 AM, Chris Fields wrote:
>
>> Fixed in svn now and have added this as a test case (passes all
>> tests in bioperl-live). For some reason this wasn't catching some
>> more complex combinations of operators, mainly those with mixes of
>> order/join.
>>
>> chris
>>
>> On Apr 6, 2009, at 10:59 PM, Chris Fields wrote:
>>
>>> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote:
>>>
>>>>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/
>>>>> 544772
>>>>
>>>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19))
>>>>
>>>>> Does anyone see why the location parser should have a problem
>>>>> with the first
>>>>> gene feature? It's nested, and has remote location components,
>>>>> but at first
>>>>> sight nothing jumps out at me as extraordinary. Has someone
>>>>> recently changed
>>>>> the location parsing code? If no-one has an immediate idea what
>>>>> could be at
>>>>> work here, this needs investigating.
>>>
>>> The location parsing code was refactored above 3-4 years ago w/o
>>> problems. This'll be the first one to crop up. I'll try taking a
>>> look at it.
>>>
>>>> I'm not sure if Bioperl handles the order() operator?
>>>> >>>> For those unfamilair with the order() operator: >>>> >>>> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >>>> >>>> order(location,location, ... location) >>>> The elements can be found in the specified order (5' to 3' >>>> direction), >>>> but nothing is implied about the reasonableness about joining them. >>>> >>>> >>>> --Torsten Seemann >>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>>> University, AUSTRALIA >>> >>> It's interesting that the version from eutils differs >>> significantly in the feature table when retrieving 'gb' or >>> 'gbwithparts', the latter resolves the location (see below). >>> Regardless we'll need to make sure this is parseable. >>> >>> .... >>> >>> FEATURES Location/Qualifiers >>> source 1..77 >>> /organism="Ovine respiratory syncytial virus" >>> /mol_type="genomic RNA" >>> /db_xref="taxon:28869" >>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>> /gene="G" >>> gene 55..>77 >>> /gene="fusion glycoprotein F" >>> >>> >>> >>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cgoddard at flmnh.ufl.edu Wed Apr 8 11:25:37 2009 From: cgoddard at flmnh.ufl.edu (Chris Goddard) Date: Wed, 08 Apr 2009 11:25:37 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db Message-ID: <49DCC1F1.6080601@flmnh.ufl.edu> I am running into problems when trying to insert a sequence object retrieved from GenBank into a BioSQL schema running in a Postgres database. Whenever I use the 'create()' method on the sequence that has been made into a persistent object, the sequence isn't saved into the database properly. 
No error messages are given, and the corresponding Postgres primary key
sequences are incremented as if the data had been saved properly; the
appropriate tables themselves remain empty, though.

I am completely new to using the bioperl-db modules, and so am probably
missing something pretty simple. Below you will see the basic code that
causes the problem.

  use Bio::DB::GenBank;
  use Bio::DB::BioDB;

  my $genbank_id = 'AYXXXXXX';

  # fetch the sequence from GenBank
  my $genDB = Bio::DB::GenBank->new;
  my $sequence = $genDB->get_Seq_by_id($genbank_id);

  # connect to the BioSQL database
  my $db = Bio::DB::BioDB->new(-database => 'biosql',
                               -user     => 'username',
                               -dbname   => 'dbname',
                               -host     => 'localhost',
                               -driver   => 'Pg');

  # wrap the sequence in a persistent object and insert it
  my $pobj = $db->create_persistent($sequence);
  $pobj->create();

I am running the latest svn trunk versions of bioperl and bioperl-db (as
of yesterday) and Postgres 8.3.7. I also downloaded the NCBI taxonomy info
using the script included in the BioSQL package, and that data seemed to
install without error. Any help or advice would be greatly appreciated.

Thanks,
Chris Goddard

From hlapp at gmx.net Wed Apr 8 12:21:11 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 8 Apr 2009 12:21:11 -0400
Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db
In-Reply-To: <49DCC1F1.6080601@flmnh.ufl.edu>
References: <49DCC1F1.6080601@flmnh.ufl.edu>
Message-ID: <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net>

This all sounds like you aren't issuing a commit. Are you sure your code
contains $pobj->commit(), and that what you are looking at is *after* that
is executed?

Bioperl-db is transactional, so you decide when to commit (or rollback).

-hilmar

On Apr 8, 2009, at 11:25 AM, Chris Goddard wrote:

> I am running into problems when trying to insert a sequence object
> retrieved from GenBank into a BioSQL schema running in a Postgres
> database. Whenever I use the 'create()' method on the sequence that
> has been made into a persistent object, the sequence isn't saved
> into the database properly.
No error messages are given, and the > corresponding Postgres primary key sequences are incremented as if > the data had been saved properly: the appropriate tables themselves > remain empty though. > > I am completely new to using the biosql-db modules, and so am > probably missing something pretty simple. Below you will see the > basic code that causes the problem. > > my $genbank_id = 'AYXXXXXX' > > my $genDB = new Bio::DB::GenBank; > $sequence = $genDB->get_Seq_by_id($genbank_id); > > my $db = Bio::DB::BioDB->new(-database => 'biosql', > -user => 'username', > -dbname => 'dbname', > -host => 'localhost', > -driver => 'Pg'); > > my $pobj = $db->create_persistent($sequence); > $pobj->create(); > > I am running the latest svn trunk versions of bioperl and bioperl-db > (as of yesterday) and Postgres 8.3.7. I also downloaded the NCBI > taxonomy info using the script included in the BioSQL package, and > that data seemed to install without error. Any help or advice would > be greatly appreciated. > > Thanks, > Chris Goddard > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Apr 8 12:40:53 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 12:40:53 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db In-Reply-To: <49DCD120.8020302@flmnh.ufl.edu> References: <49DCC1F1.6080601@flmnh.ufl.edu> <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> <49DCD120.8020302@flmnh.ufl.edu> Message-ID: <4A6EA2F3-BA88-474E-A9D9-C1A7444CA755@gmx.net> On Apr 8, 2009, at 12:30 PM, Chris Goddard wrote: > That was it. I guess I just incorrectly assumed that create() did > an auto-commit. That was simple to fix. 
Thank you!
>

No problem, I'm glad I could be helpful!

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cgoddard at flmnh.ufl.edu Wed Apr 8 12:30:24 2009
From: cgoddard at flmnh.ufl.edu (Chris Goddard)
Date: Wed, 08 Apr 2009 12:30:24 -0400
Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db
In-Reply-To: <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net>
References: <49DCC1F1.6080601@flmnh.ufl.edu> <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net>
Message-ID: <49DCD120.8020302@flmnh.ufl.edu>

That was it. I guess I just incorrectly assumed that create() did an
auto-commit. That was simple to fix. Thank you!

Chris

Hilmar Lapp wrote:
> This all sounds like you aren't issuing a commit. Are you sure your code
> contains $pobj->commit(), and that what you are looking at is *after*
> that is executed?
>
> Bioperl-db is transactional, so you decide when to commit (or rollback).
>
> -hilmar
>
> On Apr 8, 2009, at 11:25 AM, Chris Goddard wrote:
>
>> I am running into problems when trying to insert a sequence object
>> retrieved from GenBank into a BioSQL schema running in a Postgres
>> database. Whenever I use the 'create()' method on the sequence that
>> has been made into a persistent object, the sequence isn't saved into
>> the database properly. No error messages are given, and the
>> corresponding Postgres primary key sequences are incremented as if
>> the data had been saved properly: the appropriate tables themselves
>> remain empty though.
>>
>> I am completely new to using the bioperl-db modules, and so am
>> probably missing something pretty simple. Below you will see the
>> basic code that causes the problem.
>> >> my $genbank_id = 'AYXXXXXX' >> >> my $genDB = new Bio::DB::GenBank; >> $sequence = $genDB->get_Seq_by_id($genbank_id); >> >> my $db = Bio::DB::BioDB->new(-database => 'biosql', >> -user => 'username', >> -dbname => 'dbname', >> -host => 'localhost', >> -driver => 'Pg'); >> >> my $pobj = $db->create_persistent($sequence); >> $pobj->create(); >> >> I am running the latest svn trunk versions of bioperl and bioperl-db >> (as of yesterday) and Postgres 8.3.7. I also downloaded the NCBI >> taxonomy info using the script included in the BioSQL package, and >> that data seemed to install without error. Any help or advice would >> be greatly appreciated. >> >> Thanks, >> Chris Goddard >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From sanjay.harke at gmail.com Wed Apr 8 23:24:45 2009 From: sanjay.harke at gmail.com (Sanjay Harke) Date: Thu, 9 Apr 2009 08:54:45 +0530 Subject: [Bioperl-l] Help in basics of Bioperl Message-ID: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Dear friend, I need help in following problem.I am beginer in bioperl i have sequence data. i install perl-bioperl on my computer. Now i want analyse sequences with blast, tree and multiple sequence analysis. so kindly guide me from basic. sanjay From abhishek.vit at gmail.com Wed Apr 8 23:31:26 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 8 Apr 2009 23:31:26 -0400 Subject: [Bioperl-l] Help in basics of Bioperl In-Reply-To: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> References: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Message-ID: Dear Sanjay As much as people on this love to help out. I would definitely put in some efforts to atleast go through the basic bio perl tutorial before asking this question. Atleast that would have helped you frame the question legitimately. 
I think we should put diligent effort before trying to take other people's help. Here is the link to bio perl tutorial please try to go through the relevant sections. I am sure you will get your answer there. http://www.bioperl.org/Core/Latest/bptutorial.html Thanks, -Abhi On Wed, Apr 8, 2009 at 11:24 PM, Sanjay Harke wrote: > Dear friend, > > I need help in following problem.I am beginer in bioperl > > i have sequence data. > i install perl-bioperl on my computer. > Now i want analyse sequences with blast, tree and multiple sequence > analysis. > so kindly guide me from basic. > > sanjay > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Wed Apr 8 23:35:12 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 23:35:12 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote: > [...] > and finally EU608407 and EU608559 made a crash: > > [...] > --------------------- WARNING --------------------- > MSG: Unexpected error in feature table for Skipping feature, > attempting to recover > --------------------------------------------------- > #######...14 times ...############ I would assume that you figured out that this was triggered by or affected EU608407? Would you mind sharing how? 
> --------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, > values were ("Bonhoeffer,S., Chappey,C., Parkin,N.T., > Whitcomb,LOCUS EU608407 > 1212 bp DNA linear VRL 20-APR-2008","","","CRC- > D35248959C54B9F2","1","1212","") FKs () > ERROR: null value in column "location" violates not-null constraint Is this really the verbatim copy of the error message you saw on the screen? What's really puzzling about this is how the genbank SeqIO parser could mess up parsing the reference entry to badly. Here's the reference from the version online at NCBI: REFERENCE 1 (bases 1 to 1212) AUTHORS Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,J.M. and Petropoulos,C.J. TITLE Evidence for positive epistasis in HIV-1 JOURNAL Science 306 (5701), 1547-1550 (2004) PUBMED 15567861 How the first author line would be chopped off at the end and the LOCUS line would have gotten inserted there is a mystery to me. The location is "Science 306 (5701), 1547-1550 (2004)", and according to the error message the parser failed to extract that and the TITLE. Could you confirm that the file you are parsing is not corrupted in any way, specifically for this record? > --------------------------------------------------- > Could not store EU608559: > ------------- EXCEPTION: Bio::Root::Exception ------------- > [...] > > If I check in the biosql database if some part of this records are > inserted: So are there other sequences associated with that PubMed ID? Can you do a grep on the PubMed ID and see whether it occurs already before the one that trips up the load? > [...] > select * from dbxref where dbxref_id=4179; > dbxref_id | dbname | accession | version > -----------+--------+-----------+--------- > 4179 | PUBMED | 15567861 | 0 > > select * from bioentry where accession=15567861; Note that 15567861 is the accession (PubMed ID) for the referenced article, not the sequence. 
Which bioentries are associated with a reference would be in the bioentry_reference table. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Apr 8 23:51:52 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 23:51:52 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net> On Apr 8, 2009, at 11:35 PM, Hilmar Lapp wrote: > > On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote: > >> [...] >> and finally EU608407 and EU608559 made a crash: >> >> [...] >> --------------------- WARNING --------------------- >> MSG: Unexpected error in feature table for Skipping feature, >> attempting to recover >> --------------------------------------------------- >> #######...14 times ...############ > > I would assume that you figured out that this was triggered by or > affected EU608407? Would you mind sharing how? Looking at EU608407, it most likely wasn't the culprit or stumbling stone. It must have been triggered before that. > [...] > So are there other sequences associated with that PubMed ID? To answer my own question, it's indeed EU608407 that's from the same PubMed ID, and so am I correct in assuming that you didn't get the exception for that record, which would mean that the reference was properly inserted when that sequence was loaded. The second occurrence of the same PubMed ID should have actually triggered a successful lookup of the previously inserted record, which would then have skipped the insert. 
The fact that that didn't happen suggests that the PubMed ID also wasn't properly extracted from the Genbank record. So my first suspicion remains that your file is corrupted. Otherwise, if you download this record: http://www.ncbi.nlm.nih.gov/nuccore/183191257 in GenBank format and try to load it alone, it should yield the same error. Can you indeed reproduce the problem in that way? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Wed Apr 8 23:55:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 8 Apr 2009 23:55:12 -0400 Subject: [Bioperl-l] Help in basics of Bioperl In-Reply-To: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> References: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Message-ID: <4FAA64AA47534B98874AB16622D184BA@NewLife> Hi Sanjay, Judging from your posts to the list this month, I see you have an appreciation of the power of Bioperl to help you get all kinds of analysis jobs done, and that you have a real desire to learn a lot about it. I want to encourage that attitude. I also want to remind you that the absolutely best way to really understand anything is to dive into your project and try to understand the basics *on your own*. Your posts to this are honestly much too general for this list. People here are really generous with their time, but they don't have enough of it to walk you through every step. When I have an issue with my Bioperl programming (and believe me, I have had and do have many), I do at least three things before I consider posting on this list: * I read the documentation for the module I'm working with. * I go to the wiki (www.bioperl.org) and look for HOWTOs or tutorials. There is a search facility there, and many many MANY introductory links. * I go to the source code directly, and try to figure out what it is really doing. 
So, it turns out I rarely post questions to the list, because I've figured out my dumb mistake, or how to do that new thing. PLUS, I've become that much closer to true Bioperl independence. Please go to the page http://www.bioperl.org/wiki/Getting_Started and *read it*. Please follow the links. You may even find that your work has already been done for you. One hint that works here on the list and elsewhere is: the more work you can show you have done by yourself, the more willing an expert is to help you over the hard parts. Conversely, the less work you do, the greater the chance that your questions will go unheard. cheers, Mark ----- Original Message ----- From: "Sanjay Harke" To: Sent: Wednesday, April 08, 2009 11:24 PM Subject: [Bioperl-l] Help in basics of Bioperl > Dear friend, > > I need help in following problem.I am beginer in bioperl > > i have sequence data. > i install perl-bioperl on my computer. > Now i want analyse sequences with blast, tree and multiple sequence > analysis. > so kindly guide me from basic. 
>
> sanjay

From johann.pellet at inserm.fr Thu Apr 9 05:48:43 2009
From: johann.pellet at inserm.fr (Johann PELLET)
Date: Thu, 9 Apr 2009 11:48:43 +0200
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To: <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net>
Message-ID: <2FDD67FF-5DBA-4987-A04D-231AF8B1E93B@inserm.fr>

Hi Hilmar,

I am very sorry. I checked my GenBank file, and you are right: it is
corrupted :-(

  grep EU608407 genbankFile
  AUTHORS Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,LOCUS EU608407 1212 bp DNA linear VRL 20-APR-2008
  ACCESSION EU608407
  VERSION EU608407.1 GI:183190953

So I downloaded EU608407 again and loaded it on its own with
load_seqdatabase.pl, without problems. The same goes for EU608559.

Thanks again

Johann

On 9 Apr 2009, at 05:51, Hilmar Lapp wrote:

>
> On Apr 8, 2009, at 11:35 PM, Hilmar Lapp wrote:
>
>>
>> On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote:
>>
>>> [...]
>>> and finally EU608407 and EU608559 made a crash:
>>>
>>> [...]
>>> --------------------- WARNING ---------------------
>>> MSG: Unexpected error in feature table for Skipping feature,
>>> attempting to recover
>>> ---------------------------------------------------
>>> #######...14 times ...############
>>
>> I would assume that you figured out that this was triggered by or
>> affected EU608407? Would you mind sharing how?
>
> Looking at EU608407, it most likely wasn't the culprit or stumbling
> stone. It must have been triggered before that.
>
>> [...]
>> So are there other sequences associated with that PubMed ID?
> > To answer my own question, it's indeed EU608407 that's from the same > PubMed ID, and so am I correct in assuming that you didn't get the > exception for that record, which would mean that the reference was > properly inserted when that sequence was loaded. > > The second occurrence of the same PubMed ID should have actually > triggered a successful lookup of the previously inserted record, > which would then have skipped the insert. The fact that that didn't > happen suggests that the PubMed ID also wasn't properly extracted > from the Genbank record. So my first suspicion remains that your > file is corrupted. > > Otherwise, if you download this record: > http://www.ncbi.nlm.nih.gov/nuccore/183191257 > > in GenBank format and try to load it alone, it should yield the same > error. Can you indeed reproduce the problem in that way? > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From montalen at moulon.inra.fr Thu Apr 9 06:49:22 2009 From: montalen at moulon.inra.fr (montalent) Date: Thu, 9 Apr 2009 12:49:22 +0200 Subject: [Bioperl-l] Bioperl add_object_condition Message-ID: <6D76CE64E5E744C7B571F3BA31670F9D@bioinfo2> Dear colleague, I try to use add_object_condition() function, to get a subset of sequences. I try this : # 1. 
#    STORE SELECTED BACS IN A HASH TABLE: key = bac_name, value = sequence
# 1.1 STORE SELECTED BAC NAMES IN AN ARRAY
my @selected_bac_list = ();
open(SELECTION, $bac_selection_file)
    or die "can not open $bac_selection_file: $!\n";
while (my $line = <SELECTION>) {
    my ($bac_name) = ($line =~ /^(.+?);.+/);
    # print $bac_name."\n";
    push @selected_bac_list, $bac_name;
}

# 1.2 READ FASTA FILE WITH BIOPERL TO STORE IN A HASH TABLE
my $bac_fasta = Bio::SeqIO->new(-file   => $maize_sequence_bac_file,
                                -format => "Fasta");
my $builder = $bac_fasta->sequence_builder();
if ($builder->add_object_condition(sub {
        print " check \n";
        my $seq_ref = shift;    # hash ref of init parameters
        if ($seq_ref->{'-length'} > 5000) { return 0; }
        else                              { return 1; }
    })) {
    print "add_object_condition returns true\n";
}
else {
    print "add_object_condition returns false\n";
}

# for each sequence in fasta file, check if it is a selected bac
while (my $seq = $bac_fasta->next_seq()) {
    print $seq->id."\n"; # PROBLEM: IT PRINTS ALL THE SEQUENCES, NOT THE SUBSET
}

I do not get the subset of sequences, but all of them. So I put a print()
in the closure passed to add_object_condition(), but nothing is printed.
It seems that the sub given to add_object_condition() is never executed,
even though add_object_condition() itself returns a true value.

add_object_condition() looks like a powerful method, but I have not
managed to make it work. May I ask for some advice on how to use it?
Am I forgetting something?

Best regards

Pierre Montalent
INRA - Ferme du moulon
France

From jarodpardon at yahoo.com.cn Thu Apr 9 20:27:29 2009
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Fri, 10 Apr 2009 08:27:29 +0800 (CST)
Subject: [Bioperl-l] bioperl translate() function for seq obj
Message-ID: <221543.32779.qm@web15003.mail.cnb.yahoo.com>

Hi, all,

I want to know whether Bio::PrimarySeqI::translate() uses the same method
and codon table as NCBI Blast/blastx does. Thanks.

Jarod

From csembry at ualr.edu Thu Apr 9 20:54:21 2009
From: csembry at ualr.edu (Charles Embry)
Date: Thu, 09 Apr 2009 19:54:21 -0500
Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1
Message-ID:

Hello,

I am a graduate student at UALR and I am trying to install the ext package
(1.5.1) on bioperl 1.5.1. I get this error when I run the make file.

"[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL
Writing Makefile for Bio::Ext::Align
ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/Bio/SeqIO/staden/Makefile.PL: Invalid version '' for Bio::SeqIO::staden::read.
Must be of the form '#.##'. (For instance '1.23')
 at ./Makefile.PL line 4"

These are the first 11 lines of the Makefile.PL for the ext package:

use Inline::MakeMaker;
use Config;

WriteInlineMakefile(
            'NAME'         => 'Bio::SeqIO::staden::read',
            'VERSION_FROM' => './read.pm', # finds $VERSION,
            'PREREQ_PM'    => { 'Inline::C' => 0.0,
                                'Bio::SeqIO::abi' => 0.0,
                              }, # e.g., Module::Name => 1.1,
            test           => { TESTS => 'test.pl' },
           );

What does the error mean? And what version does it refer to? Of what?
(staden?) What version of Staden should this be if I am using
io_lib-1.8.11, following the INSTALL instructions with bioperl-ext?

Thank you
C. Stephen Embry

From maj at fortinbras.us Thu Apr 9 21:16:18 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 9 Apr 2009 21:16:18 -0400
Subject: [Bioperl-l] bioperl translate() function for seq obj
In-Reply-To: <221543.32779.qm@web15003.mail.cnb.yahoo.com>
References: <221543.32779.qm@web15003.mail.cnb.yahoo.com>
Message-ID:

Hi Jarod-

translate() uses the NCBI "Standard" table by default. Check out the POD
for PrimarySeqI.pm (where translate is defined). You can specify others by
setting -CODONTABLE_ID => $n as an argument to translate().
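
[To illustrate why the table ID matters, here is a small dependency-free
sketch. It is not BioPerl's implementation and does not call BioPerl at
all; it hard-codes only tables 1 (Standard) and 2 (Vertebrate
Mitochondrial) as 64-letter strings in the NCBI codon order, and the
function names are made up for this example. In practice you would use
translate() with -CODONTABLE_ID as described above.]

```perl
use strict;
use warnings;

# Amino acids for the 64 codons in NCBI order: first, second and third
# base each cycle through T, C, A, G. Only tables 1 and 2 are included.
my %AA_FOR_TABLE = (
    1 => 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG',
    2 => 'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG',
);

# Build a codon => amino acid hash for one table ID.
sub codon_map {
    my ($table_id) = @_;
    my $aas = $AA_FOR_TABLE{$table_id}
        or die "table $table_id is not part of this sketch\n";
    my @bases = qw(T C A G);
    my %map;
    my $i = 0;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $map{"$b1$b2$b3"} = substr($aas, $i++, 1);
            }
        }
    }
    return \%map;
}

# Translate frame 1 of a DNA (or RNA) string; codons that are not plain
# A/C/G/T triplets become 'X', and a trailing partial codon is ignored.
sub translate_dna {
    my ($dna, $table_id) = @_;
    my $map = codon_map($table_id);
    $dna = uc $dna;
    $dna =~ tr/U/T/;
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length $dna; $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        $protein .= exists $map->{$codon} ? $map->{$codon} : 'X';
    }
    return $protein;
}

# ATG ATA TGA AGA read with both tables
my $dna = 'ATGATATGAAGA';
printf "table %d: %s\n", $_, translate_dna($dna, $_) for (1, 2);
```

[The same four codons give MI*R with table 1 and MMW* with table 2: ATA,
TGA and AGA are the codons that differ between the two tables. So a
comparison with blastx output is only meaningful if both sides use the
same genetic code.]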
The codon tables are in Bio::Tools::CodonTable, where the following are defined:

@NAMES = #id
 (
  'Standard', #1
  'Vertebrate Mitochondrial', #2
  'Yeast Mitochondrial', #3
  'Mold, Protozoan, and CoelenterateMitochondrial and Mycoplasma/Spiroplasma', #4
  'Invertebrate Mitochondrial', #5
  'Ciliate, Dasycladacean and Hexamita Nuclear', #6
  '', '',
  'Echinoderm Mitochondrial', #9
  'Euplotid Nuclear', #10
  '"Bacterial"', #11
  'Alternative Yeast Nuclear', #12
  'Ascidian Mitochondrial', #13
  'Flatworm Mitochondrial', #14
  'Blepharisma Nuclear', #15
  'Chlorophycean Mitochondrial', #16
  '', '', '', '',
  'Trematode Mitochondrial', #21
  'Scenedesmus obliquus Mitochondrial', #22
  'Thraustochytrium Mitochondrial' #23
 );

Can others (Scott M?) chime in on blast?

Mark

----- Original Message -----
From: "?? ??"
To: "'bioperl-l'"
Sent: Thursday, April 09, 2009 8:27 PM
Subject: [Bioperl-l] bioperl translate() function for seq obj

>
>
> Hi, all,
> I want to know whether Bio::PrimarySeqI::translate() uses identical method and
> codon table with NCBI Blast/blastx does. Thanks.
>
> Jarod
>
>
> --------------------------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From rrfreimuth2 at yahoo.com Thu Apr 9 22:10:21 2009
From: rrfreimuth2 at yahoo.com (Robert Freimuth)
Date: Thu, 9 Apr 2009 19:10:21 -0700 (PDT)
Subject: [Bioperl-l] Mentors needed for bioperl projects for Google's Summer of Code
Message-ID: <38796.60680.qm@web65611.mail.ac4.yahoo.com>

The Perl Foundation is looking for mentors for several projects for Google's Summer of Code. Two of the projects are directly applicable to bioperl.

In particular they're looking for mentors for these projects:

- Bio::Restriction::* - Improve reading and writing of RE collection in different formats; add support for multicut/multisite enzymes.
- A bioperl parser module for repeats/transposons.
- "CPAN OS Installer", integrate CPAN packages into Unix package managers like rpm and apt/dpkg
- Cross-platform Perl Bindings for wxWebKit

If you're interested please see the full announcement, posted on PerlMonks: http://www.perlmonks.org/?node_id=755872.

Thanks,
Bob

From j_martin at lbl.gov Thu Apr 9 23:18:28 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Thu, 9 Apr 2009 20:18:28 -0700
Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1
In-Reply-To:
References:
Message-ID: <20090410031827.GE6535@eniac.jgi-psf.org>

Hello,
I found 1.5.1 a pain to install; I recommend the code from

http://www.bioperl.org/wiki/Ext_package#The_latest_code

Anyhow, the "read" is read.pm, and I think the message comes from Inline::C. There's an old bug report about it; if you can't use the newer code, maybe it will help.

http://bugzilla.open-bio.org/show_bug.cgi?id=2074

joel

On Thu, Apr 09, 2009 at 07:54:21PM -0500, Charles Embry wrote:
> Hello I am a graduate student at UALR and I am trying to install the ext package(1.5.1) on bioperl 1.5.1.
> I get this error when i run the make file.
>
> "[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL
> Writing Makefile for Bio::Ext::Align
> ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/Bio/SeqIO/staden/Makefile.PL: Invalid version '' for Bio::SeqIO::staden::read.
> Must be of the form '#.##'. (For instance '1.23')
>  at ./Makefile.PL line 4"
>
> This is the first 11 lines of the Makefile.PL for ext package
>
> use Inline::MakeMaker;
> use Config;
>
> WriteInlineMakefile(
>             'NAME'         => 'Bio::SeqIO::staden::read',
>             'VERSION_FROM' => './read.pm', # finds $VERSION,
>             'PREREQ_PM'
=> { 'Inline::C' => 0.0,
>                                 'Bio::SeqIO::abi' => 0.0,
>                               }, # e.g., Module::Name => 1.1,
>             test           => { TESTS => 'test.pl' },
>            );
>
> What does the error mean?
>
> And what version does it refer to? Of what? (staden?)
> What version of Staden should this be if i am using the io_lib-1.8.11 , following the INSTALL instructions with bioperl-ext?
>
>
> Thanks you
> C. Stephen Embry
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hsa_rim at yahoo.co.in Thu Apr 9 23:43:53 2009
From: hsa_rim at yahoo.co.in (shafeeq rim)
Date: Fri, 10 Apr 2009 09:13:53 +0530 (IST)
Subject: [Bioperl-l] Creating Cytoband Ideogram images
Message-ID: <824645.66937.qm@web94611.mail.in2.yahoo.com>

Hi,

I want to create CytoBand ideogram images from the CytoBand data at NCBI. Is there any module in BioPerl, or any other way, to do it? I want to create a cytoband ideogram for each chromosome.

Thanks in advance
Shafeeq

From hlapp at gmx.net Fri Apr 10 00:00:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 10 Apr 2009 00:00:54 -0400
Subject: [Bioperl-l] Mentors needed for bioperl projects for Google's Summer of Code
In-Reply-To: <38796.60680.qm@web65611.mail.ac4.yahoo.com>
References: <38796.60680.qm@web65611.mail.ac4.yahoo.com>
Message-ID: <0C80FD8F-78F6-493E-94C3-AE5D845577C5@gmx.net>

Hi Robert - thanks for putting us into the loop!

On Apr 9, 2009, at 10:10 PM, Robert Freimuth wrote:

> The Perl Foundation is looking for mentors for several projects for
> Google's Summer of Code. Two of the projects are directly applicable
> to bioperl.
> > In particular they're looking for mentors for these projects: > > Bio::Restriction::* - Improve reading and writing of RE collection in > different formats; add support for multicut/multisite enzymes.A > bioperl parser module for repeats/transposons. I don't want to dampen any enthusiasm and the project may indeed be worthwhile, but it's also worth noting that we haven't ever seen the student applicant here (assuming it's the same who contacted Heikki a while ago). Having said that, the fact that there hasn't been any community interaction from the student yet obviously doesn't have to mean that there can't be any in the future. But in the Google Summer of Code spirit of recruiting new contributors into FLOSS communities, it's a less than ideal start. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Fri Apr 10 00:15:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 9 Apr 2009 23:15:45 -0500 Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1 In-Reply-To: <20090410031827.GE6535@eniac.jgi-psf.org> References: <20090410031827.GE6535@eniac.jgi-psf.org> Message-ID: <327D2C1C-A61A-473A-B85D-7A249856CC85@illinois.edu> Just to note, we're not actively supporting much of the bioperl-ext code, in favor of the BioLib initiative: http://biolib.open-bio.org/wiki/Main_Page If you do use bioperl-ext I suggest only using the latest code from svn (and that in combination with bioperl-live). chris On Apr 9, 2009, at 10:18 PM, Joel Martin wrote: > Hello, > I found that 1.5.1 a pain to install, I recommend the code from > > http://www.bioperl.org/wiki/Ext_package#The_latest_code > > anywho, the read is read.pm, the message is something from > inline::c I think, there's an old bug report about it, if > you can't use the newer code maybe it will help. 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > > joel > > > On Thu, Apr 09, 2009 at 07:54:21PM -0500, Charles Embry wrote: >> Hello I am a graduate student at UALR and I am trying to install >> the ext package(1.5.1) on bioperl 1.5.1. >> I get this error when i run the make file. >> >> "[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL >> Writing Makefile for Bio::Ext::Align >> ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/ >> Bio/SeqIO/staden/Makefile.PL: Invalid version '' for >> Bio::SeqIO::staden::read. >> Must be of the form '#.##'. (For instance '1.23') >> at ./Makefile.PL line 4" >> >> This is the first 11 lines of the Makefile.PL for ext package >> >> use Inline::MakeMaker; >> use Config; >> >> WriteInlineMakefile( >> 'NAME' => 'Bio::SeqIO::staden::read', >> 'VERSION_FROM' => './read.pm', # finds $VERSION, >> 'PREREQ_PM' => { 'Inline::C' => 0.0, >> 'Bio::SeqIO::abi' => 0.0, >> }, # e.g., Module::Name => 1.1, >> test => { TESTS => 'test.pl' }, >> ); >> >> What does the error mean? >> >> And what version does it refer to? Of what? (staden?) >> What version of Staden should this be if i am using the >> io_lib-1.8.11 , following the INSTALL instructions with bioperl-ext? >> >> >> Thanks you >> C. 
Stephen Embry >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Apr 10 00:32:59 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 9 Apr 2009 23:32:59 -0500 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> Message-ID: <0340305E-EAB3-4A08-9B41-5E706F4A5A16@illinois.edu> Would definitely be worth testing out interactivity with these. chris On Apr 8, 2009, at 12:59 AM, Steve Chervitz wrote: > Check out our Affymetrix Power Tools (APT) package: > > http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx > > We distribute binaries for Linux and Mac OSX, as well as source code > so you can compile it yourself if you want. Note however that this is > written in C++, not Perl. We don't provide SWIG or XS interfaces for > direct access via Perl, though this would definitely be doable, if > anyone is interested. > > Probably the easiest approach from Perl would be to simply call the > appropriate APT executable through the shell as in: > > system("/path/to/apt --args ..."); > > The Perl code can parse the output files and take it from there. > > Steve > > > On Tue, Apr 7, 2009 at 7:10 PM, Sean Davis > wrote: >> On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG > >wrote: >> >>> Dear all, >>> >>> Recently, I focus on population genomics data outputed by affymatrix >>> microarray system. However, softwares which designed by affy. 
inc
>>> only run
>>> in Windows 386 platform. Is there any application can used in Linux?
>>> Bio::Affymatrix was not strong enough to get the detailed
>>> informaton.
>>>
>>
>> You may want to look at a non-bioperl solution such as Bioconductor (
>> http://bioconductor.org).
>>
>> Sean
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From miguel.pignatelli at uv.es Wed Apr 1 17:56:36 2009
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Wed, 1 Apr 2009 23:56:36 +0200
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <49D39E60.1020103@gmail.com>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID:

You may find the attached Perl module useful. It solves the difficult parts of getting the taxonomy given a GI identifier or a taxID. It is designed to process a large number of GIs very quickly and with low memory usage.

An example of usage would be:

use taxbuild;
#Build the taxonomyDB
my $taxDB = taxbuild->new(
                nodes    => $nodes_file_from_taxonomyDB,
                names    => $names_file_from_taxonomyDB,
                dict     => $dictFile,
                save_mem => 1
            );

# Get the taxonomy given a GI identifier
my @tax = $taxDB->get_taxonomy_from_gi("35961124");

# Get the taxonomy term of a GI identifier at a given level
my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");

# Get the taxid of a GI identifier
my $taxid = $taxDB->get_taxid("35961124");

# Get the taxonomy given a taxid
my @tax = $taxDB->get_taxonomy($taxid);

# Get the taxonomy at a given level given a taxid
my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");

# Get the level of a given taxonomical name
my $level = $taxDB->get_level_from_name("Proteobacteria");

The "dict file" is a processed version of the gi_taxid file from taxonomyDB. You can get this file by running the tax2bin2.pl script, also attached:

$ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin
or, if you are working with genes instead of proteins:
$ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin

A possible solution to the original post using this module would be something like:

# Initialize the taxonomyDB once.
my $taxDB = taxbuild->new(
                nodes    => $nodes_file_from_taxonomyDB,
                names    => $names_file_from_taxonomyDB,
                dict     => $dictFile,
                save_mem => 1
            );

#For each blast result
#Extract the GI
my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
if ($superkingdom eq "Bacteria") {
    # Do whatever you want
} elsif ($superkingdom eq "Eukaryota") {
    # Do whatever you want
}

The module has been tested mainly on Linux systems, but should run without problems on Windows and Mac too. If you encounter any problem with it, don't hesitate to contact me.

Hope this helps,

M;

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tax2bin2.pl
Type: text/x-perl-script
Size: 400 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: taxbuild.pm
Type: text/x-perl-script
Size: 10599 bytes
Desc: not available
URL:
-------------- next part --------------

On 01/04/2009, at 19:03, Florent Angly wrote:

> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
> you won't be able to put its information in a hash (unless you have
> a lot of memory).
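One way around the memory concern raised above is to avoid loading the whole file: collect the GIs seen in the BLAST hits first, then stream the dump once and keep only those. A minimal sketch in core Perl, assuming the uncompressed dump has two tab-separated columns (GI, then taxid). The sub name `taxids_for_gis` and the demo data are hypothetical:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stream a gi_taxid dump once and return { gi => taxid } for the wanted GIs only.
sub taxids_for_gis {
    my ($dmp_file, @gis) = @_;
    my %wanted = map { $_ => 1 } @gis;
    my %taxid_of;
    open(my $fh, '<', $dmp_file) or die "cannot open $dmp_file: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && $wanted{$gi};
        $taxid_of{$gi} = $taxid;
        last if keys(%taxid_of) == keys(%wanted);    # all found, stop early
    }
    close $fh;
    return \%taxid_of;
}

# Demo with a tiny made-up dump file (the GI/taxid pairs are invented).
my ($out, $demo_file) = tempfile(UNLINK => 1);
print {$out} "101\t9606\n102\t562\n103\t4932\n";
close $out;

my $found = taxids_for_gis($demo_file, 101, 103);
print "$_ => $found->{$_}\n" for sort keys %$found;
```

Memory use then grows with the number of GIs of interest rather than with the size of the dump, at the cost of one sequential pass over the file.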
> Florent > > Smithies, Russell wrote: >> The taxonomy information isn't in the blast output unless you >> created custom fasta headers for your blast database. >> The easiest way to get the tax_id for your accessions would be to >> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >> . >> If you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should >> be fairly fast. >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) >> If you grab the taxdump.tar.gz file from the same site, the >> nodes.dmp file contained within lists what division each tax_id >> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) >> so you can probably work it out from that. >> >> It's not a very BioPerly solution but sometimes just looking up the >> answer from a file/table/hash is the simplest way. >> Hope this helps, >> >> Russell Smithies >> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 >> 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >> >> >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] taxonomy ID >>> >>> Hi All, >>> I am writing a script, for one of its part i have to >>> parse a blast >>> report (refseq blast) and check how may organisms are eukaryotes >>> and how >>> namy of them are prokaryotes. >>> I am using BIO::DB::taxinomy module: >>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>> >>> But for this i need a taxonomyid (like '33090') given in the >>> example. >>> So is it possible to get a taxonomyid from refseq balst report? 
>>> If not then how i can deal with this problem? >>> >>> i would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use >> of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. >> = >> = >> ===================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields1 at gmail.com Fri Apr 10 00:34:03 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Thu, 9 Apr 2009 23:34:03 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneNCBIBlast - blastpgp In-Reply-To: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> References: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> Message-ID: Estelle, Always direct your questions to the bioperl mail list (I'm cc'ing them now). I'm not sure about using that option, maybe someone else can answer? 

chris

On Apr 7, 2009, at 5:08 AM, Estelle Proux wrote:

> Dear Mr Fields,
>
> I would like to use the module Bio::Tools::Run::StandAloneNCBIBlast
> to run
> blastpgp.
> However, the -C option (save a checkpoint in ASN.x) seems not
> available in
> this module (options are -j, -h, -c, -B, and -Q). Is there another
> way to
> save the checkpoint?
>
> I thank you by advance (and apologize for my English).
>
> Estelle

From jaleto at gmail.com Fri Apr 10 03:50:46 2009
From: jaleto at gmail.com (Jonathan Leto)
Date: Fri, 10 Apr 2009 00:50:46 -0700
Subject: [Bioperl-l] Google Summer of Code 2009 BioPerl Student Applications
Message-ID: <9aaadf9c0904100050g7f82f925s2e9bae9646da6cd5@mail.gmail.com>

Howdy,

There are two student applications for The Perl Foundation this year which are BioPerl-related, and I would very much like them to succeed, but most of the current mentors do not have the background to judge whether they are possible in the time given, or what most of the words mean, for that matter.

We really need some feedback from BioPerl people as to the viability of these applications, as well as comments and suggestions on implementation issues.

Please sign up at the GSoC web app [1], then apply to be a mentor for The Perl Foundation. I have to accept you manually, and then you will be able to view the 19 applications we received this year.

Please also join the private mentor list [2] and the students+mentors list [3] if you would like to keep up to date and get involved. Welcome!

Cheers,

[1] http://socghop.appspot.com/
[2] http://groups.google.com/group/tpf-gsoc
[3] http://groups.google.com/group/tpf-gsoc-students

--
[---------------------]
Jonathan Leto
jaleto at gmail.com

From scott at scottcain.net Fri Apr 10 09:08:53 2009
From: scott at scottcain.net (Scott Cain)
Date: Fri, 10 Apr 2009 09:08:53 -0400
Subject: [Bioperl-l] Creating Cytoband Ideogram images
In-Reply-To: <824645.66937.qm@web94611.mail.in2.yahoo.com>
References: <824645.66937.qm@web94611.mail.in2.yahoo.com>
Message-ID: <536f21b00904100608w23484c5bi3765da39b6b4d946@mail.gmail.com>

Hello Shafeeq,

You need Bio::Graphics::Glyph::ideogram, which is part of Bio::Graphics. You can install it from cpan and it will install BioPerl 1.6 as a prereq. The perldoc for ideogram.pm has example code and data, since the format of the data is important.

Scott

On Thu, Apr 9, 2009 at 11:43 PM, shafeeq rim wrote:
> Hi,
>
> I want to create CytoBand ideogram images from CytoBand data in NCBI data. Is there any module in BioPerl or any other way to do it ? I want to create chromosome cytoband ideograms for each chromosome.
>
> Thanks in advance
> Shafeeq
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Fri Apr 10 09:32:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 10 Apr 2009 08:32:00 -0500
Subject: [Bioperl-l] taxonomy ID
In-Reply-To:
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID:

I don't know if this has been pointed out, but Bio::DB::Taxonomy is also capable of indexing and using the NCBI tax flat files.

use Bio::DB::Taxonomy;

my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => $nodesfile,
                                -namesfile => $namefile);

# use other Bio::DB::Taxonomy methods

chris

On Apr 1, 2009, at 4:56 PM, Miguel Pignatelli wrote:

> You may find the attached Perl module useful. It solves the
> difficult parts of getting the taxonomy given a GI identifier or a
> taxID. It is designed to be able to process a high number of GIs
> very fast and with low memory usage.
>
> An example of usage would be:
>
> use taxbuild;
> #Build the taxonomyDB
> my $taxDB = taxbuild->new(
>                 nodes =>
> $nodes_file_from_taxonomyDB,
>                 names =>
> $names_file_from_taxonomyDB,
>                 dict => $dictFile,
>                 save_mem => 1
>             );
>
> # Get the taxonomy given a GI identifier
> my @tax = $taxDB->get_taxonomy_from_gi("35961124");
>
> # Get the taxonomy term of a GI identifier at a given level
> my $term_at_level = $taxDB-
> >get_term_at_level_from_gi("35961124","family"); > > # Get the taxid of a GI identifier > my $taxid = $taxDB?>get_taxid("35961124"); > > # Get the taxonomy given a taxid > my @tax = $taxDB?>get_taxonomy($taxid); > > # Get the taxonomy at a given level given a taxid > my $taxid_at_level = $taxDB?>get_term_at_level($taxid,"genus"); > > # Get the level of a given taxonomical name > my $level = $taxDB?>get_level_from_name("Proteobacteria"); > > The "dict file" is a processed version of the gi_taxid file from > taxonomyDB. You can get this file by running the tax2bin2.pl script > also attached: > > $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin > or, if you are working with genes instead of proteins: > $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin > > A possible solution to the original post using this module would be > something like: > > # Initialize the taxonomyDB once. > my $taxDB = taxbuild?>new( > nodes => > $nodes_file_from_taxonomyDB, > names => > $names_file_from_taxonomyDB, > dict => $dictFile, > save_mem => 1 > ); > > #For each blast result > #Extract the GI > my $superkingdom = $taxDB- > >get_term_at_level_from_gi($gi,"superkingdom"); > if ($superkingdom eq "Bacteria") { > # Do whatever you want > } elsif ($superkingdom eq "Eukaryota") > # Do whatever you want > } > > > The module has been tested mainly in Linux systems, but should run > without problems in Windows and Mac too. If you encounter any > problem with it don't hesitate to contact me. > > Hope this helps, > > M; > > > > > > El 01/04/2009, a las 19:03, Florent Angly escribi?: > >> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that >> you won't be able to put its information in a hash (unless you have >> a lot of memory). >> Florent >> >> Smithies, Russell wrote: >>> The taxonomy information isn't in the blast output unless you >>> created custom fasta headers for your blast database. 
>>> The easiest way to get the tax_id for your accessions would be to >>> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >>> . >>> If you load that file into a hash, parse the accessions out of the >>> blast hits then lookup the tax_id from that hash, I think it >>> should be fairly fast. >>> Checking which are prokaryotes and which are eukaryotes based on >>> tax_id is a separate problem :-) >>> If you grab the taxdump.tar.gz file from the same site, the >>> nodes.dmp file contained within lists what division each tax_id >>> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) >>> so you can probably work it out from that. >>> >>> It's not a very BioPerly solution but sometimes just looking up >>> the answer from a file/table/hash is the simplest way. >>> Hope this helps, >>> >>> Russell Smithies >>> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >>> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T >>> +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >>> >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>>> To: bioperl-l >>>> Subject: [Bioperl-l] taxonomy ID >>>> >>>> Hi All, >>>> I am writing a script, for one of its part i have to >>>> parse a blast >>>> report (refseq blast) and check how may organisms are eukaryotes >>>> and how >>>> namy of them are prokaryotes. >>>> I am using BIO::DB::taxinomy module: >>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>>> >>>> But for this i need a taxonomyid (like '33090') given in the >>>> example. >>>> So is it possible to get a taxonomyid from refseq balst report? >>>> If not then how i can deal with this problem? >>>> >>>> i would really appreciate if anyone can help me out. 
>>>> >>>> Thanks >>>> Shalabh >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> = >>> = >>> = >>> ==================================================================== >>> Attention: The information contained in this message and/or >>> attachments >>> from AgResearch Limited is intended only for the persons or entities >>> to which it is addressed and may contain confidential and/or >>> privileged >>> material. Any review, retransmission, dissemination or other use >>> of, or >>> taking of any action in reliance upon, this information by persons >>> or >>> entities other than the intended recipients is prohibited by >>> AgResearch >>> Limited. If you have received this message in error, please notify >>> the >>> sender immediately. >>> = >>> = >>> = >>> ==================================================================== >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Fri Apr 10 09:42:15 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 10 Apr 2009 09:42:15 -0400 Subject: [Bioperl-l] Query about Bioperl and Mysql In-Reply-To: <31bb4380903280541r232ebbe4kbb0ccd84f996da1f@mail.gmail.com> References: <31bb4380903280541r232ebbe4kbb0ccd84f996da1f@mail.gmail.com> Message-ID: <264855a00904100642l482deebend6be66b140933c2c@mail.gmail.com> On Sat, Mar 28, 2009 at 8:41 AM, Sanjay Harke wrote: > Dear friends, > > anybody nows about my 
following problem. > > !) I want to use my own database mysql with Bioperl > > kindly guide for it. > You'll want to look at the perl DBI and DBD::mysql modules. Sean From bosborne11 at verizon.net Fri Apr 10 09:55:00 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 10 Apr 2009 09:55:00 -0400 Subject: [Bioperl-l] Access Uniprot detailed information In-Reply-To: <22951210.post@talk.nabble.com> References: <22951210.post@talk.nabble.com> Message-ID: <4C3C5234-31F7-4EEF-BBA0-9B912D21F210@verizon.net> Markus, There is some discussion of the structure of "swiss" format files in the Feature-Annotation HOW TO. Have you taken a look at this? http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Other_Sequence_File_Formats This section does not explain all the fields in each different format, but it shows you code that you can run that will print out all the annotations and features. You're really asking 2 questions, I think. Have you figured out how to retrieve a sequence? See if this helps you: http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database Brian O. On Apr 8, 2009, at 10:07 AM, manni122 wrote: > > Hi there, > maybe I am not able to read careful enough through the Howto section. > But is there a function in BioPerl that retrieves for a given > Uniprot Access > Code or ID from the Uniprot Database some general annotations like > enzymatic > activity or literature references? > I appreciate any help! > -- > View this message in context: http://www.nabble.com/Access-Uniprot-detailed-information-tp22951210p22951210.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Fri Apr 10 10:05:06 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 10 Apr 2009 10:05:06 -0400 Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 In-Reply-To: <22816585.post@talk.nabble.com> References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <22816585.post@talk.nabble.com> Message-ID: Dereje, There's a HOW TO that discusses an approach similar to this (Using local Genbank and Entrez Gene files): http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences But the provided script uses Gene ids, not chromosome names. The more general suggestion would be to look at the module Bio::DB::Fasta. Brian O. On Mar 31, 2009, at 6:59 PM, demis001 wrote: > > Hi , > > I am new to BioPerl and this forum and even do not know how to post > the new > post. I have one question for you guys. > > Is there any BioPerl module that allows me to download sequence > based on > chromosome name, seqStart and SeqEnd given the formatted human genome > database downloaded on my Linux desktop? > > I used to do this using Perl $URI object and it is really slow as the > process depend on the network. To be more specific, I took chrName, > seqStart > and seqEnd and go to Ensembl database to get the sequence one by one > using > Perl $URI object. > > I thought it might be easier if I process locally using indexed > database > using BioPerl module if there is any designed for this purpose. > > Input, millions rows of tab delimited (CSV) file contain > information about > chrName, seqStart, seqEnd. Locally formatted/indexed human genome. > Output > should be the fasta sequence contain the sequence and with the header > contain chr name and location persed > > Sorry if I posted in the wrong section of the forum and happy to > get any > recommendation. 
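The Bio::DB::Fasta suggestion can be sketched as follows, assuming BioPerl is installed, the genome is a local uncompressed FASTA file whose IDs match the chromosome names, and coordinates are 1-based and inclusive. The demo genome and regions are made up so that the example is self-contained; with real data the regions would be read from the tab-delimited file instead:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Bio::DB::Fasta;

# Demo data (invented): a one-chromosome "genome" and two regions
# given as [chrName, seqStart, seqEnd].
my $dir          = tempdir(CLEANUP => 1);
my $genome_fasta = "$dir/genome.fa";
open(my $fa, '>', $genome_fasta) or die "cannot write $genome_fasta: $!\n";
print {$fa} ">chr1\nACGTACGTAC\nGGGGCCCCAA\n";
close $fa;
my @regions = (['chr1', 3, 6], ['chr1', 9, 12]);

# The first call builds an index next to the FASTA file; later runs reuse it.
my $db = Bio::DB::Fasta->new($genome_fasta);

my @records;
for my $region (@regions) {
    my ($chr, $start, $end) = @$region;
    my $subseq = $db->seq($chr, $start => $end);
    next unless defined $subseq;    # skip if nothing came back for this ID
    push @records, ">$chr:$start-$end\n$subseq\n";
}
print @records;
```

Because the lookups go through a local index, the run time no longer depends on the network, which was the bottleneck described in the question.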
> Thanks > > Govind Chandra wrote: >> >> Hi, >> >> The code below >> >> >> ====== code begins ======= >> #use strict; >> use Bio::SeqIO; >> >> $infile='NC_000913.gbk'; >> my $seqio=Bio::SeqIO->new(-file => $infile); >> my $seqobj=$seqio->next_seq(); >> my @features=$seqobj->all_SeqFeatures(); >> my $count=0; >> foreach my $feature (@features) { >> unless($feature->primary_tag() eq 'CDS') {next;} >> print($feature->start()," ", $feature->end(), " >> ",$feature->strand(),"\n"); >> $ac=$feature->annotation(); >> $temp1=$ac->get_Annotations("locus_tag"); >> @temp2=$ac->get_Annotations(); >> print("$temp1 $temp2[0] @temp2\n"); >> if($count++ > 5) {last;} >> } >> >> print(ref($ac),"\n"); >> exit; >> >> ======= code ends ======== >> >> produces the output >> >> ========== output begins ======== >> >> 190 255 1 >> 0 >> 337 2799 1 >> 0 >> 2801 3733 1 >> 0 >> 3734 5020 1 >> 0 >> 5234 5530 1 >> 0 >> 5683 6459 -1 >> 0 >> 6529 7959 -1 >> 0 >> Bio::Annotation::Collection >> >> =========== output ends ========== >> >> $ac is-a Bio::Annotation::Collection but does not actually contain >> any >> annotation from the feature. Is this how it should be? I cannot >> figure >> out what is wrong with the script. Earlier I used to use has_tag(), >> get_tag_values() etc. but the documentation says these are >> deprecated. >> >> Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of >> uname >> -a is >> >> Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 >> x86_64 x86_64 x86_64 GNU/Linux >> >> Thanks in advance for any help. >> >> Govind >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/Re%3A-Bioperl-l-Digest%2C-Vol-71%2C-Issue-15-tp22744119p22816585.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org Fri Apr 10 11:51:45 2009
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 10 Apr 2009 08:51:45 -0700
Subject: [Bioperl-l] taxonomy ID
In-Reply-To:
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID: <6B951DED-0632-451C-86A4-2A215B1CAE6C@bioperl.org>

The only difference to the DB::Taxonomy module I can see is we don't
specifically have the dictionary part -- for gi -> taxid, but I just do
a local DBHash index of that when I need it.

-jason

On Apr 10, 2009, at 6:32 AM, Chris Fields wrote:

> I don't know if this has been pointed out, but Bio::DB::Taxonomy is
> also capable of indexing and using the NCBI tax flat files.
>
> use Bio::DB::Taxonomy;
>
> my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
>                                 -nodesfile => $nodesfile,
>                                 -namesfile => $namefile);
>
> # use other Bio::DB::Taxonomy methods
>
> chris
>
> On Apr 1, 2009, at 4:56 PM, Miguel Pignatelli wrote:
>
>> You may find the attached Perl module useful. It solves the
>> difficult parts of getting the taxonomy given a GI identifier or a
>> taxID. It is designed to be able to process a high number of GIs
>> very fast and with low memory usage.
>>
>> An example of usage would be:
>>
>> use taxbuild;
>> #Build the taxonomyDB
>> my $taxDB = taxbuild->new(
>>     nodes    => $nodes_file_from_taxonomyDB,
>>     names    => $names_file_from_taxonomyDB,
>>     dict     => $dictFile,
>>     save_mem => 1
>> );
>>
>> # Get the taxonomy given a GI identifier
>> my @tax = $taxDB->get_taxonomy_from_gi("35961124");
>>
>> # Get the taxonomy term of a GI identifier at a given level
>> my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");
>>
>> # Get the taxid of a GI identifier
>> my $taxid = $taxDB->get_taxid("35961124");
>>
>> # Get the taxonomy given a taxid
>> my @tax = $taxDB->get_taxonomy($taxid);
>>
>> # Get the taxonomy at a given level given a taxid
>> my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");
>>
>> # Get the level of a given taxonomical name
>> my $level = $taxDB->get_level_from_name("Proteobacteria");
>>
>> The "dict file" is a processed version of the gi_taxid file from
>> taxonomyDB. You can get this file by running the tax2bin2.pl script
>> also attached:
>>
>> $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin
>> or, if you are working with genes instead of proteins:
>> $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin
>>
>> A possible solution to the original post using this module would be
>> something like:
>>
>> # Initialize the taxonomyDB once.
>> my $taxDB = taxbuild->new(
>>     nodes    => $nodes_file_from_taxonomyDB,
>>     names    => $names_file_from_taxonomyDB,
>>     dict     => $dictFile,
>>     save_mem => 1
>> );
>>
>> #For each blast result
>> #Extract the GI
>> my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
>> if ($superkingdom eq "Bacteria") {
>>     # Do whatever you want
>> } elsif ($superkingdom eq "Eukaryota") {
>>     # Do whatever you want
>> }
>>
>>
>> The module has been tested mainly in Linux systems, but should run
>> without problems in Windows and Mac too. If you encounter any
>> problem with it don't hesitate to contact me.
>>
>> Hope this helps,
>>
>> M;
>>
>>
>>
>>
>>
>> On Apr 1, 2009, at 19:03, Florent Angly wrote:
>>
>>> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
>>> you won't be able to put its information in a hash (unless you
>>> have a lot of memory).
>>> Florent
>>>
>>> Smithies, Russell wrote:
>>>> The taxonomy information isn't in the blast output unless you
>>>> created custom fasta headers for your blast database.
>>>> The easiest way to get the tax_id for your accessions would be to >>>> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >>>> . >>>> If you load that file into a hash, parse the accessions out of >>>> the blast hits then lookup the tax_id from that hash, I think it >>>> should be fairly fast. >>>> Checking which are prokaryotes and which are eukaryotes based on >>>> tax_id is a separate problem :-) >>>> If you grab the taxdump.tar.gz file from the same site, the >>>> nodes.dmp file contained within lists what division each tax_id >>>> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, >>>> etc) so you can probably work it out from that. >>>> >>>> It's not a very BioPerly solution but sometimes just looking up >>>> the answer from a file/table/hash is the simplest way. >>>> Hope this helps, >>>> >>>> Russell Smithies >>>> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >>>> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T >>>> +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >>>> >>>> >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>>>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>>>> To: bioperl-l >>>>> Subject: [Bioperl-l] taxonomy ID >>>>> >>>>> Hi All, >>>>> I am writing a script, for one of its part i have to >>>>> parse a blast >>>>> report (refseq blast) and check how may organisms are eukaryotes >>>>> and how >>>>> namy of them are prokaryotes. >>>>> I am using BIO::DB::taxinomy module: >>>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>>>> >>>>> But for this i need a taxonomyid (like '33090') given in the >>>>> example. >>>>> So is it possible to get a taxonomyid from refseq balst report? >>>>> If not then how i can deal with this problem? 
>>>>> >>>>> i would really appreciate if anyone can help me out. >>>>> >>>>> Thanks >>>>> Shalabh >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> = >>>> = >>>> = >>>> = >>>> =================================================================== >>>> Attention: The information contained in this message and/or >>>> attachments >>>> from AgResearch Limited is intended only for the persons or >>>> entities >>>> to which it is addressed and may contain confidential and/or >>>> privileged >>>> material. Any review, retransmission, dissemination or other use >>>> of, or >>>> taking of any action in reliance upon, this information by >>>> persons or >>>> entities other than the intended recipients is prohibited by >>>> AgResearch >>>> Limited. If you have received this message in error, please >>>> notify the >>>> sender immediately. >>>> = >>>> = >>>> = >>>> = >>>> =================================================================== >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From SMarkel at accelrys.com Fri Apr 10 12:01:25 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 10 Apr 2009 12:01:25 -0400 Subject: [Bioperl-l] 
Bio::Tools::Run::StandAloneNCBIBlast - blastpgp In-Reply-To: References: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74729E04A77@exch1-hi.accelrys.net> Estelle, Are you using the most recent version of Bio::Tools::Run::StandAloneNCBIBlast? The available blastpgp parameters are our @BLASTPGP_PARAMS = qw(A B C E F G H I J K L M N O P Q R S T U W X Y Z a b c e f h j k l m q s t u v y z); See line 94. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, 09 April 2009 9:34 PM > To: Estelle Proux > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneNCBIBlast - blastpgp > > Estelle, > > Always direct your questions to the bioperl mail list (I'm cc'ing them > now). I'm not sure about using that option, maybe someone else can > answer? > > chris > > On Apr 7, 2009, at 5:08 AM, Estelle Proux wrote: > > > Dear Mr Fields, > > > > I would like to use the module Bio::Tools::Run::StandAloneNCBIBlast > > to run > > blastpgp. > > However, the -C option (save a checkpoint in ASN.x) seems not > > available in > > this module (options are -j, -h, -c, -B, and -Q). Is there another > > way to > > save the checkpoint? > > > > I thank you by advance (and apologize for my English). 
> >
> > Estelle
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jarodpardon at yahoo.com.cn Sat Apr 11 09:50:20 2009
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Sat, 11 Apr 2009 21:50:20 +0800 (CST)
Subject: [Bioperl-l] how to suppress Bioperl exceptions
Message-ID: <936515.8386.qm@web15007.mail.cnb.yahoo.com>

Hi, all,
I use Bio::SeqIO driver to parse data files. The input data is somewhat
buggy, and some of entries are not strict in format. The parser will throw
exceptions and halt when meeting these bad entries. However, I want to just
skip these entries, not stop there. So how to suppress exceptions?
Thanks.

Jarod

From maj at fortinbras.us Sat Apr 11 11:32:16 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 11 Apr 2009 11:32:16 -0400
Subject: [Bioperl-l] how to suppress Bioperl exceptions
Message-ID:

missed the list.

----- Original Message -----
From: "Mark A. Jensen"
To: "?? ??"
Sent: Saturday, April 11, 2009 10:52 AM
Subject: Re: [Bioperl-l] how to suppress Bioperl exceptions

> Hey Jarod-
> You can try setting the verbosity of the object negative, as
>
> $seqio->verbose(-1);
>
> I've found, though, that the warning messages still come through
> sometimes. I've gotten control of these using the Error package:
>
> use Error qw(:try);
>
> try {
>     $seqio = Bio::SeqIO->new(-file => 'my.fas');
> }
> catch Error with {
>     my $e = shift;
>     # $e->text will contain the message
> };
>
> Note the lack of ; after the try block, and the
> presence thereof after the catch block.
>
> cheers -Mark
> ----- Original Message -----
> From: "?? ??"
> To: > Sent: Saturday, April 11, 2009 9:50 AM > Subject: [Bioperl-l] how to suppress Bioperl exceptions > > >> >> Hi, all, >> I use Bio::SeqIO driver to parse data files. The input data is somewhat >> buggy, and some of entries are not strict in format. The parser will throw >> exceptions and halt when meeting these bad entries. However, I want to just >> skip these entries, not stop there. So how to suppress exceptions? >> Thanks. >> >> Jarod >> >> >> >> ___________________________________________________________ >> ?????????????????????????????????? >> http://card.mail.cn.yahoo.com/ >> >> > > > -------------------------------------------------------------------------------- > > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Sat Apr 11 11:56:35 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 11 Apr 2009 11:56:35 -0400 Subject: [Bioperl-l] how to suppress Bioperl exceptions In-Reply-To: <936515.8386.qm@web15007.mail.cnb.yahoo.com> References: <936515.8386.qm@web15007.mail.cnb.yahoo.com> Message-ID: Hi Jarod, in addition to Mark's response, what you say in your message would mean that corruption is in specific entries of a file and you want to skip those, rather than entire files. If this is true, then you'd have to put the $seq=$seqio->next_seq() call into the try {} block as that'd be the one that would raise the exception. The SeqIO parsers don't generally guarantee though that they will gracefully recover from a parsing error and advance to the next record; I think the genbank parser will do that, but you will definitely want to check that. -hilmar On Apr 11, 2009, at 9:50 AM, ?? ?? wrote: > > Hi, all, > I use Bio::SeqIO driver to parse data files. The input data is > somewhat buggy, and some of entries are not strict in format. The > parser will throw exceptions and halt when meeting these bad > entries. 
However, I want to just skip these entries, not stop there. > So how to suppress exceptions? > Thanks. > > Jarod > > > > ___________________________________________________________ > ?????????????????????????????????? > http://card.mail.cn.yahoo.com/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From oleksii.nikolaienko at gmail.com Sun Apr 12 07:10:47 2009 From: oleksii.nikolaienko at gmail.com (Oleksii Nikolaienko) Date: Sun, 12 Apr 2009 14:10:47 +0300 Subject: [Bioperl-l] GSoC proposal Message-ID: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> Hi all! My name is Oleksii, I`m PhD student and I`d like to receive your comments on my proposal for Google summer of code. It`s called "bioperl-live::Bio::Restriction::* - implementing missing features" and I`m going to: 1) add support for reading and writing different file formats for module Bio::Restriction::IO 2) add support for multicut/multisite enzymes 3) add information on recommended buffers, restriction efficiency, sensitivity to methylation, etc and corresponding new methods 4) update documentation for Bio::Restriction::* modules Thanks in advance for your suggestions. notch From roy.chaudhuri at gmail.com Tue Apr 14 10:54:21 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 14 Apr 2009 15:54:21 +0100 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error Message-ID: <49E4A39D.2020909@gmail.com> Hi Mike. I did get that problem solved in the end, thanks to lots of help from Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped cc-ing the mailing list at some point. 
The last archived message in the thread
(http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had the
correct solution - the code change was incorporated into the bioperl-ext
CVS, and is in the latest version that you can get from SVN (see
http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the
problem you must be experiencing a different issue.

You should also bear in mind the message Chris Fields sent to the list a
few days ago, and have a look at using BioLib instead:

> Just to note, we're not actively supporting much of the bioperl-ext
> code, in favor of the BioLib initiative:
>
> http://biolib.open-bio.org/wiki/Main_Page
>
> If you do use bioperl-ext I suggest only using the latest code from
> svn (and that in combination with bioperl-live).
>
> chris

Hope this helps.
Roy.

Michael Stubbington wrote:
> Dear Dr. Chaudhuri,
>
> I am currently trying to write a bioperl script that parses .abi
> sequence files. I am having exactly the same problem as you did when
> you posted this enquiry to the bioperl mailing list
> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
> wondering if you ever solved the problem and, if so, can you remember
> what you did? I'd be very grateful for any help you can provide. I
> can't find this problem mentioned anywhere else online.
>
> Thank you for your time.
>
>
>
> Mike

--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.

From cjfields at illinois.edu Tue Apr 14 11:20:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 14 Apr 2009 10:20:00 -0500
Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error
In-Reply-To: <49E4A39D.2020909@gmail.com>
References: <49E4A39D.2020909@gmail.com>
Message-ID:

For ABI files you'll need an older version of io_lib that supports ABI or
the io_lib that comes with the full staden package. Recent versions of
io_lib don't have ABI support built-in anymore.
chris On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: > Hi Mike. > > I did get that problem solved in the end, thanks to lots of help > from Aaron Mackey. Looking at the bioperl-l archives it seems like > we stopped cc-ing the mailing list at some point. The last archived > message in the thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html > ) had the correct solution - the code change was incorporated into > the bioperl-ext CVS, and is in the latest version that you can get > from SVN (see http://www.bioperl.org/wiki/Ext_package). If that > doesn't solve the problem you must be experiencing a different issue. > > You should also bear in mind the message Chris Fields sent to the > list a few days ago, and have a look at using BioLib instead: > >> Just to note, we're not actively supporting much of the bioperl- >> ext code, in favor of the BioLib initiative: >> http://biolib.open-bio.org/wiki/Main_Page >> If you do use bioperl-ext I suggest only using the latest code >> from svn (and that in combination with bioperl-live). > > >> chris > > Hope this helps. > Roy. > > > > Michael Stubbington wrote: >> Dear Dr. Chaudhuri, >> I am currently trying to write a bioperl script that parses .abi >> sequence files. I am having exactly the same problem as you did when >> you posted this enquiry to the bioperl mailing list http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html >> . I was wondering if you ever solved the problem and, if so, can >> you remember >> what you did? I?d be very grateful for any help you can provide. I >> can?t find this problem mentioned anywhere else online. >> Thank you for your time. >> Mike > > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Apr 14 14:21:43 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 14 Apr 2009 13:21:43 -0500 Subject: [Bioperl-l] GSoC proposal In-Reply-To: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> References: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> Message-ID: On Apr 12, 2009, at 6:10 AM, Oleksii Nikolaienko wrote: > Hi all! > My name is Oleksii, I`m PhD student and I`d like to receive your > comments on > my proposal for Google summer of code. It`s called > "bioperl-live::Bio::Restriction::* - implementing missing features" > and I`m > going to: > > 1) add support for reading and writing different file formats for > module Bio::Restriction::IO You should specify which formats you intend on working with. It's known that several formats don't carry all data, for instance prototype information, vendors, etc. so that should be documented for end-users. You should probably suggest a workaround for getting at missing data (i.e. a format that carries all info, retrieving prototype data separately, etc). > 2) add support for multicut/multisite enzymes Agreed, though you should be more specific on how you intend to implement this. From the Bio::Restriction::Enzyme documentation the sequence site is supposed to be a Bio::PrimarySeq (though I would probably change that internally so it creates these on the fly from the stored string). Multicut/multisite implies list context return, so it may become an API issue (and using wantarray as a workaround is fraught with problematic API traps that I suggest avoiding if at all possible). 
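
A minimal sketch of the list-context point above, assuming a hypothetical
My::MultiSiteEnzyme class. It is not the Bio::Restriction::Enzyme API, and
the enzyme name and site strings are made up for illustration. Rather than
one method that switches behaviour on wantarray, it keeps a scalar accessor
(site) and adds a separate list accessor (sites), so existing callers of
the scalar method are unaffected:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an enzyme object; NOT the Bio::Restriction::Enzyme API.
package My::MultiSiteEnzyme;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        name  => $args{name},
        sites => [ @{ $args{sites} || [] } ],   # recognition sites kept as plain strings
    }, $class;
    return $self;
}

# Scalar accessor: always returns the first (primary) site.
sub site {
    my $self = shift;
    return $self->{sites}[0];
}

# List accessor: returns every site in list context. In scalar context
# Perl yields the number of sites, as it does for any array.
sub sites {
    my $self = shift;
    return @{ $self->{sites} };
}

package main;

my $enz = My::MultiSiteEnzyme->new(
    name  => 'ExampleI',                 # made-up enzyme name
    sites => [ 'GAATTC', 'CCWGG' ],      # made-up pair of sites
);

my $first = $enz->site;           # one site, no context tricks
my @all   = $enz->sites;          # all sites
my $count = () = $enz->sites;     # force list context, then count

print "first: $first\n";
print "all: @all\n";
print "count: $count\n";
```

Whether the real modules would store strings or Bio::PrimarySeq objects
internally is left open here; the sketch only shows the two-accessor shape.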
> 3) add information on recommended buffers, restriction > efficiency, > sensitivity to methylation, etc and corresponding new methods Much of this should probably be outlined in the corresponding interface class prior to implementation. > 4) update documentation for Bio::Restriction::* modules Yes, completely agree. This should be bumped closer to the top of the priority list (and outlined in the interface classes). > Thanks in advance for your suggestions. > > notch > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l chris From j_martin at lbl.gov Wed Apr 15 02:50:37 2009 From: j_martin at lbl.gov (Joel Martin) Date: Tue, 14 Apr 2009 23:50:37 -0700 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: References: <49E4A39D.2020909@gmail.com> Message-ID: <20090415065037.GB1175@eniac.jgi-psf.org> Hello, Do you know where it says io_lib will stop supporting ABI? We use the latest ( 1.11.6 ) for this so I know it does read them and I just checked with one fresh off a sequencer. But I couldn't find an active forum for staden. Thanks, Joel On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: > For ABI files you'll need an older version of io_lib that supports ABI or > the io_lib that comes with the full staden package. Recent versions of > io_lib don't have ABI support built-in anymore. > > chris > > On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: > >> Hi Mike. >> >> I did get that problem solved in the end, thanks to lots of help from >> Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped >> cc-ing the mailing list at some point. 
The last archived message in the >> thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had >> the correct solution - the code change was incorporated into the >> bioperl-ext CVS, and is in the latest version that you can get from SVN >> (see http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the >> problem you must be experiencing a different issue. >> >> You should also bear in mind the message Chris Fields sent to the list a >> few days ago, and have a look at using BioLib instead: >> >>> Just to note, we're not actively supporting much of the bioperl-ext >>> code, in favor of the BioLib initiative: >>> http://biolib.open-bio.org/wiki/Main_Page >>> If you do use bioperl-ext I suggest only using the latest code from svn >>> (and that in combination with bioperl-live). >> > >>> chris >> >> Hope this helps. >> Roy. >> >> >> >> Michael Stubbington wrote: >>> Dear Dr. Chaudhuri, >>> I am currently trying to write a bioperl script that parses .abi sequence >>> files. I am having exactly the same problem as you did when >>> you posted this enquiry to the bioperl mailing list >>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was >>> wondering if you ever solved the problem and, if so, can you remember >>> what you did? I?d be very grateful for any help you can provide. I >>> can?t find this problem mentioned anywhere else online. >>> Thank you for your time. >>> Mike >> >> -- >> Dr. Roy Chaudhuri >> Department of Veterinary Medicine >> University of Cambridge, U.K. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Apr 15 08:26:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 15 Apr 2009 07:26:15 -0500 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <20090415065037.GB1175@eniac.jgi-psf.org> References: <49E4A39D.2020909@gmail.com> <20090415065037.GB1175@eniac.jgi-psf.org> Message-ID: <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> Joel, They haven't stopped supporting it. IIRC the separate io_lib distribution no longer has the ABI headers, but the io_lib with the full staden package does (a little confusing, yes). I have 1.11.6 and ABI-related tests for bioperl and bioperl-ext don't pass, but compiling with an earlier version does work. It may be as simple as including the header files from an old version, but I haven't tried that. chris On Apr 15, 2009, at 1:50 AM, Joel Martin wrote: > Hello, > Do you know where it says io_lib will stop supporting ABI? We use > the latest ( 1.11.6 ) for this so I know it does read them and I just > checked with one fresh off a sequencer. But I couldn't find an active > forum for staden. > > Thanks, > Joel > > On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: >> For ABI files you'll need an older version of io_lib that supports >> ABI or >> the io_lib that comes with the full staden package. Recent >> versions of >> io_lib don't have ABI support built-in anymore. >> >> chris >> >> On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: >> >>> Hi Mike. >>> >>> I did get that problem solved in the end, thanks to lots of help >>> from >>> Aaron Mackey. 
Looking at the bioperl-l archives it seems like we >>> stopped >>> cc-ing the mailing list at some point. The last archived message >>> in the >>> thread (http://bioperl.org/pipermail/bioperl-l/2005-May/ >>> 018925.html) had >>> the correct solution - the code change was incorporated into the >>> bioperl-ext CVS, and is in the latest version that you can get >>> from SVN >>> (see http://www.bioperl.org/wiki/Ext_package). If that doesn't >>> solve the >>> problem you must be experiencing a different issue. >>> >>> You should also bear in mind the message Chris Fields sent to the >>> list a >>> few days ago, and have a look at using BioLib instead: >>> >>>> Just to note, we're not actively supporting much of the bioperl-ext >>>> code, in favor of the BioLib initiative: >>>> http://biolib.open-bio.org/wiki/Main_Page >>>> If you do use bioperl-ext I suggest only using the latest code >>>> from svn >>>> (and that in combination with bioperl-live). >>>> >>>> chris >>> >>> Hope this helps. >>> Roy. >>> >>> >>> >>> Michael Stubbington wrote: >>>> Dear Dr. Chaudhuri, >>>> I am currently trying to write a bioperl script that parses .abi >>>> sequence >>>> files. I am having exactly the same problem as you did when >>>> you posted this enquiry to the bioperl mailing list >>>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was >>>> wondering if you ever solved the problem and, if so, can you >>>> remember >>>> what you did? I?d be very grateful for any help you can provide. I >>>> can?t find this problem mentioned anywhere else online. >>>> Thank you for your time. >>>> Mike >>> >>> -- >>> Dr. Roy Chaudhuri >>> Department of Veterinary Medicine >>> University of Cambridge, U.K. 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Michael.Stubbington at hpa.org.uk Wed Apr 15 03:43:39 2009 From: Michael.Stubbington at hpa.org.uk (Michael Stubbington) Date: Wed, 15 Apr 2009 08:43:39 +0100 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: References: <49E4A39D.2020909@gmail.com> Message-ID: <335635A922FA2B43B35B6ADD7929CC590171550C@porhpaexc001.HPA.org.uk> Thanks a lot for your help. I finally solved the problem with a combination of: 1) Checking out the latest bioperl-ext from svn. 2) A fresh install of an earlier version of io_lib (8.12) 3) Changing to "config.h" in os.h Everything seems to be working now. Best wishes, Mike -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: 14 April 2009 16:20 To: Roy Chaudhuri Cc: Michael Stubbington; bioperl-l at bioperl.org Subject: Re: [Bioperl-l] Bio::SeqIO::staden::read make test error For ABI files you'll need an older version of io_lib that supports ABI or the io_lib that comes with the full staden package. Recent versions of io_lib don't have ABI support built-in anymore. chris On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: > Hi Mike. > > I did get that problem solved in the end, thanks to lots of help > from Aaron Mackey. Looking at the bioperl-l archives it seems like > we stopped cc-ing the mailing list at some point. 
The last archived > message in the thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html > ) had the correct solution - the code change was incorporated into > the bioperl-ext CVS, and is in the latest version that you can get > from SVN (see http://www.bioperl.org/wiki/Ext_package). If that > doesn't solve the problem you must be experiencing a different issue. > > You should also bear in mind the message Chris Fields sent to the > list a few days ago, and have a look at using BioLib instead: > >> Just to note, we're not actively supporting much of the bioperl- >> ext code, in favor of the BioLib initiative: >> http://biolib.open-bio.org/wiki/Main_Page >> If you do use bioperl-ext I suggest only using the latest code >> from svn (and that in combination with bioperl-live). > > >> chris > > Hope this helps. > Roy. > > > > Michael Stubbington wrote: >> Dear Dr. Chaudhuri, >> I am currently trying to write a bioperl script that parses .abi >> sequence files. I am having exactly the same problem as you did when >> you posted this enquiry to the bioperl mailing list http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html >> . I was wondering if you ever solved the problem and, if so, can >> you remember >> what you did? I'd be very grateful for any help you can provide. I >> can't find this problem mentioned anywhere else online. >> Thank you for your time. >> Mike > > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------- ************************************************************************** The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). 
It may not be disclosed to any other person without the express authority of the HPA, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses, but please re-sweep any attachments before opening or saving. HTTP://www.HPA.org.uk
**************************************************************************

From cjfields1 at gmail.com Mon Apr 20 12:12:10 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Mon, 20 Apr 2009 11:12:10 -0500
Subject: [Bioperl-l] BioPerl 1.6.1 slate
Message-ID: <58CCB0F1-9BC8-4437-8870-3D6CAA7BB1ED@gmail.com>

All,

Just to note, the bioperl 1.6.1 release will probably be delayed until mid-May (just been too busy to work on it, end-of-semester crunch and all). I'll probably release an alpha prior to that (maybe first week of May) for testing some bug fixes across platforms.

cheers!

chris

From nagel at moldiag.de Tue Apr 21 10:31:29 2009
From: nagel at moldiag.de (Mato Nagel)
Date: Tue, 21 Apr 2009 16:31:29 +0200
Subject: [Bioperl-l] Exact codon numbering
Message-ID: <49EDD8C1.7000101@moldiag.de>

Dear colleagues,

I spent this evening browsing all your information but didn't succeed in finding a module that translates feature data (CDS and mRNA) into codon numbering.

I developed a routine that creates the following structure from an NCBI XML file:

$exonstructure =[
    splice_variant_1->[
        exon_1->{mRNA_from ->'1',
                 mRNA_to->'something',
                 cDNA_from->'something',
                 cDNA_to->'something',
                 CDS_from->'something',
                 CDS_to->'something',
                }
        exon_2->{...}
        ...
    ]
    splice_variant_2 [...
    ]
]

I wonder if it is worth publishing this routine in BioPerl. Looking forward to receiving an answer.

Sincerely Yours
Mato Nagel

From dan.bolser at gmail.com Wed Apr 22 06:49:42 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 22 Apr 2009 11:49:42 +0100
Subject: [Bioperl-l] Creating a fastq format file?
Message-ID: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Creating a fastq format file from fasta and 'fasta quality file'? Hi, I'm sure this is easy, but I'm still not able to 'think bioperl'... I have a 'fasta quality file' and a fasta file, and I would like to output a fastq file. I followed the discussion on the previous thread here: http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html With the conclusion seeming to be 'just do it'. Could someone point me at a way to do this, or was that suggestion an error? i.e. the poster was working out a way to create a fastq the only way possible... I get the feeling that this should be a one-liner, but perhaps the above thread was demonstrating the code I need to copy. Thanks for any suggestions, Dan. From drummike at gmail.com Wed Apr 22 08:28:08 2009 From: drummike at gmail.com (Mike Williams) Date: Wed, 22 Apr 2009 08:28:08 -0400 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Message-ID: On Wed, Apr 22, 2009 at 6:49 AM, Dan Bolser wrote: > Creating a fastq format file from fasta and 'fasta quality file'? > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? Hi there. You should take a look at the documentation for formatdb, that will get you there. http://www.ncbi.nlm.nih.gov/BLAST/docs/formatdb.html Mike From dan.bolser at gmail.com Wed Apr 22 09:10:14 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 22 Apr 2009 14:10:14 +0100 Subject: [Bioperl-l] Creating a fastq format file? 
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
Message-ID: <2c8757af0904220610m7ef63a63m8590956d32d57d17@mail.gmail.com>

2009/4/22 Mike Williams :
> On Wed, Apr 22, 2009 at 6:49 AM, Dan Bolser wrote:
>
>> Creating a fastq format file from fasta and 'fasta quality file'?
>>
>> I have a 'fasta quality file' and a fasta file, and I would like to
>> output a fastq file. I followed the discussion on the previous thread
>> here:
>>
>> With the conclusion seeming to be 'just do it'. Could someone point me
>> at a way to do this, or was that suggestion an error?
>
>
> Hi there. You should take a look at the documentation for formatdb, that
> will get you there.
>
> http://www.ncbi.nlm.nih.gov/BLAST/docs/formatdb.html

Really? I don't find the word fastq anywhere in that file...

I know the fastq format isn't that complex, but why write my own custom conversion utility if one already exists right? Bioperl is so good at converting between other formats, I just assumed there should be a couple of lines to get this done.

Cheers,
Dan.

--
Talk live to HOT bioperl developers in your area NOW!!
irc://irc.freenode.net/#bioperl

> Mike
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From dan.bolser at gmail.com Wed Apr 22 09:32:15 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 22 Apr 2009 14:32:15 +0100
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
Message-ID: <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>

In the Bio::SeqIO::fastq page:

http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq

I read:

"This object can transform Bio::Seq and Bio::Seq::Quality objects to and from fastq flat file databases."
I'm not sure how to code the link between the fastq IO object and the qual object that I have created using the code from the previous thread... Any suggestions? What am I missing? 2009/4/22 Dan Bolser : > Creating a fastq format file from fasta and 'fasta quality file'? > > > Hi, > > I'm sure this is easy, but I'm still not able to 'think bioperl'... > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html > > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? i.e. the poster > was working out a way to create a fastq the only way possible... > > I get the feeling that this should be a one-liner, but perhaps the > above thread was demonstrating the code I need to copy. > > > Thanks for any suggestions, > > Dan. > From dan.bolser at gmail.com Wed Apr 22 09:36:03 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 22 Apr 2009 14:36:03 +0100 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <892884AD17FA42DA96BA586AEAE2170E@NewLife> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <892884AD17FA42DA96BA586AEAE2170E@NewLife> Message-ID: <2c8757af0904220636q6ad96152p63405e03bbe85e6f@mail.gmail.com> Cheers Mark - I was having difficulty understanding that module... I should read more and post less ;-) I got it figured out now... 
Here is my working code, based on the example kindly posted by Phillip San Miguel:

#!/usr/bin/perl -w

use warnings;
use strict;

use Bio::SeqIO;
use Bio::Seq::Quality;

## With a single argument, assume the qual file is "<seq file>.qual"
my ($seq_infile, $qual_infile) = (scalar @ARGV == 1)
    ? ($ARGV[0], "$ARGV[0].qual")
    : @ARGV;

## Create input objects for both a seq (fasta) and qual file
my $in_seq_obj =
    Bio::SeqIO->new( -file   => $seq_infile,
                     -format => 'fasta',
                   );

my $in_qual_obj =
    Bio::SeqIO->new( -file   => $qual_infile,
                     -format => 'qual',
                   );

## No -file is given, so the fastq output goes to STDOUT
my $out_fastq_obj =
    Bio::SeqIO->new( -format => 'fastq' );

while (1){
    ## create objects for both a seq and its associated qual
    my $seq_obj  = $in_seq_obj->next_seq || last;
    my $qual_obj = $in_qual_obj->next_seq;

    ## use seq and qual object methods to feed info for new BSQ object
    ## (the id is copied over so that each fastq record is named)
    my $bsq_obj =
        Bio::Seq::Quality->new( -id   => $seq_obj->id(),
                                -seq  => $seq_obj->seq(),
                                -qual => $qual_obj->qual(),
                              );

    $out_fastq_obj->write_fastq($bsq_obj);
}

2009/4/22 Mark A. Jensen :
> Dan- There is a fastq module under Bio::SeqIO. Do something like
>
>         use Bio::Seq::Quality;
>         use Bio::SeqIO;
>         # from Bio::Seq::Quality synopsis...
>         my $qual = '0 1 2 3 4 5 6 7 8 9 11 12';
>         my $trace = '0 5 10 15 20 25 30 35 40 45 50 55';
>
>         my $seq = Bio::Seq::Quality->new
>             ( -qual => $qual,
>               -trace_indices => $trace,
>               -seq => 'atcgatcgatcg',
>               -id  => 'human_id',
>               -accession_number => 'S000012',
>               -verbose => -1   # to silence deprecated methods
>         );
>         # typical Bio::SeqIO call
>         my $seqio = Bio::SeqIO->new( -file => ">your_file", -format=>'fastq');
>         $seqio->write_seq($seq);
>
> Mark
> ----- Original Message ----- From: "Dan Bolser"
> To:
> Sent: Wednesday, April 22, 2009 6:49 AM
> Subject: [Bioperl-l] Creating a fastq format file?
>
>
>> Creating a fastq format file from fasta and 'fasta quality file'?
>>
>>
>> Hi,
>>
>> I'm sure this is easy, but I'm still not able to 'think bioperl'...
>>
>> I have a 'fasta quality file' and a fasta file, and I would like to
>> output a fastq file. I followed the discussion on the previous thread
>> here:
>>
>> http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html
>>
>>
>> With the conclusion seeming to be 'just do it'. Could someone point me
>> at a way to do this, or was that suggestion an error? i.e. the poster
>> was working out a way to create a fastq the only way possible...
>>
>> I get the feeling that this should be a one-liner, but perhaps the
>> above thread was demonstrating the code I need to copy.
>>
>>
>> Thanks for any suggestions,
>>
>> Dan.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>

From maj at fortinbras.us Wed Apr 22 09:33:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 22 Apr 2009 09:33:08 -0400
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
Message-ID: <892884AD17FA42DA96BA586AEAE2170E@NewLife>

Dan- There is a fastq module under Bio::SeqIO. Do something like

        use Bio::Seq::Quality;
        use Bio::SeqIO;
        # from Bio::Seq::Quality synopsis...
        my $qual = '0 1 2 3 4 5 6 7 8 9 11 12';
        my $trace = '0 5 10 15 20 25 30 35 40 45 50 55';

        my $seq = Bio::Seq::Quality->new
            ( -qual => $qual,
              -trace_indices => $trace,
              -seq => 'atcgatcgatcg',
              -id  => 'human_id',
              -accession_number => 'S000012',
              -verbose => -1   # to silence deprecated methods
        );
        # typical Bio::SeqIO call
        my $seqio = Bio::SeqIO->new( -file => ">your_file", -format=>'fastq');
        $seqio->write_seq($seq);

Mark
----- Original Message ----- From: "Dan Bolser"
To:
Sent: Wednesday, April 22, 2009 6:49 AM
Subject: [Bioperl-l] Creating a fastq format file?

> Creating a fastq format file from fasta and 'fasta quality file'?
> > > Hi, > > I'm sure this is easy, but I'm still not able to 'think bioperl'... > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html > > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? i.e. the poster > was working out a way to create a fastq the only way possible... > > I get the feeling that this should be a one-liner, but perhaps the > above thread was demonstrating the code I need to copy. > > > Thanks for any suggestions, > > Dan. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From mmuratet at hudsonalpha.org Wed Apr 22 10:03:57 2009 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Wed, 22 Apr 2009 09:03:57 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> Message-ID: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > In the Bio::SeqIO::fastq page: > > http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq > > > I read: > > "This object can transform Bio::Seq and Bio::Seq::Quality objects to > and from fastq flat file databases." > > I'm not sure how to code the link between the fastq IO object and the > qual object that I have created using the code from the previous > thread... > > Any suggestions? What am I missing? 
Howdy This might be a good place to ask the question: having looked at the fastq.pm page, is the fastq format defined (only) by a "@'" followed by a sequence line and a "+" header followed by a quality line and the two headers have to agree? Now that Illumina is using phred scaling, are 'Sanger' and 'Illumina' versions the same? Thanks Mike > > > > 2009/4/22 Dan Bolser : >> Creating a fastq format file from fasta and 'fasta quality file'? >> >> >> Hi, >> >> I'm sure this is easy, but I'm still not able to 'think bioperl'... >> >> I have a 'fasta quality file' and a fasta file, and I would like to >> output a fastq file. I followed the discussion on the previous thread >> here: >> >> http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html >> >> >> With the conclusion seeming to be 'just do it'. Could someone point >> me >> at a way to do this, or was that suggestion an error? i.e. the poster >> was working out a way to create a fastq the only way possible... >> >> I get the feeling that this should be a one-liner, but perhaps the >> above thread was demonstrating the code I need to copy. >> >> >> Thanks for any suggestions, >> >> Dan. >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Apr 22 09:38:53 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 22 Apr 2009 09:38:53 -0400 Subject: [Bioperl-l] Can I load ontologies into BioSQL? In-Reply-To: References: Message-ID: Hi Carlos, I am moving your inquiry to the BioPerl list, as the tool is a part of Bioperl-db and uses BioPerl for parsing the ontologies. In your case, the goflat parser in BioPerl seems to balk at the second one of the input files. It may be that the input file is (was?) corrupted, that does happen every once in a while. More likely though is that the goflat parser hasn't kept up with some format changes. 
Have you tried using the obo format version instead? -hilmar On Apr 20, 2009, at 11:44 AM, Carlos A. Canchaya wrote: > Hi guys > > I'm working with biosql and I try to figure out how to load > ontologies into biosql. > > I've tried > > load_ontology.pl --driver mysql --dbuser carlos --dbpass xxx -- > host localhost --dbname biosql --namespace "Gene Ontology" --format > goflat --fmtargs "-defs_file,GO.defs" function.ontology > process.ontology component.ontology > > as in the script info but I have an error, > > > ------------------- WARNING --------------------- > MSG: DBLink exists in the dblink of _default > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: format error (file process.ontology) offending line: > -negative regulation of angiogenesis ; GO:0016525 ; synonym:down > regulation of angiogenesis ; synonym:down\-regulation of > angiogenesis ; synonym:downregulation of angiogenesis ; > synonym:inhibition of angiogenesis % negative regulation of > developmental process ; GO:0051093 % regulation of angiogenesis ; GO: > 0045765 > > STACK Bio::OntologyIO::dagflat::_parse_flat_file /usr/local/share/ > perl/5.10.0/Bio/OntologyIO/dagflat.pm:627 > STACK Bio::OntologyIO::dagflat::parse /usr/local/share/perl/5.10.0/ > Bio/OntologyIO/dagflat.pm:284 > STACK Bio::OntologyIO::dagflat::next_ontology /usr/local/share/perl/ > 5.10.0/Bio/OntologyIO/dagflat.pm:317 > STACK toplevel /usr/local/share/biosql/bioperl-db/scripts/biosql/ > load_ontology.pl:604 > ------------------------------------- > > Any suggestion? 
> > Cheers, > > Carlos > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Wed Apr 22 10:50:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 22 Apr 2009 09:50:47 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: On Apr 22, 2009, at 9:03 AM, Michael Muratet wrote: > > On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > >> In the Bio::SeqIO::fastq page: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq >> >> >> I read: >> >> "This object can transform Bio::Seq and Bio::Seq::Quality objects to >> and from fastq flat file databases." >> >> I'm not sure how to code the link between the fastq IO object and the >> qual object that I have created using the code from the previous >> thread... >> >> Any suggestions? What am I missing? > > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed > by a sequence line and a "+" header followed by a quality line and > the two headers have to agree? Now that Illumina is using phred > scaling, are 'Sanger' and 'Illumina' versions the same? > > Thanks > > Mike I think that's how it is defined, but I remember a while ago finding a formal definition of the format was a bit difficult. 
Looks like that has been rectified: http://maq.sourceforge.net/fastq.shtml If the parser doesn't read Illumina FASTQ format feel free to post a bug report with some example data. I'm sure this will be needed functionality in the future (and it shouldn't be too hard to add in). chris From hans-rudolf.hotz at fmi.ch Wed Apr 22 10:58:21 2009 From: hans-rudolf.hotz at fmi.ch (Hotz, Hans-Rudolf) Date: Wed, 22 Apr 2009 16:58:21 +0200 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed > by a sequence line and a "+" header followed by a quality line and the > two headers have to agree? Now that Illumina is using phred scaling, > are 'Sanger' and 'Illumina' versions the same? No, see: http://maq.sourceforge.net/fastq.shtml Regards, Hans > > Thanks > > Mike From j_martin at lbl.gov Wed Apr 22 11:58:15 2009 From: j_martin at lbl.gov (Joel Martin) Date: Wed, 22 Apr 2009 08:58:15 -0700 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: <20090422155815.GA14402@eniac.jgi-psf.org> On Wed, Apr 22, 2009 at 09:03:57AM -0500, Michael Muratet wrote: > > On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > >> In the Bio::SeqIO::fastq page: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq >> >> >> I read: >> >> "This object can transform Bio::Seq and Bio::Seq::Quality objects to >> and from fastq flat file databases." 
>> >> I'm not sure how to code the link between the fastq IO object and the >> qual object that I have created using the code from the previous >> thread... >> >> Any suggestions? What am I missing? > > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed by a > sequence line and a "+" header followed by a quality line and the two > headers have to agree? Now that Illumina is using phred scaling, are > 'Sanger' and 'Illumina' versions the same? > > Thanks > > Mike No they aren't the same, Illumina still encodes the ascii as value + 64 and Sanger as value + 33. Joel From j_martin at lbl.gov Thu Apr 23 05:32:08 2009 From: j_martin at lbl.gov (Joel Martin) Date: Thu, 23 Apr 2009 02:32:08 -0700 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> References: <49E4A39D.2020909@gmail.com> <20090415065037.GB1175@eniac.jgi-psf.org> <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> Message-ID: <20090423093208.GB22615@eniac.jgi-psf.org> Hello, Maybe they put the headers back in the separate distribution, they seem to be there now. ls -l io_lib-1.11.6/io_lib/abi.h 4 -rw-r--r-- 1 me mypeeps 793 Dec 10 06:54 io_lib-1.11.6/io_lib/abi.h And I can get the ABI-tests to pass with the bioperl-ext on linux, though it takes some odd contortions of the Makefile to get it to compile here. [snip] # Expected: (Can't write valid ctf files until we have a trace object) t/staden_read....ok All tests successful. Files=1, Tests=94, 1 wallclock secs ( 0.95 cusr + 0.06 csys = 1.01 CPU) I might find time to take a shot at getting it to compile cleanerly for linux and solaris, unless you think that's pointless as the BioLib conversion might happen before summer? Joel On Wed, Apr 15, 2009 at 07:26:15AM -0500, Chris Fields wrote: > Joel, > > They haven't stopped supporting it. 
IIRC the separate io_lib distribution > no longer has the ABI headers, but the io_lib with the full staden package > does (a little confusing, yes). I have 1.11.6 and ABI-related tests for > bioperl and bioperl-ext don't pass, but compiling with an earlier version > does work. It may be as simple as including the header files from an old > version, but I haven't tried that. > > chris > > On Apr 15, 2009, at 1:50 AM, Joel Martin wrote: > >> Hello, >> Do you know where it says io_lib will stop supporting ABI? We use >> the latest ( 1.11.6 ) for this so I know it does read them and I just >> checked with one fresh off a sequencer. But I couldn't find an active >> forum for staden. >> >> Thanks, >> Joel >> >> On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: >>> For ABI files you'll need an older version of io_lib that supports ABI or >>> the io_lib that comes with the full staden package. Recent versions of >>> io_lib don't have ABI support built-in anymore. >>> >>> chris >>> >>> On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: >>> >>>> Hi Mike. >>>> >>>> I did get that problem solved in the end, thanks to lots of help from >>>> Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped >>>> cc-ing the mailing list at some point. The last archived message in the >>>> thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had >>>> the correct solution - the code change was incorporated into the >>>> bioperl-ext CVS, and is in the latest version that you can get from SVN >>>> (see http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the >>>> problem you must be experiencing a different issue. 
>>>> >>>> You should also bear in mind the message Chris Fields sent to the list a >>>> few days ago, and have a look at using BioLib instead: >>>> >>>>> Just to note, we're not actively supporting much of the bioperl-ext >>>>> code, in favor of the BioLib initiative: >>>>> http://biolib.open-bio.org/wiki/Main_Page >>>>> If you do use bioperl-ext I suggest only using the latest code from >>>>> svn >>>>> (and that in combination with bioperl-live). >>>>> >>>>> chris >>>> >>>> Hope this helps. >>>> Roy. >>>> >>>> >>>> >>>> Michael Stubbington wrote: >>>>> Dear Dr. Chaudhuri, >>>>> I am currently trying to write a bioperl script that parses .abi >>>>> sequence >>>>> files. I am having exactly the same problem as you did when >>>>> you posted this enquiry to the bioperl mailing list >>>>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was >>>>> wondering if you ever solved the problem and, if so, can you remember >>>>> what you did? I?d be very grateful for any help you can provide. I >>>>> can?t find this problem mentioned anywhere else online. >>>>> Thank you for your time. >>>>> Mike >>>> >>>> -- >>>> Dr. Roy Chaudhuri >>>> Department of Veterinary Medicine >>>> University of Cambridge, U.K. 
>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Thu Apr 23 11:45:34 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 23 Apr 2009 08:45:34 -0700 Subject: [Bioperl-l] Request concerning BioPerl In-Reply-To: <49F0300C.2060700@moldiag.de> References: <49F0300C.2060700@moldiag.de> Message-ID: Mato- Please ask on the mailing list - there is documention in the perldoc for starters and the rest depends on how you are querying for accessions or using Entrez queries. -jason On Apr 23, 2009, at 2:08 AM, Mato Nagel wrote: > Dear colleagues, > where are the options documented? > > $gb = Bio::DB::GenBank->new(@options) > > Sincerely Yours > Mato Nagel Jason Stajich jason at bioperl.org From dan.bolser at gmail.com Fri Apr 24 11:24:17 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 24 Apr 2009 16:24:17 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? Message-ID: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> Hi all, I couldn't find out how to get the 'clear range' from a Bio::Seq::Quality object... Am I looking in the wrong place, or should this method be a part of the Bio::Seq::Quality class? In the latter case I'm on my way to an implementation, but I am not good at navigating the bioperl docs, so I thought I should ask before I take the time to finish that off. Cheers, Dan. 
From dan.bolser at gmail.com Fri Apr 24 12:20:23 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Fri, 24 Apr 2009 17:20:23 +0100
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
Message-ID: <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>

It's a bit rough and ready, but it does what I need...


=head2 get_clear_range

 Title    : get_clear_range
 Usage    : $subobj = $obj->get_clear_range();
            $subobj = $obj->get_clear_range(20);
 Function : Get the clear range using the given quality score as a
            cutoff or a default value of 13.

 Returns  : a new Bio::Seq::Quality object
 Args     : a minimum quality value, optional, default = 13

=cut

sub get_clear_range
{
    my $self = shift;
    my $qual = $self->qual;
    my $minQual = shift || 13;

    ## Each range is stored as [start index, end index, length]
    my (@ranges, $rangeFlag);

    for(my $i=0; $i<@$qual; $i++){
        ## Are we currently within a clear range or not?
        if(defined($rangeFlag)){
            ## Did we just leave the clear range?
            if($qual->[$i]<$minQual){
                ## Log the range
                push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
                ## and reset the range flag.
                $rangeFlag = undef;
            }
            ## else nothing changes
        }
        else{
            ## Did we just enter a clear range?
            if($qual->[$i]>=$minQual){
                ## Better set the range flag!
                $rangeFlag = $i;
            }
            ## else nothing changes
        }
    }
    ## Did we exit the last clear range?
    if(defined($rangeFlag)){
        my $i = scalar(@$qual);
        ## Log the range
        push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
    }

    unless(@ranges){
        die "There is no clear range... I don't know what to do here!\n";
    }

    print "there are ", scalar(@ranges), " clear ranges\n";

    ## Total number of bases that fall within any clear range
    my $sum = 0;
    $sum += $_->[2] for @ranges;

    print "of ", scalar(@$qual), " bases, there are $sum with ".
        "quality scores above the given threshold\n";

    ## Sort by length, longest first, and return the longest range
    for (sort {$b->[2] <=> $a->[2]} @ranges){
        if($_->[2]/$sum < 0.5){
            warn "not so much a clear range as a clear chunk...\n";
        }
        print $_->[2], "\t", $_->[2]/$sum, "\n";

        ## subseq and subqual take 1-based, inclusive coordinates
        return Bio::Seq::QualityDB->new(
            -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
            -qual => $self->subqual($_->[0]+1, $_->[1]+1)
        );
    }
}


Note, for testing I made a package called Bio/Seq/QualityDB.pm (which is a copy of Bio/Seq/Quality.pm that just has the above method added). That is why the 'new Bio::Seq::Quality object' is actually a Bio::Seq::QualityDB object, but other than that it should slot right in (apart from all the debugging output that I spit out).

Cheers,
Dan.

2009/4/24 Dan Bolser :
> Hi all,
>
> I couldn't find out how to get the 'clear range' from a
> Bio::Seq::Quality object... Am I looking in the wrong place, or should
> this method be a part of the Bio::Seq::Quality class?
>
> In the latter case I'm on my way to an implementation, but I am not
> good at navigating the bioperl docs, so I thought I should ask before
> I take the time to finish that off.
>
>
> Cheers,
> Dan.
>

From cjfields at illinois.edu Fri Apr 24 14:56:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 24 Apr 2009 13:56:34 -0500
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
Message-ID: <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>

You could submit this as a diff against Bio::Seq::Quality to bugzilla. If possible, tests don't hurt either!

chris

On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:

> It's a bit rough and ready, but it does what I need...
From rmb32 at cornell.edu Fri Apr 24 15:39:53 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 24 Apr 2009 12:39:53 -0700
Subject: [Bioperl-l] cvs server still up?
Message-ID: <49F21589.6060707@cornell.edu>

The old bioperl CVS repository is still up:

cvs -d :pserver:cvs:cvs\@cvs.bioperl.org:/home/repository/bioperl export -rHEAD bioperl-live

I had an old script that was cvs exporting a copy of bioperl, and it
has been fetching really old copies for a while now.

Maybe somebody might want to deactivate that?

Rob

-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From cjfields at illinois.edu Fri Apr 24 16:29:22 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 24 Apr 2009 15:29:22 -0500
Subject: [Bioperl-l] cvs server still up?
In-Reply-To: <49F21589.6060707@cornell.edu>
References: <49F21589.6060707@cornell.edu>
Message-ID: <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu>

Not sure what the plans were for the CVS server beyond having it
available for all older bioperl releases (pre-1.6).
Everything has been moved into the svn server though, so really the
cvs server is redundant. Shutting it down might serve the purpose of
alerting users to the fact that we no longer use it!

Thinking some more about it: it might be present simply b/c other
open-bio projects are still using cvs. I can't recall if biopython
switched over or not...

chris

From jay at jays.net Fri Apr 24 17:03:27 2009
From: jay at jays.net (Jay Hannah)
Date: Fri, 24 Apr 2009 16:03:27 -0500
Subject: [Bioperl-l] cvs server still up?
In-Reply-To: <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu>
References: <49F21589.6060707@cornell.edu>
 <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu>
Message-ID: <49F2291F.7020704@jays.net>

Chris Fields wrote:
> I can't recall if biopython switched over or not...

http://github.com/biopython
"Official git mirror of the Biopython CVS repository"

Ponder,

j

From cjfields at illinois.edu Fri Apr 24 18:50:12 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 24 Apr 2009 17:50:12 -0500
Subject: [Bioperl-l] cvs server still up?
In-Reply-To: <49F2291F.7020704@jays.net>
References: <49F21589.6060707@cornell.edu>
 <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu>
 <49F2291F.7020704@jays.net>
Message-ID: <9AC3AF4D-E9FF-4593-A53A-B59438EC2BA4@illinois.edu>

Which makes me wonder, is the CVS version actually updated with git
commits (and vice versa) or is git the only thing being used? It is
listed as a 'mirror', so I'm assuming they somehow sync to/from CVS
(ugh).

chris

From torsten.seemann at infotech.monash.edu.au Sun Apr 26 01:50:14 2009
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 26 Apr 2009 15:50:14 +1000
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <20090422155815.GA14402@eniac.jgi-psf.org>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
 <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
 <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
 <20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

> > This might be a good place to ask the question: having looked at the
> > fastq.pm page, is the fastq format defined (only) by a "@'" followed by a
> > sequence line and a "+" header followed by a quality line and the two
> > headers have to agree? Now that Illumina is using phred scaling, are
> > 'Sanger' and 'Illumina' versions the same?
>
> No they aren't the same, Illumina still encodes the ascii as value + 64
> and Sanger as value + 33.

Illumina have now CHANGED how they calculate the quality value however in
the last month or so...
Their Q range used to be -5..40 mapped to ASCII 64+, but now they
produce Q >= 0 and it is unclear if they start at 69 or 64 now...

I have tried to summarise this in a central place:

http://en.wikipedia.org/wiki/FASTQ_format

Corrections welcome!

--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA

From heikki.lehvaslaiho at gmail.com Mon Apr 27 01:42:03 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 07:42:03 +0200
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
 <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
 <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
 <20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

> I have tried to summarise this in a central place:
> http://en.wikipedia.org/wiki/FASTQ_format

Torsten,

Thanks for putting this together. Very helpful.

Do you have a plan of action? Let me propose one for BioPerl. It is
based on the following assumptions:

1. There is a multitude of different ways of coding quality values out there.
2. Bio::Seq::Quality is agnostic of any quality value range rules
3. The emerging open standard is the Sanger fastq specification
4. Open source programs use the Sanger fastq specs

From these it follows that:

1. BioPerl should support Sanger fastq standard

1.1. it already does and there are other SeqIO modules for dealing
with other non-fastq formats.

2. BioPerl should offer simple ways of converting between quality range rules

2.1.
Have a generic method accessible from Bio::Seq::Quality with
preset versions of the method for converting between known variants
(Sanger fastq and the two Illumina versions)

For example:

range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value)
  throw if $value < $from_lower or $value > $from_upper
  return $newvalue

range_convert_illumina2fastq(), range_convert_fastq2illumina(),
range_convert_fastq2phred(), range_convert_phred2fastq()....

(assuming that illumina 1.3 eq phred)

2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina
qualities into Sanger fastq on the fly

2.2.1. Bio::SeqIO::Fastq::next_seq should detect the quality value
range of the incoming stream either automatically or be given a keyword
parameter indicating the range.

2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it detects
a quality value out of range.

2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it
detects a quality value out of range.

2.2.4. It would be useful but not absolutely necessary for
Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina
ranges


What do you think?

   -Heikki

-- 
   -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From heikki.lehvaslaiho at gmail.com Mon Apr 27 02:42:08 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 08:42:08 +0200
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
 <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
 <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
Message-ID:

Dan,

It looks like your method does two different things:

1. Returns the longest subsequence above the threshold
2. Analyses the sequence for the number of ranges the current
threshold creates.

Why not separate these functions?

Let's add a method that sets the threshold and stores it internally as
$self->_threshold. Setting it to a new value should trigger emptying
all the caches (see below).

Let's have two more public methods:

1. get_clean_range() - optional argument 'threshold'

It returns the longest clean subseq.

2. count_clean_ranges() - again optional argument 'threshold'

This returns the number of ranges detected.

Both methods first call the public method threshold if the argument
has been given and then an internal method _find_clean_ranges(). That
method calculates all the ranges and stores them internally (as
$self->_clean_ranges->[...]). The number of ranges is also stored
(e.g. $self->_number_of_ranges). These internal values form the cache
that needs to be emptied whenever any of the critical values of the
object changes: threshold, quality or seq.
Create an internal method $self->_clear_cache that does that.

Now the new quality object does not get created until you call
get_clean_range() which accesses the cached values (or creates them if
they are not there).

This design allows you to have no extra penalty for adding more
methods that act on cached values. For example, it might be a sensible
thing to do at some point to look at all the ranges that are longer
than some length. Then you could write in your program:


$qual->threshold(10);
if ($qual->count_clean_ranges == 1) {
  my $newqual = $qual->get_clean_range();
  # do your analysis
} elsif ($qual->count_clean_ranges == 0) {
  # do some reporting and logging
} else {  # more than one range
  my @quals = $qual->get_all_clean_ranges($min_length);
  # do some more work and possibly select the best one(s)
}



Yours,

   -Heikki

-- 
   -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From dan.bolser at gmail.com Mon Apr 27 04:31:39 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 27 Apr 2009 09:31:39 +0100
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To:
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
 <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
 <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
Message-ID: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>

Hi Heikki,

Thanks very much for the advice on how to better implement the clear
range method within the Bio::Seq::Quality object. I can understand the
logic of what you have written, and it all sounds reasonable. The only
problem is that I am very inexperienced with working on object
oriented Perl (my 'one man' projects to date have never really
required me to think beyond scripts, and it's been years since I
actually tried to code objects in Perl).

To be specific, when you say, "Let's add a method that sets the
threshold and stores it internally as $self->_threshold", ignoring any
other functionality, what would that method look like? In particular,
how would $self->_threshold be implemented?
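
A minimal sketch of the kind of accessor being asked about, assuming a
plain hash-based object. The package name My::QualSketch, the hash keys
and the 0-based [start, end, length] return format are illustrative
assumptions, not the actual Bio::Seq::Quality implementation; the method
names follow the design proposed in the previous message and the default
of 13 is taken from the earlier get_clear_range code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT the real Bio::Seq::Quality.
package My::QualSketch;

sub new {
    my ($class, %args) = @_;
    my $self = bless { _qual => $args{-qual} || [] }, $class;
    $self->threshold($args{-threshold}) if defined $args{-threshold};
    return $self;
}

# Combined getter/setter: the value lives in the object's own hash.
sub threshold {
    my ($self, $value) = @_;
    if (defined $value) {
        # Only drop cached results when the value really changes.
        if (!defined($self->{_threshold}) || $self->{_threshold} != $value) {
            $self->_clear_cache;
        }
        $self->{_threshold} = $value;
    }
    return defined($self->{_threshold}) ? $self->{_threshold} : 13;
}

# Forget everything that was derived from the threshold.
sub _clear_cache {
    my $self = shift;
    delete $self->{_clean_ranges};
    return;
}

# Compute (once) all runs of scores >= threshold as [start, end, length],
# using 0-based indices, and cache the result on the object.
sub _find_clean_ranges {
    my $self = shift;
    return $self->{_clean_ranges} if $self->{_clean_ranges};

    my $qual = $self->{_qual};
    my $min  = $self->threshold;
    my (@ranges, $start);
    for my $i (0 .. $#$qual) {
        if ($qual->[$i] >= $min) {
            $start = $i unless defined $start;
        }
        elsif (defined $start) {
            push @ranges, [$start, $i - 1, $i - $start];
            undef $start;
        }
    }
    # Close a range that runs to the end of the read.
    push @ranges, [$start, $#$qual, scalar(@$qual) - $start] if defined $start;

    return $self->{_clean_ranges} = \@ranges;
}

sub count_clean_ranges {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    return scalar @{ $self->_find_clean_ranges };
}

# Longest clean range as [start, end, length], or undef if there is none.
sub get_clean_range {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    my ($longest) = sort { $b->[2] <=> $a->[2] } @{ $self->_find_clean_ranges };
    return $longest;
}

package main;

my $q = My::QualSketch->new(-qual => [5, 20, 30, 25, 4, 40, 40, 3]);
$q->threshold(10);
printf "%d clean ranges\n", $q->count_clean_ranges;
my $best = $q->get_clean_range;
printf "longest: %d..%d (%d bases)\n", @$best;
```

Run as a script, this prints "2 clean ranges" and "longest: 1..3 (3
bases)". Keeping the threshold in the object's hash and dropping the
cached ranges only when the value changes is what lets further methods
reuse the same ranges without recomputing them.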

I think once I see that detail, I can go ahead and try to code what
you suggested.


Similarly (Chris), where would I put the tests / how would they be implemented?


Thanks again for the feedback.

All the best,
Dan.

From heikki.lehvaslaiho at gmail.com Mon Apr 27 05:38:40 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 11:38:40 +0200
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
 <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
 <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
 <20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

I convinced at least myself to the degree that I wrote the
range_convert() method - with plenty of tests. I mention this now so
that no-one else needs to start thinking through all the edge values. :)

I'll contribute it to the code base once there is a consensus on the
best way forward.

   -Heikki

-- 
   -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From heikki.lehvaslaiho at gmail.com Mon Apr 27 05:41:52 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 11:41:52 +0200
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
 <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
 <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
 <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
Message-ID:

Dan,

I'll take your code and put it into bioperl-live rewritten the way I
suggested and add a few tests.

That should get you started,

   -Heikki

-- 
   -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From cjfields at illinois.edu Mon Apr 27 09:10:04 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Apr 2009 08:10:04 -0500
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
 <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
 <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
 <20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

This is going within Bio::Seq::Quality, correct?
Does Bio::Seq::Quality have a method that indicates what format the quality scores are actually in (sanger/illumina/illumina1.3/phred/foo)? The reason I worry about this is quality scores appear inseparable from their quality format (ranges vary in length, for instance). For instance, if I picked a Bio::Seq::Quality out of the blue, could I tell which quality format it originated from w/o guessing, and similarly could I accurately convert it to another qual format? To me it seems we need something in Bio::Seq::Quality akin to the alphabet() method used for sequence data. chris On Apr 27, 2009, at 4:38 AM, Heikki Lehvaslaiho wrote: > I convinced at least myself to the degree that I wrote the > range_convert() method - with plenty of tests. I mention this now so > that no-one else need to start thinking through all the edge values. > :) > > I'll contribute it to the code base once there is a consensus of best > way forward. > > -Heikki > > 2009/4/27 Heikki Lehvaslaiho : >>> I have tried to summarise this in a central place: >>> http://en.wikipedia.org/wiki/FASTQ_format >> >> Torsten, >> >> Thanks for putting this together. Very helpful. >> >> Do you have a plan of action? Let me propose one for BioPerl. It >> based on following assumptions: >> >> 1. There is multitude of different ways of coding quality values >> out there. >> 2. Bio::Seq::Quality is agnostic of any quality value range rules >> 3. The emerging open standard is the Sanger fastq specification >> 4. Open source programs use the Sanger fastq specs >> >> >> From these it follows that: >> >> >> 1. BioPerl should support Sanger fastq standard >> >> 1.1. it already does and there are other SeqIO modules for dealing >> with other non-fastq formats. >> >> 2. BioPerl should offer simple ways of converting between quality >> range rules >> >> 2.1. 
Have a generic method accessible from Bio::Seq::Quality with >> preset versions of the method for converting between known variants >> (Sanger fastq and the two Illumina versions) >> >> For example: >> >> range_convert ($from_lower, $from_upper, $to_lower, $to_upper, >> $value) >> throw if $value < $from_lower or $value > $from_upper >> return $newvalue >> >> range_convert_illumina2fastq(), range_convert_fastq2illumina(), >> range_convert_fastq2phred(), range_convert_phred2fastq().... >> >> (assuming that illumina 1.3 eq phred) >> >> 2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina >> qualities into Sanger fastq on the fly >> >> 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the incoming stream >> of >> quality value range either automatically or be given a keyword >> parameter indicating the range. >> >> 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it >> detects >> a quality value out of range. >> >> 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it >> detects a quality value out of range. >> >> 2.2.4. It would be useful but not absolutely necessary for >> Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina >> ranges >> >> >> What do you think? >> >> -Heikki >> >> 2009/4/26 Torsten Seemann : >>>>> This might be a good place to ask the question: having looked at >>>>> the >>>>> fastq.pm page, is the fastq format defined (only) by a "@'" >>>>> followed by >>>> a >>>>> sequence line and a "+" header followed by a quality line and >>>>> the two >>>>> headers have to agree? Now that Illumina is using phred scaling, >>>>> are >>>>> 'Sanger' and 'Illumina' versions the same? >>>> >>>> No they aren't the same, Illumina still encodes the ascii as >>>> value + 64 >>>> and Sanger as value + 33. >>>> >>> >>> Illumina have now CHANGED how they calculate the quality value >>> however in >>> the last month or so... 
Their Q range used to be -5..40 mapped to
>>> ASCII 64+,
>>> but now they produce Q >= 0 and it is unclear if they start at 69
>>> or 64
>>> now...
>>>
>>> I have tried to summarise this in a central place:
>>>
>>> http://en.wikipedia.org/wiki/FASTQ_format
>>>
>>> Corrections welcome!
>>>
>>>
>>> --Torsten Seemann
>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
>>> University, AUSTRALIA
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>>
>> --
>> -Heikki
>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>> cell: +27 (0)714328090
>> Sent from Claremont, WC, South Africa
>>
>
>
>
> --
> -Heikki
> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> cell: +27 (0)714328090
> Sent from Claremont, WC, South Africa
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From markus.liebscher at gmx.de Mon Apr 27 09:51:09 2009
From: markus.liebscher at gmx.de (manni122)
Date: Mon, 27 Apr 2009 06:51:09 -0700 (PDT)
Subject: [Bioperl-l] Re moteblast using Swissprot
Message-ID: <23256705.post@talk.nabble.com>

Hi,

I want to retrieve the sequence identifier from the remoteblast interface
(Bio::Tools::Run::RemoteBlast). With this ID I want to look up annotations
stored in Bio::DB::SwissProt. I am using the example code from the
RemoteBlast documentation. If I use a known sequence as input I get the
error: Can't call method "next_hsp" on an undefined value.

This happens only with swissprot as the database; the nr database works
fine. However, the accession code returned from nr is not accepted by
Bio::DB::SwissProt. Is there something wrong with the database?
Here is the code I am using:

    my $v = 1;
    my @params = ('-prog'   => 'blastp',
                  '-data'   => 'nr',
                  '-expect' => '1e-10' );  #swissprot is not working
    $Bio::Tools::Run::RemoteBlast::HEADER{'MATRIX_NAME'} = 'BLOSUM62';
    my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
    $v = 1;
    my $r = $factory->submit_blast($proteinaa);
    print STDERR "Need BLAST Analysis, waiting..." if( $v > 0 );
    while ( my @rids = $factory->each_rid ) {
        foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if( $rc < 0 ) {
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ( $v > 0 );
                sleep 5;
            } else {
                $factory->remove_rid($rid);
                $result = $rc->next_result;
                $hit = $result->next_hit;
                $hsp = $hit->next_hsp;
                $idneu = $hit->accession;
            }
        }
    }

--
View this message in context: http://www.nabble.com/Remoteblast-using-Swissprot-tp23256705p23256705.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From heikki.lehvaslaiho at gmail.com Mon Apr 27 11:44:40 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 17:44:40 +0200
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To:
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
Message-ID:

Dan,

Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live.

Test and extend,

-Heikki

2009/4/27 Heikki Lehvaslaiho :
> Dan,
>
> I'll take your code and put it into bioperl-live rewritten the way I
> suggested and add few tests.
>
> That should get you started,
>
> -Heikki
>
> 2009/4/27 Dan Bolser :
>> Hi Heikki,
>>
>> Thanks very much for the advice on how to better implement the clear
>> range method within the Bio::Seq::Quality object. I can understand the
>> logic of what you have written, and it all sounds reasonable.
The only >> problem is that I am very inexperienced with working on object >> oriented Perl (my 'one man' projects to date have never really >> required me to think beyond scripts, and its been years since I >> actually tried to code objects in Perl). >> >> To be specific, when you say, "Lets add a method that sets the >> threshold and stores it internally as $self->_threshold", ignoring any >> other functionality, what would that method look like? in particular, >> how would $self->_threshold be implemented? >> >> I think once I see that detail, I can go ahead and try to code what >> you suggested. >> >> >> Similarly (Chris), where would I put the tests / how would they be implemented? >> >> >> Thanks again for the feedback. >> >> All the best, >> Dan. >> >> >> >> 2009/4/27 Heikki Lehvaslaiho : >>> Dan, >>> >>> It looks like your method does two different things: >>> >>> 1. Returns the longest subsequence above the threshold >>> 2. Analyses the the sequence for the number of ranges the current >>> threshold creates. >>> >>> Why not separate these functions? >>> >>> Lets add a method that sets the threshold and stores it internally as >>> $self->_threshold. Setting it to a new values should trigger emptying >>> all the caches (see below.) >>> >>> Lets have two more public methods: >>> >>> 1. get_clean_range() - optional argument 'threshold' >>> >>> It returns the longest clean subseq. >>> >>> 2. count_clean_ranges() -again optional argument 'threshold' >>> >>> This returns the number of ranges detected. >>> >>> Both methods call first the public method threshold if the argument >>> has been given and then an internal method ?_find_clean_ranges(). That >>> method calculates all the ranges and stores them internally ?(as >>> $self->_clean_ranges-> [...]). The number of ranges is also stored >>> (e.g. 
$self->_number_of_ranges). These internal values form the cache >>> that needs to be emptied whenever any of the critical values of the >>> object changes: threshold, quality or seq. Create an internal method >>> $self->_clear_cache, that does that. >>> >>> Now the quality new object does not get created until you call >>> get_clean_range() which accesses the cached values (or creates them if >>> they are not there). >>> >>> This design allows you to have no extra penalty for adding more >>> methods that act on cached values. For example, it might be sensible >>> thing to do at some point to look at all the ranges that are longer >>> than some length. Then you could write in your program: >>> >>> >>> $qual->threshold(10); >>> if ($qual->count_clean_ranges == 1) { >>> my $newqual = $qual->get_clean_range(); >>> # do your analysis >>> } elsif ($qual->count_clean_ranges == 0) { >>> # do some reporting and logging >>> } else { # more than one range >>> my @quals = $qual->get_all_clean_ranges($min_length); >>> # do some more work and possibly select the best one(s) >>> } >>> >>> >>> >>> Yours, >>> >>> -Heikki >>> >>> 2009/4/24 Chris Fields : >>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If >>>> possible, tests don't hurt either! >>>> >>>> chris >>>> >>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote: >>>> >>>>> Its a bit rough and ready, but it does what I need... >>>>> >>>>> >>>>> >>>>> >>>>> =head2 get_clear_range >>>>> >>>>> Title : get_clear_range >>>>> >>>>> Title : subqual >>>>> Usage : $subobj = $obj->get_clear_range(); >>>>> $subobj = $obj->get_clear_range(20); >>>>> Function : Get the clear range using the given quality score as a >>>>> cutoff or a default value of 13. >>>>> >>>>> Returns : a new Bio::Seq::Quality object >>>>> Args : a minimum quality value, optional, default = 13 >>>>> >>>>> =cut >>>>> >>>>> sub get_clear_range >>>>> { >>>>> my $self = shift; >>>>>
my $qual = $self->qual; >>>>> ? my $minQual = shift || 13; >>>>> >>>>> ? my (@ranges, $rangeFlag); >>>>> >>>>> ? for(my $i=0; $i<@$qual; $i++){ >>>>> ? ? ? ?## Are we currently within a clear range or not? >>>>> ? ? ? ?if(defined($rangeFlag)){ >>>>> ? ? ? ? ? ?## Did we just leave the clear range? >>>>> ? ? ? ? ? ?if($qual->[$i]<$minQual){ >>>>> ? ? ? ? ? ? ? ?## Log the range >>>>> ? ? ? ? ? ? ? ?push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>>>> ? ? ? ? ? ? ? ?## and reset the range flag. >>>>> ? ? ? ? ? ? ? ?$rangeFlag = undef; >>>>> ? ? ? ? ? ?} >>>>> ? ? ? ? ? ?## else nothing changes >>>>> ? ? ? ?} >>>>> ? ? ? ?else{ >>>>> ? ? ? ? ? ?## Did we just enter a clear range? >>>>> ? ? ? ? ? ?if($qual->[$i]>=$minQual){ >>>>> ? ? ? ? ? ? ? ?## Better set the range flag! >>>>> ? ? ? ? ? ? ? ?$rangeFlag = $i; >>>>> ? ? ? ? ? ?} >>>>> ? ? ? ? ? ?## else nothing changes >>>>> ? ? ? ?} >>>>> ? } >>>>> ? ## Did we exit the last clear range? >>>>> ? if(defined($rangeFlag)){ >>>>> ? ? ? ?my $i = scalar(@$qual); >>>>> ? ? ? ?## Log the range >>>>> ? ? ? ?push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>>>> ? } >>>>> >>>>> ? unless(@ranges){ >>>>> ? ? ? ?die "There is no clear range... I don't know what to do here!\n"; >>>>> ? } >>>>> >>>>> ? print "there are ", scalar(@ranges), " clear ranges\n"; >>>>> >>>>> ? my $sum; map {$sum += $_->[2]} @ranges; >>>>> >>>>> ? print "of ", scalar(@$qual), " bases, there are $sum with ". >>>>> ? ? ? ?"quality scores above the given threshold\n"; >>>>> >>>>> ? for (sort {$b->[2] <=> $a->[2]} @ranges){ >>>>> ? ? ? ?if($_->[2]/$sum < 0.5){ >>>>> ? ? ? ? ? ?warn "not so much a clear range as a clear chunk...\n"; >>>>> ? ? ? ?} >>>>> ? ? ? ?print $_->[2], "\t", $_->[2]/$sum, "\n"; >>>>> >>>>> ? ? ? ?return Bio::Seq::QualityDB->new( -seq => $self->subseq( ?$_->[0]+1, >>>>> $_->[1]+1), >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -qual => $self->subqual($_->[0]+1, >>>>> $_->[1]+1) >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ); >>>>> ? 
} >>>>> } >>>>> >>>>> >>>>> >>>>> >>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which >>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added). >>>>> That is why the 'new Bio::Seq::Quality object' is actually a >>>>> Bio::Seq::QualityDB object, but other than that it should slot right >>>>> in (apart from all the debugging output that I spit out). >>>>> >>>>> >>>>> Cheers, >>>>> Dan. >>>>> >>>>> >>>>> 2009/4/24 Dan Bolser : >>>>>> >>>>>> Hi all, >>>>>> >>>>>> I couldn't find out how to get the 'clear range' from a >>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should >>>>>> this method be a part of the Bio::Seq::Quality class? >>>>>> >>>>>> In the latter case I'm on my way to an implementation, but I am not >>>>>> good at navigating the bioperl docs, so I thought I should ask before >>>>>> I take the time to finish that off. >>>>>> >>>>>> >>>>>> Cheers, >>>>>> Dan. >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >>> >>> -- >>> ? ?-Heikki >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>> cell: +27 (0)714328090 >>> Sent from Claremont, WC, South Africa >>> >> > > > > -- > ? ?-Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa From heikki.lehvaslaiho at gmail.com Mon Apr 27 11:53:12 2009 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 27 Apr 2009 17:53:12 +0200 Subject: [Bioperl-l] Creating a fastq format file? 
In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org> Message-ID: 2009/4/27 Chris Fields : > This is going within Bio::Seq::Quality, correct? Yes. Does Bio::Seq::Quality > have a method that indicates what format the quality scores are actually in > (sanger/illumina/illumina1.3/phred/foo)? ?The reason I worry about this is > quality scores appear inseparable from their quality format (ranges vary in > length, for instance). No method. > For instance, if I picked a Bio::Seq::Quality out of the blue, could I tell > which quality format it originated from w/o guessing, and similarly could I > accurately convert it to another qual format? ?To me it seems we need > something in Bio::Seq::Quality akin to the alphabet() method used for > sequence data. The text formats encode the quality values in different ways, but they are all stored as integer arrays in the object. Converting between them is relatively easy. You are right: quality_format() or even plain format() is needed. The SeqIO methods creating the objects should be setting it. Warnings for unset format values should be added to appropriate places. -Heikki > chris > > On Apr 27, 2009, at 4:38 AM, Heikki Lehvaslaiho wrote: > >> I convinced at least myself to the degree that I wrote the >> range_convert() method - with plenty of tests. I mention this now so >> that no-one else need to start thinking through all the edge values. >> :) >> >> I'll contribute it to the code base once there is a consensus of best >> way forward. >> >> ? -Heikki >> >> 2009/4/27 Heikki Lehvaslaiho : >>>> >>>> I have tried to summarise this in a central place: >>>> http://en.wikipedia.org/wiki/FASTQ_format >>> >>> Torsten, >>> >>> Thanks for putting this together. Very helpful. >>> >>> Do you have a plan of action? ?Let me propose one for BioPerl. 
It >>> based on following assumptions: >>> >>> 1. There is multitude of different ways of coding quality values out >>> there. >>> 2. Bio::Seq::Quality is agnostic of any quality value range rules >>> 3. The emerging open standard is the Sanger fastq specification >>> 4. Open source programs use the Sanger fastq specs >>> >>> >>> From these it follows that: >>> >>> >>> 1. BioPerl should support Sanger fastq standard >>> >>> 1.1. it already does and there are other SeqIO modules for dealing >>> with other non-fastq formats. >>> >>> 2. BioPerl should offer simple ways of converting between quality range >>> rules >>> >>> 2.1. Have a generic method accessible from Bio::Seq::Quality with >>> preset versions of the method for converting between known variants >>> (Sanger fastq and the two Illumina versions) >>> >>> For example: >>> >>> range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value) >>> ?throw if $value < $from_lower or $value > $from_upper >>> ?return $newvalue >>> >>> range_convert_illumina2fastq(), range_convert_fastq2illumina(), >>> range_convert_fastq2phred(), ?range_convert_phred2fastq().... >>> >>> (assuming that illumina 1.3 eq phred) >>> >>> 2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina >>> qualities into Sanger fastq on the fly >>> >>> 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the incoming stream of >>> quality value range either automatically or be given a keyword >>> parameter indicating the range. >>> >>> 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it detects >>> a quality value out of range. >>> >>> 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it >>> detects a quality value out of range. >>> >>> 2.2.4. It would be useful but not absolutely necessary for >>> Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina >>> ranges >>> >>> >>> What do you think? >>> >>> ? 
-Heikki >>> >>> 2009/4/26 Torsten Seemann : >>>>>> >>>>>> This might be a good place to ask the question: having looked at the >>>>>> fastq.pm page, is the fastq format defined (only) by a "@'" followed >>>>>> by >>>>> >>>>> a >>>>>> >>>>>> sequence line and a "+" header followed by a quality line and the two >>>>>> headers have to agree? Now that Illumina is using phred scaling, are >>>>>> 'Sanger' and 'Illumina' versions the same? >>>>> >>>>> No they aren't the same, Illumina still encodes the ascii as value + 64 >>>>> and Sanger as value + 33. >>>>> >>>> >>>> Illumina have now CHANGED how they calculate the quality value however >>>> in >>>> the last month or so... Their Q range used to be -5..40 mapped to ASCII >>>> 64+, >>>> but now they produce Q >= 0 and it is unclear if they start at 69 or 64 >>>> now... >>>> >>>> I have tried to summarise this in a central place: >>>> >>>> http://en.wikipedia.org/wiki/FASTQ_format >>>> >>>> Corrections welcome! >>>> >>>> >>>> --Torsten Seemann >>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>>> University, AUSTRALIA >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >>> >>> -- >>> ? -Heikki >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>> cell: +27 (0)714328090 >>> Sent from Claremont, WC, South Africa >>> >> >> >> >> -- >> ? 
-Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa From cjfields at illinois.edu Mon Apr 27 12:11:12 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 27 Apr 2009 11:11:12 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org> Message-ID: On Apr 27, 2009, at 10:53 AM, Heikki Lehvaslaiho wrote: > 2009/4/27 Chris Fields : >> This is going within Bio::Seq::Quality, correct? > > Yes. > > Does Bio::Seq::Quality >> have a method that indicates what format the quality scores are >> actually in >> (sanger/illumina/illumina1.3/phred/foo)? The reason I worry about >> this is >> quality scores appear inseparable from their quality format (ranges >> vary in >> length, for instance). > > No method. > >> For instance, if I picked a Bio::Seq::Quality out of the blue, >> could I tell >> which quality format it originated from w/o guessing, and similarly >> could I >> accurately convert it to another qual format? To me it seems we need >> something in Bio::Seq::Quality akin to the alphabet() method used for >> sequence data. > > The text formats encode the quality values in different ways, but they > are all stored as integer arrays in the object. Converting between > them is relatively easy. > > You are right: quality_format() or even plain format() is needed. The > SeqIO methods creating the objects should be setting it. 
Warnings for > unset format values should be added to appropriate places. > > -Heikki Agreed, and any conversion methods could default to using a set quality_format()/format() for conversions to/from ascii (might serve as a good verification point as well). chris From maj at fortinbras.us Mon Apr 27 11:51:39 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 27 Apr 2009 11:51:39 -0400 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com><2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com><90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID: Dan - congrats on your first contribution! Mark ----- Original Message ----- From: "Dan Bolser" To: "Heikki Lehvaslaiho" Cc: "Chris Fields" ; Sent: Monday, April 27, 2009 4:31 AM Subject: Re: [Bioperl-l] Clear range from Bio::Seq::Quality? Hi Heikki, Thanks very much for the advice on how to better implement the clear range method within the Bio::Seq::Quality object. I can understand the logic of what you have written, and it all sounds reasonable. The only problem is that I am very inexperienced with working on object oriented Perl (my 'one man' projects to date have never really required me to think beyond scripts, and its been years since I actually tried to code objects in Perl). To be specific, when you say, "Lets add a method that sets the threshold and stores it internally as $self->_threshold", ignoring any other functionality, what would that method look like? in particular, how would $self->_threshold be implemented? I think once I see that detail, I can go ahead and try to code what you suggested. Similarly (Chris), where would I put the tests / how would they be implemented? Thanks again for the feedback. All the best, Dan. 
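On the question above of what a threshold method storing its value "internally" might look like, here is a minimal sketch. It assumes the object is a blessed hash reference, so that "internal" simply means a hash key on $self. The package name My::Quality, the key names _threshold and _clean_ranges, and the fallback of 13 (taken from the draft quoted in this thread) are all illustrative assumptions; this is not the code that went into Bio::Seq::Quality.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT Bio::Seq::Quality itself.
package My::Quality;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{qual} = $args{-qual} || [];   # array ref of integer scores
    return $self;
}

# Combined getter/setter. The value lives in the object's own hash;
# setting it empties the cache, as proposed in the thread.
sub threshold {
    my ($self, $value) = @_;
    if (defined $value) {
        $self->{_threshold} = $value;
        $self->_clear_cache;
    }
    # Assumed default of 13 when nothing has been set.
    return defined $self->{_threshold} ? $self->{_threshold} : 13;
}

# Throw away anything computed from the old threshold.
sub _clear_cache {
    my $self = shift;
    delete $self->{_clean_ranges};
    return;
}

# Find every run of scores >= threshold and cache the result as
# [start, end, length] triples (0-based, inclusive).
sub _find_clean_ranges {
    my $self = shift;
    return $self->{_clean_ranges} if $self->{_clean_ranges};

    my $qual = $self->{qual};
    my $min  = $self->threshold;
    my (@ranges, $start);
    for my $i (0 .. $#$qual) {
        if ($qual->[$i] >= $min) {
            $start = $i unless defined $start;    # entering a run
        }
        elsif (defined $start) {                  # just left a run
            push @ranges, [$start, $i - 1, $i - $start];
            undef $start;
        }
    }
    # A run that extends to the last base is closed here.
    push @ranges, [$start, $#$qual, scalar(@$qual) - $start]
        if defined $start;

    return $self->{_clean_ranges} = \@ranges;
}

# Public method with the optional threshold argument.
sub count_clean_ranges {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    return scalar @{ $self->_find_clean_ranges };
}

package main;

my $q = My::Quality->new(-qual => [0, 5, 15, 20, 3, 30, 40, 1]);
$q->threshold(10);
print $q->count_clean_ranges, "\n";   # prints 2 (runs 15,20 and 30,40)
```

Using defined() rather than "shift || 13" lets a caller pass a threshold of 0 without it being silently replaced by the default. Tests for a method like this would typically be Test::More assertions added to the existing test file for the class (t/Seq/Quality.t is the one named elsewhere in this thread).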
2009/4/27 Heikki Lehvaslaiho : > Dan, > > It looks like your method does two different things: > > 1. Returns the longest subsequence above the threshold > 2. Analyses the the sequence for the number of ranges the current > threshold creates. > > Why not separate these functions? > > Lets add a method that sets the threshold and stores it internally as > $self->_threshold. Setting it to a new values should trigger emptying > all the caches (see below.) > > Lets have two more public methods: > > 1. get_clean_range() - optional argument 'threshold' > > It returns the longest clean subseq. > > 2. count_clean_ranges() -again optional argument 'threshold' > > This returns the number of ranges detected. > > Both methods call first the public method threshold if the argument > has been given and then an internal method _find_clean_ranges(). That > method calculates all the ranges and stores them internally (as > $self->_clean_ranges-> [...]). The number of ranges is also stored > (e.g. $self->_number_of ranges).These internal values form the cache > that needs to be emptied whenever any of the critical values of the > object changes: threshold, quality or seq. Create an internal method > $self->_clear_cache, that does that. > > Now the quality new object does not get created until you call > get_clean_range() which accesses the cached values (or creates them if > they are not there). > > This design allows you to have no extra penalty for adding more > methods that act on cached values. For example, it might be sensible > thing to do at some point to look at all the ranges that are longer > than some length. 
Then you could write in your program:
>
>
> $qual->threshold(10);
> if ($qual->count_clean_ranges == 1) {
>     my $newqual = $qual->get_clean_range();
>     # do your analysis
> } elsif ($qual->count_clean_ranges == 0) {
>     # do some reporting and logging
> } else { # more than one range
>     my @quals = $qual->get_all_clean_ranges($min_length);
>     # do some more work and possibly select the best one(s)
> }
>
>
>
> Yours,
>
> -Heikki
>
> 2009/4/24 Chris Fields :
>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If
>> possible, tests don't hurt either!
>>
>> chris
>>
>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>
>>> Its a bit rough and ready, but it does what I need...
>>>
>>>
>>>
>>>
>>> =head2 get_clear_range
>>>
>>> Title    : get_clear_range
>>>
>>> Title    : subqual
>>> Usage    : $subobj = $obj->get_clear_range();
>>>            $subobj = $obj->get_clear_range(20);
>>> Function : Get the clear range using the given quality score as a
>>>            cutoff or a default value of 13.
>>>
>>> Returns  : a new Bio::Seq::Quality object
>>> Args     : a minimum quality value, optional, default = 13
>>>
>>> =cut
>>>
>>> sub get_clear_range
>>> {
>>>   my $self = shift;
>>>   my $qual = $self->qual;
>>>   my $minQual = shift || 13;
>>>
>>>   my (@ranges, $rangeFlag);
>>>
>>>   for(my $i=0; $i<@$qual; $i++){
>>>       ## Are we currently within a clear range or not?
>>>       if(defined($rangeFlag)){
>>>           ## Did we just leave the clear range?
>>>           if($qual->[$i]<$minQual){
>>>               ## Log the range
>>>               push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>               ## and reset the range flag.
>>>               $rangeFlag = undef;
>>>           }
>>>           ## else nothing changes
>>>       }
>>>       else{
>>>           ## Did we just enter a clear range?
>>>           if($qual->[$i]>=$minQual){
>>>               ## Better set the range flag!
>>>               $rangeFlag = $i;
>>>           }
>>>           ## else nothing changes
>>>       }
>>>   }
>>>   ## Did we exit the last clear range?
>>>   if(defined($rangeFlag)){
>>>       my $i = scalar(@$qual);
>>>       ## Log the range
>>>       push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>   }
>>>
>>>   unless(@ranges){
>>>       die "There is no clear range... I don't know what to do here!\n";
>>>   }
>>>
>>>   print "there are ", scalar(@ranges), " clear ranges\n";
>>>
>>>   my $sum; map {$sum += $_->[2]} @ranges;
>>>
>>>   print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>       "quality scores above the given threshold\n";
>>>
>>>   for (sort {$b->[2] <=> $a->[2]} @ranges){
>>>       if($_->[2]/$sum < 0.5){
>>>           warn "not so much a clear range as a clear chunk...\n";
>>>       }
>>>       print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>
>>>       return Bio::Seq::QualityDB->new( -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
>>>                                        -qual => $self->subqual($_->[0]+1, $_->[1]+1)
>>>                                        );
>>>   }
>>> }
>>>
>>>
>>>
>>>
>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>>> That is why the 'new Bio::Seq::Quality object' is actually a
>>> Bio::Seq::QualityDB object, but other than that it should slot right
>>> in (apart from all the debugging output that I spit out).
>>>
>>>
>>> Cheers,
>>> Dan.
>>>
>>>
>>> 2009/4/24 Dan Bolser :
>>>>
>>>> Hi all,
>>>>
>>>> I couldn't find out how to get the 'clear range' from a
>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>>> this method be a part of the Bio::Seq::Quality class?
>>>>
>>>> In the latter case I'm on my way to an implementation, but I am not
>>>> good at navigating the bioperl docs, so I thought I should ask before
>>>> I take the time to finish that off.
>>>>
>>>>
>>>> Cheers,
>>>> Dan.
>>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > -Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From kaboroev at sfu.ca Mon Apr 27 15:04:05 2009 From: kaboroev at sfu.ca (Keith Anthony Boroevich) Date: Mon, 27 Apr 2009 12:04:05 -0700 Subject: [Bioperl-l] Bio::Graphics Sub Feature Title Message-ID: <49F601A5.8090205@sfu.ca> Hi, I was wondering if it is possible to set a different "-title" for each of the subfeatures in a track the same way one can set a different "-bgcolor" using a subroutine. I noticed that the -title subroutine is only called once per Feature and is passed a "Bio::SeqFeature::Generic" class whereas the -bgcolor subroutine is called once per Sub Feature and is passed the "Bio::SeqFeature::Generic"s which I created. Is there any way for the -title subroutine to be called each Sub Feature or is this not implemented? Keith From dan.bolser at gmail.com Tue Apr 28 01:46:05 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 28 Apr 2009 06:46:05 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID: <2c8757af0904272246q56e19a2dr542b29f2378d0a48@mail.gmail.com> 2009/4/27 Mark A. Jensen : > Dan - congrats on your first contribution! 
Mark I don't really feel like I can take much credit! Thanks Heikki! I'll look at what you did and see what I can add. Its a really good feeling to contribute to BioPerl (even if I didn't really do much!)... Now... where do I collect my cheque? ;-) Seriously though, thanks all for helping to put this together, and thanks for maintaining BioPerl and keeping it relevant as the field changes. All the best, Dan. > ----- Original Message ----- From: "Dan Bolser" > To: "Heikki Lehvaslaiho" > Cc: "Chris Fields" ; > Sent: Monday, April 27, 2009 4:31 AM > Subject: Re: [Bioperl-l] Clear range from Bio::Seq::Quality? > > > Hi Heikki, > > Thanks very much for the advice on how to better implement the clear > range method within the Bio::Seq::Quality object. I can understand the > logic of what you have written, and it all sounds reasonable. The only > problem is that I am very inexperienced with working on object > oriented Perl (my 'one man' projects to date have never really > required me to think beyond scripts, and its been years since I > actually tried to code objects in Perl). > > To be specific, when you say, "Lets add a method that sets the > threshold and stores it internally as $self->_threshold", ignoring any > other functionality, what would that method look like? in particular, > how would $self->_threshold be implemented? > > I think once I see that detail, I can go ahead and try to code what > you suggested. > > > Similarly (Chris), where would I put the tests / how would they be > implemented? > > > Thanks again for the feedback. > > All the best, > Dan. > > > > 2009/4/27 Heikki Lehvaslaiho : >> >> Dan, >> >> It looks like your method does two different things: >> >> 1. Returns the longest subsequence above the threshold >> 2. Analyses the the sequence for the number of ranges the current >> threshold creates. >> >> Why not separate these functions? >> >> Lets add a method that sets the threshold and stores it internally as >> $self->_threshold. 
Setting it to a new value should trigger emptying
>> all the caches (see below.)
>>
>> Let's have two more public methods:
>>
>> 1. get_clean_range() - optional argument 'threshold'
>>
>> It returns the longest clean subseq.
>>
>> 2. count_clean_ranges() - again optional argument 'threshold'
>>
>> This returns the number of ranges detected.
>>
>> Both methods first call the public method threshold() if the argument
>> has been given and then an internal method _find_clean_ranges(). That
>> method calculates all the ranges and stores them internally (as
>> $self->_clean_ranges->[...]). The number of ranges is also stored
>> (e.g. $self->_number_of_ranges). These internal values form the cache
>> that needs to be emptied whenever any of the critical values of the
>> object changes: threshold, quality or seq. Create an internal method
>> $self->_clear_cache that does that.
>>
>> Now the new quality object does not get created until you call
>> get_clean_range(), which accesses the cached values (or creates them if
>> they are not there).
>>
>> This design allows you to have no extra penalty for adding more
>> methods that act on cached values. For example, it might be a sensible
>> thing to do at some point to look at all the ranges that are longer
>> than some length. Then you could write in your program:
>>
>>
>> $qual->threshold(10);
>> if ($qual->count_clean_ranges == 1) {
>>     my $newqual = $qual->get_clean_range();
>>     # do your analysis
>> } elsif ($qual->count_clean_ranges == 0) {
>>     # do some reporting and logging
>> } else { # more than one range
>>     my @quals = $qual->get_all_clean_ranges($min_length);
>>     # do some more work and possibly select the best one(s)
>> }
>>
>>
>>
>> Yours,
>>
>> -Heikki
>>
>> 2009/4/24 Chris Fields :
>>>
>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If
>>> possible, tests don't hurt either!
>>> >>> chris >>> >>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote: >>> >>>> Its a bit rough and ready, but it does what I need... >>>> >>>> >>>> >>>> >>>> =head2 get_clear_range >>>> >>>> Title : get_clear_range >>>> >>>> Title : subqual >>>> Usage : $subobj = $obj->get_clear_range(); >>>> $subobj = $obj->get_clear_range(20); >>>> Function : Get the clear range using the given quality score as a >>>> cutoff or a default value of 13. >>>> >>>> Returns : a new Bio::Seq::Quality object >>>> Args : a minimum quality value, optional, devault = 13 >>>> >>>> =cut >>>> >>>> sub get_clear_range >>>> { >>>> my $self = shift; >>>> my $qual = $self->qual; >>>> my $minQual = shift || 13; >>>> >>>> my (@ranges, $rangeFlag); >>>> >>>> for(my $i=0; $i<@$qual; $i++){ >>>> ## Are we currently within a clear range or not? >>>> if(defined($rangeFlag)){ >>>> ## Did we just leave the clear range? >>>> if($qual->[$i]<$minQual){ >>>> ## Log the range >>>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>>> ## and reset the range flag. >>>> $rangeFlag = undef; >>>> } >>>> ## else nothing changes >>>> } >>>> else{ >>>> ## Did we just enter a clear range? >>>> if($qual->[$i]>=$minQual){ >>>> ## Better set the range flag! >>>> $rangeFlag = $i; >>>> } >>>> ## else nothing changes >>>> } >>>> } >>>> ## Did we exit the last clear range? >>>> if(defined($rangeFlag)){ >>>> my $i = scalar(@$qual); >>>> ## Log the range >>>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>>> } >>>> >>>> unless(@ranges){ >>>> die "There is no clear range... I don't know what to do here!\n"; >>>> } >>>> >>>> print "there are ", scalar(@ranges), " clear ranges\n"; >>>> >>>> my $sum; map {$sum += $_->[2]} @ranges; >>>> >>>> print "of ", scalar(@$qual), " bases, there are $sum with ". 
>>>> "quality scores above the given threshold\n"; >>>> >>>> for (sort {$b->[2] <=> $a->[2]} @ranges){ >>>> if($_->[2]/$sum < 0.5){ >>>> warn "not so much a clear range as a clear chunk...\n"; >>>> } >>>> print $_->[2], "\t", $_->[2]/$sum, "\n"; >>>> >>>> return Bio::Seq::QualityDB->new( -seq => $self->subseq( $_->[0]+1, >>>> $_->[1]+1), >>>> -qual => $self->subqual($_->[0]+1, >>>> $_->[1]+1) >>>> ); >>>> } >>>> } >>>> >>>> >>>> >>>> >>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which >>>> is a copy of Bio/Seq/Quality.pm that just has the above method added). >>>> That is why the 'new Bio::Seq::Quality object' is actually a >>>> Bio::Seq::QualityDB object, but other than that it should slot right >>>> in (apart from all the debugging output that I spit out). >>>> >>>> >>>> Cheers, >>>> Dan. >>>> >>>> >>>> 2009/4/24 Dan Bolser : >>>>> >>>>> Hi all, >>>>> >>>>> I couldn't find out how to get the 'clear range' from a >>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should >>>>> this method be a part of the Bio::Seq::Quality class? >>>>> >>>>> In the latter case I'm on my way to an implementation, but I am not >>>>> good at navigating the bioperl docs, so I thought I should ask before >>>>> I take the time to finish that off. >>>>> >>>>> >>>>> Cheers, >>>>> Dan. 
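The question raised in this thread, what a threshold method backed by $self->_threshold might look like, can be illustrated with a small stand-alone class. This is only a sketch of the caching design described in the quoted message, not code from Bio::Seq::Quality: the package name My::CleanRanges and the hash keys are assumptions, and get_clean_range() here returns coordinates rather than a new quality object.

```perl
use strict;
use warnings;

package My::CleanRanges;   # hypothetical stand-in, not Bio::Seq::Quality

sub new {
    my ($class, %args) = @_;
    return bless { _qual => $args{-qual} || [], _threshold => 13 }, $class;
}

# Getter/setter: storing a new value drops anything computed from the old one.
sub threshold {
    my ($self, $value) = @_;
    if (defined $value) {
        $self->{_threshold} = $value;
        $self->_clear_cache;
    }
    return $self->{_threshold};
}

sub _clear_cache { delete $_[0]{_clean_ranges} }

# Compute the ranges once per threshold and keep them inside the object.
sub _find_clean_ranges {
    my $self = shift;
    return $self->{_clean_ranges} if $self->{_clean_ranges};
    my ($qual, $min) = ($self->{_qual}, $self->{_threshold});
    my (@ranges, $start);
    for my $i (0 .. $#$qual) {
        if ($qual->[$i] >= $min) { $start = $i unless defined $start }
        elsif (defined $start)   { push @ranges, [$start, $i - 1]; undef $start }
    }
    push @ranges, [$start, $#$qual] if defined $start;
    return $self->{_clean_ranges} = \@ranges;
}

sub count_clean_ranges {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    return scalar @{ $self->_find_clean_ranges };
}

# Returns [start, end] (0-based) of the longest range, or undef if none.
sub get_clean_range {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    my ($best) = sort { ($b->[1] - $b->[0]) <=> ($a->[1] - $a->[0]) }
                 @{ $self->_find_clean_ranges };
    return $best;
}

package main;

my $q = My::CleanRanges->new(-qual => [5, 20, 30, 2, 15, 15, 15, 1, 40]);
$q->threshold(10);
printf "%d ranges, longest %d..%d\n",
    $q->count_clean_ranges, @{ $q->get_clean_range };
```

Because both public methods go through _find_clean_ranges(), the scan runs once per threshold however many times they are called. In a real class, changing the quality values or the sequence would also need to call _clear_cache.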
>>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> -- >> -Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From brianli.cas at gmail.com Tue Apr 28 23:14:23 2009 From: brianli.cas at gmail.com (brian li) Date: Wed, 29 Apr 2009 11:14:23 +0800 Subject: [Bioperl-l] Parse problem of a big EMBL entry Message-ID: Hi everyone, Here is greeting from Brian. I have just began to use bioperl 1.6.0 to collect certain data lines from EMBL files. There's a problem when I try to get an entry that includes over 1 million lines. A call of Bio::SeqIO::embl->next_seq would just cause the parser script to exit. I have read Bio/SeqIO/embl.pm and I think one possible way to solve the problem may be to give my script more memory to store the entry data. The machine I am using has 32GB memory, and that shall be enough for any entry. So I am wondering whether there is any way to set the size of the memory available to a perl script. Others ways to deal with the problem are also welcome. Appreciate your help. Brian From jason at bioperl.org Wed Apr 29 01:10:27 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 28 Apr 2009 22:10:27 -0700 Subject: [Bioperl-l] Parse problem of a big EMBL entry In-Reply-To: References: Message-ID: <2154C145-1A66-4EEB-B99E-FBE8215539F5@bioperl.org> Brian - Without memory leaks it should only take up as much memory as the current sequence you have parsed. 
If you mean you have a sequence record with > 1M lines I'm not sure how much memory that would take up, depends on if this is lots of feature or what. There are ways to tell BioPerl to throw away things you don't want to parse out from the record. See http://bioperl.org/wiki/HOWTO:SeqIO#Speed. 2C_Bio::Seq::SeqBuilder Perl will use as much memory as is available on your machine. Have you monitored the memory use of the perl running to insure it is reaching the 32Gb limit and that is in fact what is killing the program? -jason On Apr 28, 2009, at 8:14 PM, brian li wrote: > Hi everyone, > > Here is greeting from Brian. > > I have just began to use bioperl 1.6.0 to collect certain data > lines from EMBL files. > > There's a problem when I try to get an entry that includes over 1 > million lines. A call of Bio::SeqIO::embl->next_seq would just cause > the parser script to exit. I have read Bio/SeqIO/embl.pm and I think > one possible way to solve the problem may be to give my script more > memory to store the entry data. The machine I am using has 32GB > memory, and that shall be enough for any entry. > > So I am wondering whether there is any way to set the size of the > memory available to a perl script. Others ways to deal with the > problem are also welcome. > > Appreciate your help. 
> > Brian > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From paola.bisignano at gmail.com Wed Apr 29 10:08:57 2009 From: paola.bisignano at gmail.com (Paola Bisignano) Date: Wed, 29 Apr 2009 16:08:57 +0200 Subject: [Bioperl-l] parsing /www.ebi.ac.uk/pdbsum/ Message-ID: Hi, thanks for accepting me in the mailing list, I'm Paola and I work in the institute of cancer in Genoa, Italy, as a bioinformatic...I'm biologist, quite new in perl...(2 months) and never used bioperl...because I prefer learning a little perl before, but now parsing, parsing, and parsing bioinformatic web sites....I need Bioperl :-) I visited www.bioperl.org and read tutorials, I read about a lot of moduls used to parse different web site. I need to parse one in particular EMBL-EBI http://www.ebi.ac.uk/pdbsum/ that is different from EMBL because there are also other information protein-ligand interaction....I never used bioperl moduls...and parsed by myself...but If the receptor has more ligands...it is more difficult to parse...to choose which ligands I need because there are "false" ligands as ions or glycerol that I don't need but I don't know the synthax of this source...for everything can be seen as a ligand....so I want to know if there are moduls that I can use to do my analysis...if anyone can help me...is very wellcome... Thanks From jason at bioperl.org Wed Apr 29 12:41:02 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 29 Apr 2009 09:41:02 -0700 Subject: [Bioperl-l] Fwd: Parse problem of a big EMBL entry References: Message-ID: Brian - please always CC the mailing list on replies. Not sure what is causing the seg fault so I can't really help here - if you want to file it as a bug at the bugzilla with instructions on how to reproduce it will hopefully get looked at. 
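When only certain line types are wanted from a very large entry, one alternative to building a full sequence object is to scan the flat file directly. The sketch below is plain Perl and swaps in a different technique from the SeqBuilder approach mentioned in this thread; it is not a BioPerl API and does not diagnose the crash itself. It assumes the usual EMBL flat-file layout (a two-letter line code, three spaces, then the text, with `//` closing each entry). The function name and the choice of line codes are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: stream an EMBL flat file and keep only the lines whose
# two-letter code is a key of %$want. The callback receives one hash per
# entry, mapping code => [line texts]. Memory use is bounded by the kept lines.
sub collect_embl_lines {
    my ($fh, $want, $callback) = @_;
    my %entry;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ m{^//}) {          # end of the current entry
            $callback->({ %entry });
            %entry = ();
            next;
        }
        # Skip sequence lines, bare "XX" spacers and anything unrecognised.
        my ($code, $text) = $line =~ /^([A-Z]{2})\s{3}(.*)$/ or next;
        push @{ $entry{$code} }, $text if $want->{$code};
    }
    return;
}

# Usage with an in-memory sample; a real run would open the EMBL file instead.
my $sample = "ID   X1; SV 1; linear; mRNA; STD; PLN; 10 BP.\n"
           . "XX\nAC   X1;\nDE   example entry\n"
           . "FT   source          1..10\n"
           . "SQ   Sequence 10 BP;\n     acgtacgtac        10\n//\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
collect_embl_lines($fh, { AC => 1, DE => 1 }, sub {
    my $e = shift;
    print join("\t", map { "@{ $e->{$_} || [] }" } qw(AC DE)), "\n";
});
close $fh;
```

Feature-table (FT) lines are simply skipped unless requested, so an entry with a very large feature table costs little more than the time to read it.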
-jason Begin forwarded message: > From: brian li > Date: April 29, 2009 1:23:32 AM PDT > To: Jason Stajich > Subject: Re: [Bioperl-l] Parse problem of a big EMBL entry > > Hi Jason, > >> Without memory leaks it should only take up as much memory as the >> current >> sequence you have parsed. If you mean you have a sequence record >> with > 1M >> lines I'm not sure how much memory that would take up, depends on >> if this is >> lots of feature or what. > > Lots of feature. > >> There are ways to tell BioPerl to throw away >> things you don't want to parse out from the record. See >> http://bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder > > Thanks. I think this would help. > >> Perl will use as much memory as is available on your machine. Have >> you >> monitored the memory use of the perl running to insure it is >> reaching the >> 32Gb limit and that is in fact what is killing the program? > > I monitored the memory usage in my last run. The size of free > memory didn't change a lot, and remained to be around 20GB (buffer > size added). I took the wrong assumption. Thanks again for your hint. > > BTW: The message I get when I parse big million-line entry is > "Segmentation fault". Not familiar with this and trying to get a clue. > > Brian Jason Stajich jason at bioperl.org From razi.khaja at gmail.com Wed Apr 29 15:08:14 2009 From: razi.khaja at gmail.com (Razi Khaja) Date: Wed, 29 Apr 2009 15:08:14 -0400 Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence In-Reply-To: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com> References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com> Message-ID: <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com> Hello, I am generating BLAST alignments using the BLAST URL API from NCBI. I want to parse details from BLAST reports whenever there are "Features in/flanking this part of subject sequence".? A portion of the BLAST report showing "Features flanking ..." 
is pasted below.

I am using Bio::SearchIO to parse details. The relevant part of the
script is below.

The problem I am having is that for some reason the first occurrence
of a "Feature flanking this part of a subject sequence" is skipped.
I am only able to parse/print all occurrences of a "Feature
in/flanking this part of a subject sequence" from the second
occurrence to the last occurrence.

I believe the code responsible for parsing this information is in
Bio/SearchIO/blast.pm, starting on line 760.
I have tried fixing the code in Bio/SearchIO/blast.pm myself but was
not able to correct the problem.
Would it be possible for someone to fix the code in the
Bio/SearchIO/blast.pm module, or help me fix the code so that the
first occurrence is not skipped?

Thanks,
Razi

===== The part of the script that is relevant to parsing "Features in/flanking..." ====
my $bio_searchio_in = Bio::SearchIO->new(
    -file   => 'blast_result.txt',
    -format => 'blast'
);

my $i = 1;
while( my $result = $bio_searchio_in->next_result() ){
    while( my $hit = $result->next_hit() ){
        while( my $hsp = $hit->next_hsp() ){
            my $hsp_features = $hsp->hit_features();
            if( $hsp_features ) {
                print "HSP FEATURE $i\t$hsp_features\n";
                $i++;
            }
        }
    }
}

===== A portion of a BLAST report with "Features flanking ..." =====
...
...
 Score = 54.7 bits (29),  Expect = 0.003
 Identities = 29/29 (100%), Gaps = 0/29 (0%)
 Strand=Plus/Minus

Query  6556     CCTGGGTGACAGAGTGAGACTCCATCTCA  6584
                |||||||||||||||||||||||||||||
Sbjct  6953042  CCTGGGTGACAGAGTGAGACTCCATCTCA  6953014

>gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic contig
Length=237250

 Features flanking this part of subject sequence:
   16338 bp at 5' side: PRAME family member 8
   11926 bp at 3' side: PRAME family member 9

 Score = 7286 bits (3945),
Expect = 0.0
 Identities = 5437/6145 (88%), Gaps = 152/6145 (2%)
 Strand=Plus/Plus

Query  23225  GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG  23284
              |||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||||||
Sbjct  86128  GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG  86187

Query  23285  GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA  23344
              ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| |||||
Sbjct  86188  GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA  86247
...
...

From cjfields at illinois.edu Wed Apr 29 15:41:54 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 29 Apr 2009 14:41:54 -0500
Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence
In-Reply-To: <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>
References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com> <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>
Message-ID: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu>

I'm assuming this is from an older bioperl; this data should be
accessible via $hsp->hit_features in the latest code from svn (and I
believe in bioperl 1.6.0 in CPAN).

chris

On Apr 29, 2009, at 2:08 PM, Razi Khaja wrote:

> Hello,
>
> I am generating BLAST alignments using the BLAST URL API from NCBI.
>
> I want to parse details from BLAST reports whenever there are
> "Features in/flanking this part of subject sequence". A portion of
> the BLAST report showing "Features flanking ..." is pasted below.
>
> I am using Bio::SearchIO to parse details. The relevant part of the
> script is below.
>
> The problem I am having is that for some reason the first occurrence
> of a "Feature flanking this part of a subject sequence" is skipped.
> I am only able to parse/print all occurrences of a "Feature
> in/flanking this part of a subject sequence" from the second
> occurrence to the last occurrence.
> > I believe the code responsible for parsing this information is in > Bio/SearchIO/blast.pm, starting on line 760. > I have tried fixing the code in Bio/SearchIO/blast.pm myself but was > not able to correct the problem. > Would it be possible for someone to fix the code in the > Bio/SearchIO/blast.pm module, or help me fix the code so that the > first occurrence is not skipped? > > Thanks, > Razi > ===== The part of the script that is relevant to parsing "Features > in/flanking..." ==== > my $bio_searchio_in = Bio::SearchIO->new( > -file => 'blast_result.txt', > -format => 'blast' > ); > > my $i = 1; > while( my $result = $bio_searchio_in->next_result() ){ > while( my $hit = $result->next_hit() ){ > while( my $hsp = $hit->next_hsp() ){ > my $hsp_features = $hsp->hit_features(); > if( $hsp_features ) { > print "HSP FEATURE $i\t$hsp_features\n"; > $i++; > } > } > } > } > > ===== A portion of a BLAST report with "Features flanking ..." ===== > ... > ... > Score = 54.7 bits (29), Expect = 0.003 > Identities = 29/29 (100%), Gaps = 0/29 (0%) > Strand=Plus/Minus > > Query 6556 CCTGGGTGACAGAGTGAGACTCCATCTCA 6584 > ||||||||||||||||||||||||||||| > Sbjct 6953042 CCTGGGTGACAGAGTGAGACTCCATCTCA 6953014 > > >> gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 >> genomic contig > Length=237250 > > Features flanking this part of subject sequence: > 16338 bp at 5' side: PRAME family member 8 > 11926 bp at 3' side: PRAME family member 9 > > Score = 7286 bits (3945), Expect = 0.0 > Identities = 5437/6145 (88%), Gaps = 152/6145 (2%) > Strand=Plus/Plus > > Query 23225 > GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG > 23284 > |||||||||||||||||||||||||||||||| |||||| ||||||||||| > |||||||| > Sbjct 86128 > GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG > 86187 > > Query 23285 > GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA > 23344 > ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| > ||||| > Sbjct 86188 > 
GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA > 86247 > ... > ... > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjm at berkeleybop.org Wed Apr 29 16:58:15 2009 From: cjm at berkeleybop.org (Chris Mungall) Date: Wed, 29 Apr 2009 13:58:15 -0700 Subject: [Bioperl-l] Can I load ontologies into BioSQL? In-Reply-To: References: Message-ID: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org> The .ontology files have been deprecated by GO. Use the .obo files instead. It appears the bioperl parser for the .ontology files isn't able to deal with the new relations in GO. I suggest that the bioperl .ontology parser is deprecated too On Apr 22, 2009, at 6:38 AM, Hilmar Lapp wrote: > Hi Carlos, > > I am moving your inquiry to the BioPerl list, as the tool is a part > of Bioperl-db and uses BioPerl for parsing the ontologies. > > In your case, the goflat parser in BioPerl seems to balk at the > second one of the input files. It may be that the input file is > (was?) corrupted, that does happen every once in a while. More > likely though is that the goflat parser hasn't kept up with some > format changes. Have you tried using the obo format version instead? > > -hilmar > > On Apr 20, 2009, at 11:44 AM, Carlos A. Canchaya wrote: > >> Hi guys >> >> I'm working with biosql and I try to figure out how to load >> ontologies into biosql. 
>> >> I've tried >> >> load_ontology.pl --driver mysql --dbuser carlos --dbpass xxx -- >> host localhost --dbname biosql --namespace "Gene Ontology" --format >> goflat --fmtargs "-defs_file,GO.defs" function.ontology >> process.ontology component.ontology >> >> as in the script info but I have an error, >> >> >> ------------------- WARNING --------------------- >> MSG: DBLink exists in the dblink of _default >> --------------------------------------------------- >> >> ------------- EXCEPTION ------------- >> MSG: format error (file process.ontology) offending line: >> -negative regulation of angiogenesis ; GO:0016525 ; synonym:down >> regulation of angiogenesis ; synonym:down\-regulation of >> angiogenesis ; synonym:downregulation of angiogenesis ; >> synonym:inhibition of angiogenesis % negative regulation of >> developmental process ; GO:0051093 % regulation of angiogenesis ; >> GO:0045765 >> >> STACK Bio::OntologyIO::dagflat::_parse_flat_file /usr/local/share/ >> perl/5.10.0/Bio/OntologyIO/dagflat.pm:627 >> STACK Bio::OntologyIO::dagflat::parse /usr/local/share/perl/5.10.0/ >> Bio/OntologyIO/dagflat.pm:284 >> STACK Bio::OntologyIO::dagflat::next_ontology /usr/local/share/perl/ >> 5.10.0/Bio/OntologyIO/dagflat.pm:317 >> STACK toplevel /usr/local/share/biosql/bioperl-db/scripts/biosql/ >> load_ontology.pl:604 >> ------------------------------------- >> >> Any suggestion? 
>> >> Cheers, >> >> Carlos >> >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Wed Apr 29 19:48:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 29 Apr 2009 19:48:10 -0400 Subject: [Bioperl-l] SearchIO: Features in/flanking this part of asubject sequence In-Reply-To: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com><62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com> <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> Message-ID: <7A9746282BA343F78423D12DB1578509@NewLife> also check out http://www.bioperl.org/wiki/Parsing_BLAST_HSPs MAJ ----- Original Message ----- From: "Chris Fields" To: "Razi Khaja" Cc: Sent: Wednesday, April 29, 2009 3:41 PM Subject: Re: [Bioperl-l] SearchIO: Features in/flanking this part of asubject sequence > I'm assuming this is from an older bioperl; this data should be accessible > via $hsp->hit_features in the latest code fromo svn (and I believe in bioperl > 1.6.0 in CPAN). > > chris > > On Apr 29, 2009, at 2:08 PM, Razi Khaja wrote: > >> Hello, >> >> I am generating BLAST alignments using the BLAST URL API from NCBI. >> >> I want to parse details from BLAST reports whenever there are >> "Features in/flanking this part of subject sequence". A portion of >> the BLAST report showing "Features flanking ..." is pasted below. >> >> I am using Bio::SearchIO to parse details. The relevant part of the >> script is below. 
>> >> The problem I am having is that for some reason the first occurrence >> of a "Feature flanking this part of a subject sequence" is skipped. >> I am only able to parse/print all occurrences of a "Feature >> in/flanking this part of a subject sequence" from the second >> occurrence to the last occurrence. >> >> I believe the code responsible for parsing this information is in >> Bio/SearchIO/blast.pm, starting on line 760. >> I have tried fixing the code in Bio/SearchIO/blast.pm myself but was >> not able to correct the problem. >> Would it be possible for someone to fix the code in the >> Bio/SearchIO/blast.pm module, or help me fix the code so that the >> first occurrence is not skipped? >> >> Thanks, >> Razi > > > >> ===== The part of the script that is relevant to parsing "Features >> in/flanking..." ==== >> my $bio_searchio_in = Bio::SearchIO->new( >> -file => 'blast_result.txt', >> -format => 'blast' >> ); >> >> my $i = 1; >> while( my $result = $bio_searchio_in->next_result() ){ >> while( my $hit = $result->next_hit() ){ >> while( my $hsp = $hit->next_hsp() ){ >> my $hsp_features = $hsp->hit_features(); >> if( $hsp_features ) { >> print "HSP FEATURE $i\t$hsp_features\n"; >> $i++; >> } >> } >> } >> } >> >> ===== A portion of a BLAST report with "Features flanking ..." ===== >> ... >> ... 
>> Score = 54.7 bits (29), Expect = 0.003 >> Identities = 29/29 (100%), Gaps = 0/29 (0%) >> Strand=Plus/Minus >> >> Query 6556 CCTGGGTGACAGAGTGAGACTCCATCTCA 6584 >> ||||||||||||||||||||||||||||| >> Sbjct 6953042 CCTGGGTGACAGAGTGAGACTCCATCTCA 6953014 >> >> >>> gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic >>> contig >> Length=237250 >> >> Features flanking this part of subject sequence: >> 16338 bp at 5' side: PRAME family member 8 >> 11926 bp at 3' side: PRAME family member 9 >> >> Score = 7286 bits (3945), Expect = 0.0 >> Identities = 5437/6145 (88%), Gaps = 152/6145 (2%) >> Strand=Plus/Plus >> >> Query 23225 GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG >> 23284 >> |||||||||||||||||||||||||||||||| |||||| ||||||||||| |||||||| >> Sbjct 86128 GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG >> 86187 >> >> Query 23285 GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA >> 23344 >> ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| ||||| >> Sbjct 86188 GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA >> 86247 >> ... >> ... 
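Until the parser behaviour is confirmed against a newer release, one stopgap is to pull the feature blocks straight out of the report text. This is a rough sketch in plain Perl that does not use Bio::SearchIO. It assumes the block layout shown in the report excerpt in this thread: a "Features in/flanking this part of subject sequence:" header followed by indented feature lines and then a blank line or a "Score =" line. The function name scan_hit_features is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, independent of Bio::SearchIO: collect each
# "Features in/flanking this part of subject sequence:" block from a
# plain-text BLAST report, together with the ID of the hit it belongs to.
sub scan_hit_features {
    my ($fh) = @_;
    my ($hit, $current, @blocks);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>\s*(\S+)/) {                  # start of a new subject
            ($hit, $current) = ($1, undef);
        }
        elsif ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $current = { hit => $hit, features => [] };
            push @blocks, $current;
        }
        elsif ($current
               && $line !~ /^\s*Score\s*=/
               && $line =~ /^\s+(\S.*?)\s*$/) {
            push @{ $current->{features} }, $1;       # indented feature line
        }
        else {
            $current = undef;     # blank line or "Score =" closes the block
        }
    }
    return @blocks;
}

# Usage: print every block found in a saved report, if the file exists.
if (@ARGV && open my $fh, '<', $ARGV[0]) {
    for my $block (scan_hit_features($fh)) {
        print "$block->{hit}\t$_\n" for @{ $block->{features} };
    }
    close $fh;
}
```

Because each block is recorded as soon as its header line is seen, the first block in a hit is treated the same way as later ones. Matching a block to a specific HSP would need extra bookkeeping that this sketch leaves out.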
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From Russell.Smithies at agresearch.co.nz Wed Apr 29 20:31:06 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 30 Apr 2009 12:31:06 +1200
Subject: [Bioperl-l] waaaay off topic question
In-Reply-To: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org>
References: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493C84151@exchsth.agresearch.co.nz>

I have a question that's nothing to do with BioPerl or Perl, but hope
there's a chance that some of you clever people may be doing the same
thing as me :-)

I've been asked to write some VB scripts to control Applied Biosystems
"Analyst QS" and "BioAnalyst" applications for analyzing mass-spec data.
There's limited documentation (10yr out of date) with some example code
(that doesn't compile) so I'm not getting as far along as I'd like.
Has anyone worked with this stuff before?

Any assistance greatly appreciated !!!

Thanx,

Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809  F +64 3 489 9174  www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities to
which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From razi.khaja at gmail.com Wed Apr 29 23:57:17 2009 From: razi.khaja at gmail.com (Razi Khaja) Date: Wed, 29 Apr 2009 23:57:17 -0400 Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence In-Reply-To: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com> <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com> <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> Message-ID: <62e9dabc0904292057y6b725e0yc3b0a85c661c44f8@mail.gmail.com> Hello Chris, I am using bioperl 1.6.0. It may be a few weeks before I can upgrade to bioperl-live from svn, and so it may be a few weeks before I can return to my question. When I do upgrade, I will report back to this thread if I still encounter problems. Razi On Wed, Apr 29, 2009 at 3:41 PM, Chris Fields wrote: > I'm assuming this is from an older bioperl; this data should be accessible > via $hsp->hit_features in the latest code fromo svn (and I believe in > bioperl 1.6.0 in CPAN). > > chris > > > From jonathanmflowers at gmail.com Thu Apr 30 12:40:42 2009 From: jonathanmflowers at gmail.com (Jon Flowers) Date: Thu, 30 Apr 2009 09:40:42 -0700 (PDT) Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem Message-ID: <23319982.post@talk.nabble.com> Dear colleagues, I have set up a mySQL database and loaded a GFF3 and fasta file using Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be working normally except when I attempt to create a Bio::DB::SeqFeature::Segment object. 
The following works as expected: my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:foo', -user => 'myuser', -pass => 'mypassword', -write => '1'); my @features = $db->features(-seq_id=>'chr1', -start=>1, -end=>10000, -types=>['gene']); However, when I try to create a segment object using either of the two following method calls I get an error: my $segment = $db->segment('chr1',1=>10000); my $segment = $db->segment( -seq_id => 'chr1', -start => '1', -end => '10000'); -------------------------------- EXCEPTION ------------------------------------ MSG: segment() called in a scalar context but multiple features match. Either call in a list context or narrow your search using the -types or -class arguments STACK Bio::DB::SeqFeature::Store::segment /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178 STACK toplevel trial.pl:42 ------------------------------------------------------- Calling in list context (which is not defined in the documentation) produces an array of 22 identical scalars = 'chr1:1..10000'. Any ideas? Thanks Jonathan -- View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jonathanmflowers at gmail.com Thu Apr 30 12:52:24 2009 From: jonathanmflowers at gmail.com (Jon Flowers) Date: Thu, 30 Apr 2009 09:52:24 -0700 (PDT) Subject: [Bioperl-l] use CLUSTALW on Windows? In-Reply-To: <23264714.post@talk.nabble.com> References: <23264714.post@talk.nabble.com> Message-ID: <23320232.post@talk.nabble.com> Hi, There is no means to do this in bioperl, but it is simple to make a system call and execute an MSA program such as MUSCLE to align fasta-formatted sequences using something like... qx(muscle -in $infilename -out $outfilename) Jonathan laxmanb wrote: > > I need to create a multiple sequence alignment of some sequences using > CLUSTALW or any other Multiple sequence alignment program. 
However, I've > learnt that this functionality used to be UNIX/Linux only. However, the > documentation is also very old, so I'd like to know if any CLUSTAL/ any > other MSA programs can be run using BioPerl on Windows. > > Thank you for your time :) > -- View this message in context: http://www.nabble.com/use-CLUSTALW-on-Windows--tp23264714p23320232.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Thu Apr 30 13:04:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 30 Apr 2009 12:04:46 -0500 Subject: [Bioperl-l] use CLUSTALW on Windows? In-Reply-To: <23320232.post@talk.nabble.com> References: <23264714.post@talk.nabble.com> <23320232.post@talk.nabble.com> Message-ID: <92920FDD-7CB2-4331-9860-87304E16C948@illinois.edu> I don't recall this being a UNIX-only issue, though admittedly it's been years since I've tried running the bioperl-run modules on WinXP. I do recall getting BLAST, EMBOSS and others to work though; I don't see why ClustalW would be much different. Have you actually tested this out and found a problem? Have you tried cygwin? chris On Apr 30, 2009, at 11:52 AM, Jon Flowers wrote: > > Hi, > > There is no means to do this in bioperl, but it is simple to make a > system > call and execute an MSA program such as MUSCLE to align fasta- > formatted > sequences using something like... > > qx(muscle -in $infilename -out $outfilename) > > Jonathan > > > laxmanb wrote: >> >> I need to create a multiple sequence alignment of some sequences >> using >> CLUSTALW or any other Multiple sequence alignment program. However, >> I've >> learnt that this functionality used to be UNIX/Linux only. However, >> the >> documentation is also very old, so I'd like to know if any CLUSTAL/ >> any >> other MSA programs can be run using BioPerl on Windows. 
>> >> Thank you for your time :) >> > > -- > View this message in context: http://www.nabble.com/use-CLUSTALW-on-Windows--tp23264714p23320232.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Apr 30 13:29:29 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 30 Apr 2009 10:29:29 -0700 Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem In-Reply-To: <23319982.post@talk.nabble.com> References: <23319982.post@talk.nabble.com> Message-ID: <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org> One would have to see some of your GFF to know better. It sounds like you have chr1 defined in multiple places. Did you use the bp_seqfeature_load script to load the data in one go - it should catch it if you have non-unique IDs. -jason On Apr 30, 2009, at 9:40 AM, Jon Flowers wrote: > > Dear colleagues, > > I have set up a mySQL database and loaded a GFF3 and fasta file using > Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be > working > normally except when I attempt to create a > Bio::DB::SeqFeature::Segment > object. > > The following works as expected: > > my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', > -dsn => 'dbi:mysql:foo', > -user => 'myuser', > -pass => 'mypassword', > -write => '1'); > > my @features = $db->features(-seq_id=>'chr1', > -start=>1, > -end=>10000, > -types=>['gene']); > > However, when I try to create a segment object using either of the two > following method calls I get an error: > > my $segment = $db->segment('chr1',1=>10000); > > my $segment = $db->segment( -seq_id => 'chr1', -start => '1', -end => > '10000'); > > -------------------------------- EXCEPTION > ------------------------------------ > > MSG: segment() called in a scalar context but multiple features match. 
> Either call in a list context or narrow your search using the -types > or > -class arguments > > STACK Bio::DB::SeqFeature::Store::segment > /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178 > STACK toplevel trial.pl:42 > ------------------------------------------------------- > > Calling in list context (which is not defined in the documentation) > produces > an array of 22 identical scalars = 'chr1:1..10000'. > > Any ideas? > > Thanks > > Jonathan > > -- > View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From jason at bioperl.org Thu Apr 30 13:31:19 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 30 Apr 2009 10:31:19 -0700 Subject: [Bioperl-l] use CLUSTALW on Windows? In-Reply-To: <23320232.post@talk.nabble.com> References: <23264714.post@talk.nabble.com> <23320232.post@talk.nabble.com> Message-ID: <734F5ADF-77F5-4AA5-A676-79B42B3C54CB@bioperl.org> the bioperl-run module of Bio::Tools::Run::Alignment::Clustalw or MUSCLE ones don't work then? They do the cmdline work for you. On Apr 30, 2009, at 9:52 AM, Jon Flowers wrote: > > Hi, > > There is no means to do this in bioperl, but it is simple to make a > system > call and execute an MSA program such as MUSCLE to align fasta- > formatted > sequences using something like... > > qx(muscle -in $infilename -out $outfilename) > > Jonathan > > > laxmanb wrote: >> >> I need to create a multiple sequence alignment of some sequences >> using >> CLUSTALW or any other Multiple sequence alignment program. However, >> I've >> learnt that this functionality used to be UNIX/Linux only. 
However, >> the >> documentation is also very old, so I'd like to know if any CLUSTAL/ >> any >> other MSA programs can be run using BioPerl on Windows. >> >> Thank you for your time :) >> > > -- > View this message in context: http://www.nabble.com/use-CLUSTALW-on-Windows--tp23264714p23320232.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From Kevin.M.Brown at asu.edu Thu Apr 30 15:27:15 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 30 Apr 2009 12:27:15 -0700 Subject: [Bioperl-l] Bio::Annotations::Collection confusion Message-ID: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> So, I'm parsing GenBank sequences to pull out the various exons. I found the way to get the NCBI exon number from each feature, but am confused about one of the methods. When I call annotation->as_text I'm expecting to get back 1 from the feature, but instead get back Value: 1 ??!? Why is the value from the NCBI file getting that text tagged onto it? http://www.ncbi.nlm.nih.gov/nuccore/73622129 exon 1..774 /gene="BOLA2" /gene_synonym="BOLA2A; My016" /inference="alignment:Splign" /number=1 print +($f->annotation->get_Annotations('number'))[0]->as_text; Value: 1 From SMarkel at accelrys.com Thu Apr 30 15:56:40 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Thu, 30 Apr 2009 15:56:40 -0400 Subject: [Bioperl-l] Bio::Annotations::Collection confusion In-Reply-To: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> Kevin, I believe the extra text was added for readability when printing to the console. In our code we just add the following post-processing step. 
(my $text = $annotation->as_text()) =~ s/(Comment|Value): //; Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > Sent: Thursday, 30 April 2009 12:27 PM > To: BioPerl List > Subject: [Bioperl-l] Bio::Annotations::Collection confusion > > So, I'm parsing Genbank sequences to pull out the various exons. I found > the way to get the NCBI Exon number from each feature, but am confused > about one of the methods. When I do annotation->as_text I'm expecting to > get back 1 from the feature, but instead get back Value: 1 ??!? Why is > the value from the NCBI file getting that text tagged onto it? 
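The "Value: " prefix discussed in this thread can be reproduced and stripped in plain Perl. The following is only a minimal sketch: `My::MockAnnotation` and `strip_label` are hypothetical names invented for illustration, and the mock class merely imitates the labelled output reported above; it is not the real BioPerl annotation class. The substitution is the one posted in the thread, with an added `^` anchor (an assumption on my part) so that only a leading label is removed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object. This is NOT a BioPerl
# class; it only mimics the "Value: 1" behaviour reported in the thread.
package My::MockAnnotation;

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{value} }, $class;
}

# Returns the labelled form, e.g. "Value: 1".
sub as_text {
    my ($self) = @_;
    return "Value: " . $self->{value};
}

package main;

# Remove a leading "Comment: " or "Value: " label (hypothetical helper).
# The regex follows the one posted in the thread, anchored to the start.
sub strip_label {
    my ($text) = @_;
    (my $bare = $text) =~ s/^(?:Comment|Value): //;
    return $bare;
}

my $annotation  = My::MockAnnotation->new(value => 1);
my $exon_number = strip_label($annotation->as_text);
print "exon number: $exon_number\n";
```

Anchoring the pattern means a value that happens to contain "Value: " somewhere in the middle is left untouched, which the unanchored form would not guarantee.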
> > http://www.ncbi.nlm.nih.gov/nuccore/73622129 > exon 1..774 > /gene="BOLA2" > /gene_synonym="BOLA2A; My016" > /inference="alignment:Splign" > /number=1 > > print ($f->annotation->get_Annotations('number'))[0]->as_text; > Value: 1 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Thu Apr 30 16:01:03 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 30 Apr 2009 13:01:03 -0700 Subject: [Bioperl-l] Bio::Annotations::Collection confusion In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> Message-ID: <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu> That's nice in some regards, but makes it hard to use the function in code without having to always process the result, which seems to be counter to what one would expect. E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence". Is there a better way to get the number directly without having to strip off the text that never existed in the first place? > -----Original Message----- > From: Scott Markel [mailto:SMarkel at accelrys.com] > Sent: Thursday, April 30, 2009 12:57 PM > To: Kevin Brown; BioPerl List > Subject: RE: Bio::Annotations::Collection confusion > > Kevin, > > I believe the extra text was added for readability when printing > to the console. In our code we just add the following post- > processing step. > > (my $text = $annotation->as_text()) =~ s/(Comment|Value): //; > > Scott > > Scott Markel, Ph.D. 
> Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Co-chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > > Sent: Thursday, 30 April 2009 12:27 PM > > To: BioPerl List > > Subject: [Bioperl-l] Bio::Annotations::Collection confusion > > > > So, I'm parsing Genbank sequences to pull out the various > exons. I found > > the way to get the NCBI Exon number from each feature, but > am confused > > about one of the methods. When I do annotation->as_text I'm > expecting to > > get back 1 from the feature, but instead get back Value: 1 > ??!? Why is > > the value from the NCBI file getting that text tagged onto it? 
> > > > http://www.ncbi.nlm.nih.gov/nuccore/73622129 > > exon 1..774 > > /gene="BOLA2" > > /gene_synonym="BOLA2A; My016" > > /inference="alignment:Splign" > > /number=1 > > > > print ($f->annotation->get_Annotations('number'))[0]->as_text; > > Value: 1 > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jonathanmflowers at gmail.com Thu Apr 30 16:22:23 2009 From: jonathanmflowers at gmail.com (Jon Flowers) Date: Thu, 30 Apr 2009 13:22:23 -0700 (PDT) Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem In-Reply-To: <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org> References: <23319982.post@talk.nabble.com> <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org> Message-ID: <23322607.post@talk.nabble.com> Jason, I used the Bio::DB::SeqFeature::Store::GFF3Loader rather than the bp_seqfeature_load.pl script. You were right, however. It looks like I had populated the MySQL database with multiple FASTA files. I cleared the database and ran the GFF3Loader twice (once for the FASTA, once for the GFF3). Segment objects appear to be working fine now. THANKS! Jonathan Jason Stajich-3 wrote: > > One would have to see some of your GFF to know better. It sounds like > you have chr1 defined in multiple places. > > Did you use the bp_seqfeature_load script to load the data in one go - > it should catch it if you have non-unique IDs. > > -jason > On Apr 30, 2009, at 9:40 AM, Jon Flowers wrote: > >> >> Dear colleagues, >> >> I have set up a mySQL database and loaded a GFF3 and fasta file using >> Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be >> working >> normally except when I attempt to create a >> Bio::DB::SeqFeature::Segment >> object. 
>> >> The following works as expected: >> >> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', >> -dsn => 'dbi:mysql:foo', >> -user => 'myuser', >> -pass => 'mypassword', >> -write => '1'); >> >> my @features = $db->features(-seq_id=>'chr1', >> -start=>1, >> -end=>10000, >> -types=>['gene']); >> >> However, when I try to create a segment object using either of the two >> following method calls I get an error: >> >> my $segment = $db->segment('chr1',1=>10000); >> >> my $segment = $db->segment( -seq_id => 'chr1', -start => '1', -end => >> '10000'); >> >> -------------------------------- EXCEPTION >> ------------------------------------ >> >> MSG: segment() called in a scalar context but multiple features match. >> Either call in a list context or narrow your search using the -types >> or >> -class arguments >> >> STACK Bio::DB::SeqFeature::Store::segment >> /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178 >> STACK toplevel trial.pl:42 >> ------------------------------------------------------- >> >> Calling in list context (which is not defined in the documentation) >> produces >> an array of 22 identical scalars = 'chr1:1..10000'. >> >> Any ideas? >> >> Thanks >> >> Jonathan >> >> -- >> View this message in context: >> http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
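The scalar-versus-list behaviour behind the exception quoted in this thread can be illustrated with Perl's built-in `wantarray`. This is a rough sketch under stated assumptions: `find_segments` and the `@stored` data are hypothetical and are not how Bio::DB::SeqFeature::Store implements `segment()`. It only shows why an ambiguous lookup can succeed in list context yet fail in scalar context, as happened when the same sequence had been loaded more than once.

```perl
use strict;
use warnings;

# Hypothetical in-memory "database"; chr1 appears twice to imitate a
# sequence that was loaded more than once.
my @stored = (
    { seq_id => 'chr1', start => 1, end => 10000 },
    { seq_id => 'chr1', start => 1, end => 10000 },
    { seq_id => 'chr2', start => 1, end => 5000 },
);

# Hypothetical lookup, not the BioPerl method: behaves differently
# depending on the context it is called in.
sub find_segments {
    my ($seq_id) = @_;
    my @hits = grep { $_->{seq_id} eq $seq_id } @stored;

    # List context: hand back everything that matched.
    return @hits if wantarray;

    # Scalar context: only safe when the match is unambiguous.
    die "find_segments() called in scalar context but "
      . scalar(@hits) . " entries match '$seq_id'\n"
      if @hits > 1;
    return $hits[0];
}

my @all = find_segments('chr1');                  # list context
print scalar(@all), " matches for chr1\n";

my $one = find_segments('chr2');                  # scalar context, unique
print "chr2 ends at $one->{end}\n";

my $ambiguous = eval { find_segments('chr1') };   # scalar context, dies
print "error: $@" unless defined $ambiguous;
```

When a scalar-context call fails like this, counting the list-context results is a quick way to check whether the underlying data contains duplicates.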
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23322607.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jason at bioperl.org Thu Apr 30 16:24:25 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 30 Apr 2009 13:24:25 -0700 Subject: [Bioperl-l] Bio::Annotations::Collection confusion In-Reply-To: <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu> Message-ID: <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org> Seems like you just want $annotation->value ? =head2 as_text Title : as_text Usage : my $text = $obj->as_text Function: return the string "Value: $v" where $v is the value Returns : string Args : none =cut =head2 display_text Title : display_text Usage : my $str = $ann->display_text(); Function: returns a string. Unlike as_text(), this method returns a string formatted as would be expected for the specific implementation. 
One can pass a callback as an argument which allows custom text generation; the callback is passed the current instance and any text returned Example : Returns : a string Args : [optional] callback =cut =head2 value Title : value Usage : $obj->value($newval) Function: Get/Set the value for simplevalue Returns : value of value Args : newvalue (optional) =cut On Apr 30, 2009, at 1:01 PM, Kevin Brown wrote: > That's nice in some regards, but makes it hard to use the function in > code without having to always process the result, which seems to be > counter to what one would expect. > > E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence". > > Is there a better way to get the number directly without having to > strip > off the text that never existed in the first place? > >> -----Original Message----- >> From: Scott Markel [mailto:SMarkel at accelrys.com] >> Sent: Thursday, April 30, 2009 12:57 PM >> To: Kevin Brown; BioPerl List >> Subject: RE: Bio::Annotations::Collection confusion >> >> Kevin, >> >> I believe the extra text was added for readability when printing >> to the console. In our code we just add the following post- >> processing step. >> >> (my $text = $annotation->as_text()) =~ s/(Comment|Value): //; >> >> Scott >> >> Scott Markel, Ph.D. 
>> Principal Bioinformatics Architect email: smarkel at accelrys.com >> Accelrys (SciTegic R&D) mobile: +1 858 205 3653 >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 >> San Diego, CA 92121 fax: +1 858 799 5222 >> USA web: http://www.accelrys.com >> >> http://www.linkedin.com/in/smarkel >> Vice President, Board of Directors: >> International Society for Computational Biology >> Co-chair: ISCB Publications Committee >> Associate Editor: PLoS Computational Biology >> Editorial Board: Briefings in Bioinformatics >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown >>> Sent: Thursday, 30 April 2009 12:27 PM >>> To: BioPerl List >>> Subject: [Bioperl-l] Bio::Annotations::Collection confusion >>> >>> So, I'm parsing Genbank sequences to pull out the various >> exons. I found >>> the way to get the NCBI Exon number from each feature, but >> am confused >>> about one of the methods. When I do annotation->as_text I'm >> expecting to >>> get back 1 from the feature, but instead get back Value: 1 >> ??!? Why is >>> the value from the NCBI file getting that text tagged onto it? 
>>> >>> http://www.ncbi.nlm.nih.gov/nuccore/73622129 >>> exon 1..774 >>> /gene="BOLA2" >>> /gene_synonym="BOLA2A; My016" >>> /inference="alignment:Splign" >>> /number=1 >>> >>> print ($f->annotation->get_Annotations('number'))[0]->as_text; >>> Value: 1 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From Kevin.M.Brown at asu.edu Thu Apr 30 16:45:29 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 30 Apr 2009 13:45:29 -0700 Subject: [Bioperl-l] Bio::Annotations::Collection confusion In-Reply-To: <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org> References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu> <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org> Message-ID: <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu> OK. Can't see that method in the Deobfuscator which might explain why I didn't know about it. http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3 A%3AAnnotation%3A%3ACollection&sort_order=by+method&search_string=Bio%3A %3AAnnotation%3A%3ACollection > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On > Behalf Of Jason Stajich > Sent: Thursday, April 30, 2009 1:24 PM > To: Kevin Brown > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::Annotations::Collection confusion > > Seems like you just want $annotation->value ? 
> > > =head2 as_text > > Title : as_text > Usage : my $text = $obj->as_text > Function: return the string "Value: $v" where $v is the value > Returns : string > Args : none > > > =cut > > =head2 display_text > > Title : display_text > Usage : my $str = $ann->display_text(); > Function: returns a string. Unlike as_text(), this method > returns a > string > formatted as would be expected for te specific > implementation. > > One can pass a callback as an argument which > allows custom > text > generation; the callback is passed the current instance > and any text > returned > Example : > Returns : a string > Args : [optional] callback > > =cut > > =head2 value > > Title : value > Usage : $obj->value($newval) > Function: Get/Set the value for simplevalue > Returns : value of value > Args : newvalue (optional) > > > =cut > > On Apr 30, 2009, at 1:01 PM, Kevin Brown wrote: > > > That's nice in some regards, but makes it hard to use the > function in > > code without having to always process the result, which seems to be > > counter to what one would expect. > > > > E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence". > > > > Is there a better way to get the number directly without having to > > strip > > off the text that never existed in the first place? > > > >> -----Original Message----- > >> From: Scott Markel [mailto:SMarkel at accelrys.com] > >> Sent: Thursday, April 30, 2009 12:57 PM > >> To: Kevin Brown; BioPerl List > >> Subject: RE: Bio::Annotations::Collection confusion > >> > >> Kevin, > >> > >> I believe the extra text was added for readability when printing > >> to the console. In our code we just add the following post- > >> processing step. > >> > >> (my $text = $annotation->as_text()) =~ > s/(Comment|Value): //; > >> > >> Scott > >> > >> Scott Markel, Ph.D. 
> >> Principal Bioinformatics Architect email: smarkel at accelrys.com > >> Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > >> San Diego, CA 92121 fax: +1 858 799 5222 > >> USA web: http://www.accelrys.com > >> > >> http://www.linkedin.com/in/smarkel > >> Vice President, Board of Directors: > >> International Society for Computational Biology > >> Co-chair: ISCB Publications Committee > >> Associate Editor: PLoS Computational Biology > >> Editorial Board: Briefings in Bioinformatics > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown > >>> Sent: Thursday, 30 April 2009 12:27 PM > >>> To: BioPerl List > >>> Subject: [Bioperl-l] Bio::Annotations::Collection confusion > >>> > >>> So, I'm parsing Genbank sequences to pull out the various > >> exons. I found > >>> the way to get the NCBI Exon number from each feature, but > >> am confused > >>> about one of the methods. When I do annotation->as_text I'm > >> expecting to > >>> get back 1 from the feature, but instead get back Value: 1 > >> ??!? Why is > >>> the value from the NCBI file getting that text tagged onto it? 
> >>> > >>> http://www.ncbi.nlm.nih.gov/nuccore/73622129 > >>> exon 1..774 > >>> /gene="BOLA2" > >>> /gene_synonym="BOLA2A; My016" > >>> /inference="alignment:Splign" > >>> /number=1 > >>> > >>> print ($f->annotation->get_Annotations('number'))[0]->as_text; > >>> Value: 1 > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > From Russell.Smithies at agresearch.co.nz Thu Apr 30 17:28:39 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 1 May 2009 09:28:39 +1200 Subject: [Bioperl-l] Bio::Annotations::Collection confusion In-Reply-To: <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu> <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org> <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493C843A2@exchsth.agresearch.co.nz> It's buried in Bio::Annotation::SimpleValue I think http://bioperl.org/cgi-bin/deob_interface.cgi?Search=&module=&sort_order=by+method&search_string=Bio%3A%3AAnnotation%3A%3ASimpleValue&Filter=Submit+Query > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > Sent: Friday, 1 May 2009 8:45 a.m. > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::Annotations::Collection confusion > > OK. Can't see that method in the Deobfuscator which might explain why I > didn't know about it. 
> > http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3 > A%3AAnnotation%3A%3ACollection&sort_order=by+method&search_string=Bio%3A > %3AAnnotation%3A%3ACollection > > > -----Original Message----- > > From: Jason Stajich [mailto:jason.stajich at gmail.com] On > > Behalf Of Jason Stajich > > Sent: Thursday, April 30, 2009 1:24 PM > > To: Kevin Brown > > Cc: BioPerl List > > Subject: Re: [Bioperl-l] Bio::Annotations::Collection confusion > > > > Seems like you just want $annotation->value ? > > > > > > =head2 as_text > > > > Title : as_text > > Usage : my $text = $obj->as_text > > Function: return the string "Value: $v" where $v is the value > > Returns : string > > Args : none > > > > > > =cut > > > > =head2 display_text > > > > Title : display_text > > Usage : my $str = $ann->display_text(); > > Function: returns a string. Unlike as_text(), this method > > returns a > > string > > formatted as would be expected for te specific > > implementation. > > > > One can pass a callback as an argument which > > allows custom > > text > > generation; the callback is passed the current instance > > and any text > > returned > > Example : > > Returns : a string > > Args : [optional] callback > > > > =cut > > > > =head2 value > > > > Title : value > > Usage : $obj->value($newval) > > Function: Get/Set the value for simplevalue > > Returns : value of value > > Args : newvalue (optional) > > > > > > =cut > > > > On Apr 30, 2009, at 1:01 PM, Kevin Brown wrote: > > > > > That's nice in some regards, but makes it hard to use the > > function in > > > code without having to always process the result, which seems to be > > > counter to what one would expect. > > > > > > E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence". > > > > > > Is there a better way to get the number directly without having to > > > strip > > > off the text that never existed in the first place? 
> > > > > >> -----Original Message----- > > >> From: Scott Markel [mailto:SMarkel at accelrys.com] > > >> Sent: Thursday, April 30, 2009 12:57 PM > > >> To: Kevin Brown; BioPerl List > > >> Subject: RE: Bio::Annotations::Collection confusion > > >> > > >> Kevin, > > >> > > >> I believe the extra text was added for readability when printing > > >> to the console. In our code we just add the following post- > > >> processing step. > > >> > > >> (my $text = $annotation->as_text()) =~ > > s/(Comment|Value): //; > > >> > > >> Scott > > >> > > >> Scott Markel, Ph.D. > > >> Principal Bioinformatics Architect email: smarkel at accelrys.com > > >> Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > > >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > > >> San Diego, CA 92121 fax: +1 858 799 5222 > > >> USA web: http://www.accelrys.com > > >> > > >> http://www.linkedin.com/in/smarkel > > >> Vice President, Board of Directors: > > >> International Society for Computational Biology > > >> Co-chair: ISCB Publications Committee > > >> Associate Editor: PLoS Computational Biology > > >> Editorial Board: Briefings in Bioinformatics > > >> > > >> > > >>> -----Original Message----- > > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown > > >>> Sent: Thursday, 30 April 2009 12:27 PM > > >>> To: BioPerl List > > >>> Subject: [Bioperl-l] Bio::Annotations::Collection confusion > > >>> > > >>> So, I'm parsing Genbank sequences to pull out the various > > >> exons. I found > > >>> the way to get the NCBI Exon number from each feature, but > > >> am confused > > >>> about one of the methods. When I do annotation->as_text I'm > > >> expecting to > > >>> get back 1 from the feature, but instead get back Value: 1 > > >> ??!? Why is > > >>> the value from the NCBI file getting that text tagged onto it? 
> > >>> > > >>> http://www.ncbi.nlm.nih.gov/nuccore/73622129 > > >>> exon 1..774 > > >>> /gene="BOLA2" > > >>> /gene_synonym="BOLA2A; My016" > > >>> /inference="alignment:Splign" > > >>> /number=1 > > >>> > > >>> print ($f->annotation->get_Annotations('number'))[0]->as_text; > > >>> Value: 1 > > >>> > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Jason Stajich > > jason at bioperl.org > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From Kevin.M.Brown at asu.edu Thu Apr 30 17:56:16 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 30 Apr 2009 14:56:16 -0700 Subject: [Bioperl-l] Other object oddities Message-ID: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu> So, I'm using quite a bit of bioperl code in my own stuff and have been seeing some oddities with the naming of methods. 
A good example would be in the Bio::Seq and Bio::SeqFeature::Generic. Both have a method called "seq" but in the latter case it returns an object (and expects an object when doing a Set) and in the former it returns a string and expects a string when doing a Set. This makes for a bit of brain freeze on my part when the return from another object might be a Bio::Seq or Bio::SeqFeature::Generic and now calling the ->seq returns different things. Guess I'm just curious if anyone has done an audit of the methods of the various objects and their return types to see how consistent they are across even a subsection of the codebase?
From florent.angly at gmail.com Wed Apr 1 13:03:28 2009 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 01 Apr 2009 10:03:28 -0700 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> Message-ID: <49D39E60.1020103@gmail.com> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you won't be able to put its information in a hash (unless you have a lot of memory). Florent Smithies, Russell wrote: > The taxonomy information isn't in the blast output unless you created custom fasta headers for your blast database. > The easiest way to get the tax_id for your accessions would be to download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. > If you load that file into a hash, parse the accessions out of the blast hits then lookup the tax_id from that hash, I think it should be fairly fast.
> > Checking which are prokaryotes and which are eukaryotes based on tax_id is a separate problem :-) > If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file contained within lists what division each tax_id belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out from that. > > It's not a very BioPerly solution but sometimes just looking up the answer from a file/table/hash is the simplest way. > > Hope this helps, > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >> Sent: Wednesday, 1 April 2009 7:43 a.m. >> To: bioperl-l >> Subject: [Bioperl-l] taxonomy ID >> >> Hi All, >> I am writing a script, for one of its part i have to parse a blast >> report (refseq blast) and check how may organisms are eukaryotes and how >> namy of them are prokaryotes. >> I am using BIO::DB::taxinomy module: >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >> >> But for this i need a taxonomyid (like '33090') given in the example. >> So is it possible to get a taxonomyid from refseq balst report? >> If not then how i can deal with this problem? >> >> i would really appreciate if anyone can help me out. 
>> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From miguel.pignatelli at uv.es Wed Apr 1 13:15:48 2009 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Wed, 1 Apr 2009 19:15:48 +0200 Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles In-Reply-To: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> Message-ID: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> Hi all, I have a list of PUBMED IDs and I am trying to retrieve automatically the *full article* in any format (not just the abstract). Is there any method in bioperl that allows this? any other solution? Currently I am trying to solve this using WWW::Mechanize, but do you know of any other method to do this? 
Any help would be appreciated, Thanks in advance, M; From kanzure at gmail.com Wed Apr 1 14:18:22 2009 From: kanzure at gmail.com (Bryan Bishop) Date: Wed, 1 Apr 2009 13:18:22 -0500 Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles In-Reply-To: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> Message-ID: <55ad6af70904011118q7cbdb05u9c89958de3ccc87e@mail.gmail.com> On Wed, Apr 1, 2009 at 12:15 PM, Miguel Pignatelli wrote: > I have a list of PUBMED IDs and I am trying to retrieve automatically the > *full article* in any format (not just the abstract). Is there any method in > bioperl that allows this? any other solution? > Currently I am trying to solve this using WWW::Mechanize, but do you know of > any other method to do this? You can try pubget.com- it's a web gateway to download pubmedcentral articles. Unfortunately this means it does not have pubmed articles. What I have found with pubmed is that it's mainly a listing of abstracts, and then the various papers may or may not be online in their respective journals on the web somewhere else, and rarely are there any links to the publisher website. So how are you using WWW::Mechanize in this context? Is there some secret to attaining papers that are listed via pubmed? There's no magical links to the publisher websites .. so what's going on? 
- Bryan http://heybryan.org/ 1 512 203 0507 From Russell.Smithies at agresearch.co.nz Wed Apr 1 15:33:35 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 2 Apr 2009 08:33:35 +1300 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <49D39E60.1020103@gmail.com> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5615@exchsth.agresearch.co.nz> There's always more than one way to do it. I have no trouble loading it into a hash but you could just grep the file: my (undef, $tax_id) = split(/\s+/, `grep -w -P "^$accession" gi_taxid_prot.dmp`); --Russell > -----Original Message----- > From: Florent Angly [mailto:florent.angly at gmail.com] > Sent: Thursday, 2 April 2009 6:03 a.m. > To: Smithies, Russell > Cc: 'shalabh sharma'; 'bioperl-l' > Subject: Re: [Bioperl-l] taxonomy ID > > FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you > won't be able to put its information in a hash (unless you have a lot of > memory). > Florent > > Smithies, Russell wrote: > > The taxonomy information isn't in the blast output unless you created custom > fasta headers for your blast database. > > The easiest way to get the tax_id for your accessions would be to download > the gi->tax_id list from > ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. > > If you load that file into a hash, parse the accessions out of the blast > hits then lookup the tax_id from that hash, I think it should be fairly fast. > > > > Checking which are prokaryotes and which are eukaryotes based on tax_id is a > separate problem :-) > > If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file > contained within lists what division each tax_id belongs to (Bacteria, > Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out > from that.
> > > > It's not a very BioPerly solution but sometimes just looking up the answer > from a file/table/hash is the simplest way. > > > > Hope this helps, > > > > Russell Smithies > > > > Bioinformatics Applications Developer > > T +64 3 489 9085 > > E russell.smithies at agresearch.co.nz > > > > Invermay Research Centre > > Puddle Alley, > > Mosgiel, > > New Zealand > > T +64 3 489 3809 > > F +64 3 489 9174 > > www.agresearch.co.nz > > > > > > > > > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of shalabh sharma > >> Sent: Wednesday, 1 April 2009 7:43 a.m. > >> To: bioperl-l > >> Subject: [Bioperl-l] taxonomy ID > >> > >> Hi All, > >> I am writing a script, for one of its part i have to parse a > blast > >> report (refseq blast) and check how may organisms are eukaryotes and how > >> namy of them are prokaryotes. > >> I am using BIO::DB::taxinomy module: > >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy > >> > >> But for this i need a taxonomyid (like '33090') given in the example. > >> So is it possible to get a taxonomyid from refseq balst report? > >> If not then how i can deal with this problem? > >> > >> i would really appreciate if anyone can help me out. > >> > >> Thanks > >> Shalabh > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. 
Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From Russell.Smithies at agresearch.co.nz Wed Apr 1 15:48:02 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 2 Apr 2009 08:48:02 +1300 Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles In-Reply-To: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz> Not all articles have full-text at Pubmed but if you know the article ID, you can usually get the whole article (if available) like this: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez or as pdf http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf I'd just build a URL and use wget. If you're searching Pubmed directly, use a query like this to ensure you only get articles with links to full text: cancer AND (free full text[sb]) eg http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free+full+text[sb]) Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174
www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Miguel Pignatelli > Sent: Thursday, 2 April 2009 6:16 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles > > Hi all, > > I have a list of PUBMED IDs and I am trying to retrieve automatically > the *full article* in any format (not just the abstract). Is there any > method in bioperl that allows this? any other solution? > Currently I am trying to solve this using WWW::Mechanize, but do you > know of any other method to do this? > > Any help would be appreciated, > > Thanks in advance, > > M; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From miguel.pignatelli at uv.es Wed Apr 1 18:14:13 2009 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Thu, 2 Apr 2009 00:14:13 +0200 Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz> References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz> Message-ID: Thanks for the response, I have PMIDs extracted from Genbank flat files, is there a way to convert PMIDs to PMCIDs? I found this page: http://www.ncbi.nlm.nih.gov/sites/pmctopmid Is it possible to download the underlying conversion table for local use? Thank you very much in advance, M; On 01/04/2009, at 21:48, Smithies, Russell wrote: > Not all articles have full-text at Pubmed but if you know the > article ID, you can usually get the whole article (if available) > like this: > http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez > > or as pdf > http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf > > I'd just build a URL and use wget. > > If you're searching Pubmed directly, use a query like this to ensure > you only get articles with links to full text: > > cancer AND (free full text[sb]) > eg http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free+full+text[sb]) > > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Miguel Pignatelli >> Sent: Thursday, 2 April 2009 6:16 a.m.
>> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles >> >> Hi all, >> >> I have a list of PUBMED IDs and I am trying to retrieve automatically >> the *full article* in any format (not just the abstract). Is there >> any >> method in bioperl that allows this? any other solution? >> Currently I am trying to solve this using WWW::Mechanize, but do you >> know of any other method to do this? >> >> Any help would be appreciated, >> >> Thanks in advance, >> >> M; >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> = > ====================================================================== > From Russell.Smithies at agresearch.co.nz Wed Apr 1 18:47:30 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 2 Apr 2009 11:47:30 +1300 Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles In-Reply-To: References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5761@exchsth.agresearch.co.nz> Try this: http://www.pubmedcentral.nih.gov/about/ftp.html#Obtaining_DOIs Use ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz to associate PMC articles with a PMC ID, a PubMed ID, and the corresponding DOI. PMC-ids.csv.gz is a comma separated file with the following fields: * Journal Title * ISSN * Electronic ISSN * Publication Year * Volume * Issue * Page * DOI (if available) * PMC ID * PubMed ID (if available) * Manuscript ID (if available) * Release Date (Mmm DD YYYY or live) --Russell > -----Original Message----- > From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > Sent: Thursday, 2 April 2009 11:14 a.m. > To: Smithies, Russell > Cc: 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] Is it possible to retrieve full pubmed articles > > Thanks for the response, > > I have PMIDs extracted from Genbank flat files, is there a way to > convert PMIDs to PMCIDs? > I found this page: > > http://www.ncbi.nlm.nih.gov/sites/pmctopmid > > Is it possible to download the underlying conversion table for local > use? 
> > Thank you very much in advance, > > M; > > > El 01/04/2009, a las 21:48, Smithies, Russell escribi?: > > > Not all articles have full-text at Pubmed but if you know the > > article ID, you can usually get the whole article (if available) > > like this: > > > http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentr > ez > > > > or as pdf > > http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf > > > > I'd just build a URL and use wget. > > > > If you're searching Pubmed directly, use a query like this to ensure > > you only get articles with links to full text: > > > > cancer AND (free full text[sb]) > > eg > http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free > +full+text > > [sb]) > > > > > > Russell Smithies > > > > Bioinformatics Applications Developer > > T +64 3 489 9085 > > E russell.smithies at agresearch.co.nz > > > > Invermay Research Centre > > Puddle Alley, > > Mosgiel, > > New Zealand > > T +64 3 489 3809 > > F +64 3 489 9174 > > www.agresearch.co.nz > > > > > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Miguel Pignatelli > >> Sent: Thursday, 2 April 2009 6:16 a.m. > >> To: bioperl-l at lists.open-bio.org > >> Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles > >> > >> Hi all, > >> > >> I have a list of PUBMED IDs and I am trying to retrieve automatically > >> the *full article* in any format (not just the abstract). Is there > >> any > >> method in bioperl that allows this? any other solution? > >> Currently I am trying to solve this using WWW::Mechanize, but do you > >> know of any other method to do this? 
> >> > >> Any help would be appreciated, > >> > >> Thanks in advance, > >> > >> M; > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > = > > ====================================================================== > > Attention: The information contained in this message and/or > > attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or > > privileged > > material. Any review, retransmission, dissemination or other use of, > > or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by > > AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > = > > ====================================================================== > > From tristan.lefebure at gmail.com Wed Apr 1 23:11:51 2009 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Wed, 1 Apr 2009 23:11:51 -0400 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq Message-ID: <200904012311.51764.tristan.lefebure@gmail.com> Hi there, I'm trying to use the uniq_seq function from the Bio::SimpleAlign module. Here is the description: Title : uniq_seq Usage : $aln->uniq_seq(): Remove identical sequences in in the alignment. Ambiguous base ("N", "n") and leading and ending gaps ("-") are NOT counted as differences. Function : Make a new alignment of unique sequence types (STs) Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as "ST") 2. ST of each sequence in STDERR Argument : None What I'm trying to obtain is the ST composition (i.e. what is supposed to go to STDERR), but I see nothing... An example: --------test.fasta: >seq1 AAATTTC >seq2 CAATTTC >seq3 AAATTTC ------- ----------test.pl: #! 
/usr/bin/perl use strict; use warnings; use Bio::AlignIO; use Bio::SimpleAlign; use Getopt::Long; my $in = Bio::AlignIO->new(-file => 'test.fasta' , -format => 'fasta'); my $out = Bio::AlignIO->new(-file => ">test.out" , -format => 'fasta'); while ( my $aln = $in->next_aln() ) { my $red_aln = $aln->uniq_seq; $out->write_aln($red_aln); } ------------- If you run: ./test.pl &> log you will get nothing written into the log file... (but the test.out is OK) Am I missing something? By the way, wouldn't it be more convenient to have the ST composition returned in an array? Thanks, --Tristan (BioPerl 1.6) From maj at fortinbras.us Wed Apr 1 23:28:23 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 1 Apr 2009 23:28:23 -0400 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: <200904012311.51764.tristan.lefebure@gmail.com> References: <200904012311.51764.tristan.lefebure@gmail.com> Message-ID: <29E09DCE622643848EAFA8F1C6210711@NewLife> Tristan-- Strange: it looks like the prints to stderr have been commented out in the source (back in revision 10242; 1.6 is rev 15582). The two statements are easy to find in the SimpleAlign.pm uniq_seq() source; you can uncomment them to work around this. You are right, this is rather an unconventional way to specify an output option-- can Chris comment? Mark ----- Original Message ----- From: "Tristan Lefebure" To: "BioPerl List" Sent: Wednesday, April 01, 2009 11:11 PM Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq > Hi there, > > I'm trying to use the uniq_seq function from the Bio::SimpleAlign module. > Here is the description: > > Title : uniq_seq > Usage : $aln->uniq_seq(): Remove identical sequences in > in the alignment. Ambiguous base ("N", "n") and > leading and ending gaps ("-") are NOT counted as > differences. > Function : Make a new alignment of unique sequence types (STs) > Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as "ST") > 2. 
ST of each sequence in STDERR > Argument : None > > What I'm trying to obtain is the ST composition (i.e. what is supposed to go > to STDERR), but I see nothing... > > An example: > > --------test.fasta: >>seq1 > AAATTTC >>seq2 > CAATTTC >>seq3 > AAATTTC > ------- > > > ----------test.pl: > #! /usr/bin/perl > > use strict; > use warnings; > use Bio::AlignIO; > use Bio::SimpleAlign; > use Getopt::Long; > > my $in = Bio::AlignIO->new(-file => 'test.fasta' , > -format => 'fasta'); > > my $out = Bio::AlignIO->new(-file => ">test.out" , > -format => 'fasta'); > > while ( my $aln = $in->next_aln() ) { > my $red_aln = $aln->uniq_seq; > $out->write_aln($red_aln); > } > ------------- > > If you run: > > ./test.pl &> log > > you will get nothing written into the log file... (but the test.out is OK) > > Am I missing something? > By the way, wouldn't it be more convenient to have the ST composition returned > in an array? > > Thanks, > > --Tristan > (BioPerl 1.6) > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From weigangq at gmail.com Wed Apr 1 23:57:16 2009 From: weigangq at gmail.com (Weigang Qiu) Date: Wed, 1 Apr 2009 22:57:16 -0500 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: <29E09DCE622643848EAFA8F1C6210711@NewLife> References: <200904012311.51764.tristan.lefebure@gmail.com> <29E09DCE622643848EAFA8F1C6210711@NewLife> Message-ID: <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> Mark and Tristan, I am the original instigator of the uniq_seq method. The STDERR implementation was used so that STDOUT could be piped. But it did not conform to bioperl convention of using the $self->debug() method. I think that's why these lines were commented out and re-implemented using the $self->debug method. So, turning on the debug option should give the intended ST mapping for each sequence in stderr. 
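On the side question raised in this thread, whether the ST composition could be returned as an array instead of being printed to stderr: the following is only a rough pure-Perl sketch of that idea, not the BioPerl implementation. The helper name st_composition is hypothetical and not part of Bio::SimpleAlign. It is also a simplification: unlike uniq_seq as documented, it does not treat "N" or leading and ending gaps as wildcards, it only groups sequences whose strings are identical (case-insensitively).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: group sequence IDs by identical
# sequence string and label each group ST1, ST2, ... in order of first
# appearance. Assumption: "N" and gap characters are compared literally,
# which differs from what the uniq_seq documentation describes.
sub st_composition {
    my @records = @_;    # list of [id, sequence] pairs
    my ( %ids_for, @order );
    for my $rec (@records) {
        my ( $id, $seq ) = @$rec;
        my $key = uc $seq;               # ignore case differences
        unless ( exists $ids_for{$key} ) {
            $ids_for{$key} = [];
            push @order, $key;           # remember first-seen order
        }
        push @{ $ids_for{$key} }, $id;
    }
    my $n = 0;
    # Each returned element is [ST label, id, id, ...]
    return map { [ 'ST' . ++$n, @{ $ids_for{$_} } ] } @order;
}

# The three sequences from the test.fasta example in this thread
my @sts = st_composition(
    [ seq1 => 'AAATTTC' ],
    [ seq2 => 'CAATTTC' ],
    [ seq3 => 'AAATTTC' ],
);
print join( "\t", @$_ ), "\n" for @sts;
```

With the example input, seq1 and seq3 end up in the same group and seq2 in its own. Returning a list of array references keeps the ST order stable, which a plain hash would not.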
weigang On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen wrote: > Tristan-- > Strange: it looks like the prints to stderr have been commented out in the > source (back in revision 10242; 1.6 is rev 15582). The > two statements are easy to find in the SimpleAlign.pm uniq_seq() source; > you can > uncomment them to work around this. > You are right, this is rather an unconventional way to specify an output > option-- can Chris comment? > Mark > ----- Original Message ----- From: "Tristan Lefebure" < > tristan.lefebure at gmail.com> > To: "BioPerl List" > Sent: Wednesday, April 01, 2009 11:11 PM > Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq > > > > Hi there, >> >> I'm trying to use the uniq_seq function from the Bio::SimpleAlign module. >> Here is the description: >> >> Title : uniq_seq >> Usage : $aln->uniq_seq(): Remove identical sequences in >> in the alignment. Ambiguous base ("N", "n") and >> leading and ending gaps ("-") are NOT counted as >> differences. >> Function : Make a new alignment of unique sequence types (STs) >> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >> "ST") >> 2. ST of each sequence in STDERR >> Argument : None >> >> What I'm trying to obtain is the ST composition (i.e. what is supposed to >> go >> to STDERR), but I see nothing... >> >> An example: >> >> --------test.fasta: >> >>> seq1 >>> >> AAATTTC >> >>> seq2 >>> >> CAATTTC >> >>> seq3 >>> >> AAATTTC >> ------- >> >> >> ----------test.pl: >> #! /usr/bin/perl >> >> use strict; >> use warnings; >> use Bio::AlignIO; >> use Bio::SimpleAlign; >> use Getopt::Long; >> >> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >> -format => 'fasta'); >> >> my $out = Bio::AlignIO->new(-file => ">test.out" , >> -format => 'fasta'); >> >> while ( my $aln = $in->next_aln() ) { >> my $red_aln = $aln->uniq_seq; >> $out->write_aln($red_aln); >> } >> ------------- >> >> If you run: >> >> ./test.pl &> log >> >> you will get nothing written into the log file... 
(but the test.out is OK) >> >> Am I missing something? >> By the way, wouldn't it be more convenient to have the ST composition >> returned >> in an array? >> >> Thanks, >> >> --Tristan >> (BioPerl 1.6) >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Weigang Qiu Department of Biological Sciences Hunter College, City University of New York 695 Park Avenue New York, NY 10065 From maj at fortinbras.us Thu Apr 2 00:15:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 2 Apr 2009 00:15:06 -0400 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> References: <200904012311.51764.tristan.lefebure@gmail.com><29E09DCE622643848EAFA8F1C6210711@NewLife> <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> Message-ID: Thanks Weigang-- I didn't look carefully enough-- I'll make a change to the POD. so Tristan, in your code below, add $aln->verbose(1); before you invoke uniq_seq(). The ST's should then be sent to stderr (as "warns"). MAJ ----- Original Message ----- From: "Weigang Qiu" To: "Mark A. Jensen" Cc: "BioPerl List" ; Sent: Wednesday, April 01, 2009 11:57 PM Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq > Mark and Tristan, > > I am the original instigator of the uniq_seq method. The STDERR > implementation was used so that STDOUT could be piped. But it did not > conform to bioperl convention of using the $self->debug() method. I think > that's why these lines were commented out and re-implemented using the > $self->debug method. So, turning on the debug option should give the > intended ST mapping for each sequence in stderr. 
> > weigang > > On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen wrote: > >> Tristan-- >> Strange: it looks like the prints to stderr have been commented out in the >> source (back in revision 10242; 1.6 is rev 15582). The >> two statements are easy to find in the SimpleAlign.pm uniq_seq() source; >> you can >> uncomment them to work around this. >> You are right, this is rather an unconventional way to specify an output >> option-- can Chris comment? >> Mark >> ----- Original Message ----- From: "Tristan Lefebure" < >> tristan.lefebure at gmail.com> >> To: "BioPerl List" >> Sent: Wednesday, April 01, 2009 11:11 PM >> Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq >> >> >> >> Hi there, >>> >>> I'm trying to use the uniq_seq function from the Bio::SimpleAlign module. >>> Here is the description: >>> >>> Title : uniq_seq >>> Usage : $aln->uniq_seq(): Remove identical sequences in >>> in the alignment. Ambiguous base ("N", "n") and >>> leading and ending gaps ("-") are NOT counted as >>> differences. >>> Function : Make a new alignment of unique sequence types (STs) >>> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >>> "ST") >>> 2. ST of each sequence in STDERR >>> Argument : None >>> >>> What I'm trying to obtain is the ST composition (i.e. what is supposed to >>> go >>> to STDERR), but I see nothing... >>> >>> An example: >>> >>> --------test.fasta: >>> >>>> seq1 >>>> >>> AAATTTC >>> >>>> seq2 >>>> >>> CAATTTC >>> >>>> seq3 >>>> >>> AAATTTC >>> ------- >>> >>> >>> ----------test.pl: >>> #! 
/usr/bin/perl >>> >>> use strict; >>> use warnings; >>> use Bio::AlignIO; >>> use Bio::SimpleAlign; >>> use Getopt::Long; >>> >>> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >>> -format => 'fasta'); >>> >>> my $out = Bio::AlignIO->new(-file => ">test.out" , >>> -format => 'fasta'); >>> >>> while ( my $aln = $in->next_aln() ) { >>> my $red_aln = $aln->uniq_seq; >>> $out->write_aln($red_aln); >>> } >>> ------------- >>> >>> If you run: >>> >>> ./test.pl &> log >>> >>> you will get nothing written into the log file... (but the test.out is OK) >>> >>> Am I missing something? >>> By the way, wouldn't it be more convenient to have the ST composition >>> returned >>> in an array? >>> >>> Thanks, >>> >>> --Tristan >>> (BioPerl 1.6) >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Weigang Qiu > Department of Biological Sciences > Hunter College, City University of New York > 695 Park Avenue > New York, NY 10065 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From miguel.pignatelli at uv.es Thu Apr 2 04:17:02 2009 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Thu, 02 Apr 2009 10:17:02 +0200 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <49D39E60.1020103@gmail.com> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com> Message-ID: <49D4747E.4060001@uv.es> You may find the attached Perl module useful. It solves the difficult parts of getting the taxonomy given a GI identifier or a taxID. 
It is designed to process large numbers of GIs quickly and with low memory usage.

An example of usage would be:

use taxbuild;

# Build the taxonomyDB
my $taxDB = taxbuild->new(
    nodes    => $nodes_file_from_taxonomyDB,
    names    => $names_file_from_taxonomyDB,
    dict     => $dictFile,
    save_mem => 1
);

# Get the taxonomy given a GI identifier
my @tax = $taxDB->get_taxonomy_from_gi("35961124");

# Get the taxonomy term of a GI identifier at a given level
my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");

# Get the taxid of a GI identifier
my $taxid = $taxDB->get_taxid("35961124");

# Get the taxonomy given a taxid
@tax = $taxDB->get_taxonomy($taxid);

# Get the taxonomy at a given level given a taxid
my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");

# Get the level of a given taxonomical name
my $level = $taxDB->get_level_from_name("Proteobacteria");

The "dict file" is a processed version of the gi_taxid file from taxonomyDB. You can get this file by running the tax2bin2.pl script, also attached:

$ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin

or, if you are working with nucleotide sequences instead of proteins:

$ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin

You may consult the documentation of the module for a full description.

A possible solution to the original post using this module would be something like:

# Initialize the taxonomyDB once.
my $taxDB = taxbuild->new(
    nodes    => $nodes_file_from_taxonomyDB,
    names    => $names_file_from_taxonomyDB,
    dict     => $dictFile,
    save_mem => 1
);

# For each GI in your blast result:
my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
if ($superkingdom eq "Bacteria") {
    # Do whatever you want
} elsif ($superkingdom eq "Eukaryota") {
    # Do whatever you want
}

The module has been tested mainly on Linux systems, but should run without problems on Windows and Mac too. If you encounter any problem while using it, don't hesitate to contact me.
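For readers without the attached module, the idea behind the superkingdom lookup can be sketched in plain Perl. This is only an illustration, not the taxbuild implementation: it assumes the NCBI nodes.dmp layout (tax_id, parent tax_id, rank, ... separated by "\t|\t"), uses a tiny in-memory excerpt with abbreviated parent links instead of the real file, and the names load_nodes, superkingdom_of, @toy_nodes and %sk_name are hypothetical. The rank label "superkingdom" follows the usage in this thread and may differ in other taxonomy dumps.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy excerpt in the nodes.dmp layout (tax_id | parent tax_id | rank |).
# ASSUMPTION: parent links are abbreviated for illustration; real lineages
# contain more intermediate nodes than shown here.
my @toy_nodes = (
    "1\t|\t1\t|\tno rank\t|",
    "2\t|\t1\t|\tsuperkingdom\t|",
    "1224\t|\t2\t|\tphylum\t|",
    "561\t|\t1224\t|\tgenus\t|",
    "562\t|\t561\t|\tspecies\t|",
    "2759\t|\t1\t|\tsuperkingdom\t|",
    "33208\t|\t2759\t|\tkingdom\t|",
    "9605\t|\t33208\t|\tgenus\t|",
    "9606\t|\t9605\t|\tspecies\t|",
);

# Parse nodes.dmp-style lines into parent and rank lookup tables.
sub load_nodes {
    my (@lines) = @_;
    my (%parent, %rank);
    for my $line (@lines) {
        (my $rec = $line) =~ s/\t\|\s*$//;    # drop the trailing "\t|"
        my ($taxid, $parent_id, $rank) = split /\t\|\t/, $rec;
        $parent{$taxid} = $parent_id;
        $rank{$taxid}   = $rank;
    }
    return (\%parent, \%rank);
}

# Climb the parent links until a node of rank "superkingdom" is found.
# Returns that node's tax_id, or nothing if the root is reached first.
sub superkingdom_of {
    my ($taxid, $parent, $rank) = @_;
    my %seen;    # guards against the root pointing at itself
    while (defined $taxid && !$seen{$taxid}++) {
        return $taxid
            if defined $rank->{$taxid} && $rank->{$taxid} eq 'superkingdom';
        $taxid = $parent->{$taxid};
    }
    return;
}

my ($parent, $rank) = load_nodes(@toy_nodes);

# Hypothetical name table; in practice this would come from names.dmp.
my %sk_name = (2 => 'Bacteria', 2759 => 'Eukaryota');

for my $taxid (562, 9606) {
    my $sk = superkingdom_of($taxid, $parent, $rank);
    print join("\t", $taxid, defined $sk ? $sk_name{$sk} : 'unknown'), "\n";
}
```

With the real nodes.dmp the same walk applies, but the full table is large, so an on-disk index (as the dict file above provides for the GI list) is likely preferable to plain hashes.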
Hope this helps, M; Florent Angly wrote: > FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you > won't be able to put its information in a hash (unless you have a lot of > memory). > Florent > > Smithies, Russell wrote: >> The taxonomy information isn't in the blast output unless you created >> custom fasta headers for your blast database. >> The easiest way to get the tax_id for your accessions would be to >> download the gi->tax_id list from >> ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. >> If you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should be >> fairly fast. >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) >> If you grab the taxdump.tar.gz file from the same site, the nodes.dmp >> file contained within lists what division each tax_id belongs to >> (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can >> probably work it out from that. >> >> It's not a very BioPerly solution but sometimes just looking up the >> answer from a file/table/hash is the simplest way. >> Hope this helps, >> >> Russell Smithies >> Bioinformatics Applications Developer T +64 3 489 9085 E >> russell.smithies at agresearch.co.nz >> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 >> 489 3809 F +64 3 489 9174 www.agresearch.co.nz >> >> >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] taxonomy ID >>> >>> Hi All, >>> I am writing a script, for one of its part i have to parse >>> a blast >>> report (refseq blast) and check how may organisms are eukaryotes and how >>> namy of them are prokaryotes. 
>>> I am using BIO::DB::taxinomy module: >>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>> >>> But for this i need a taxonomyid (like '33090') given in the example. >>> So is it possible to get a taxonomyid from refseq balst report? >>> If not then how i can deal with this problem? >>> >>> i would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Thu Apr 2 08:29:47 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Thu, 2 Apr 2009 08:29:47 -0400
Subject: [Bioperl-l] FYI: note on wiki template behavior
Message-ID: <62B28D02BEA44E13BBDB5531FF6D67CF@NewLife>

Wiki-interested folks-

I fixed a "feature" in the HOWTO template-- when the template was used twice in the same line of text, the text following the first instance was rendered as a "code box". This had to do with how the template itself was formatted. If you're interested, please have a look at
http://www.bioperl.org/wiki/Template_talk:HOWTO

cheers,
Mark

From tristan.lefebure at gmail.com Thu Apr 2 09:30:51 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 2 Apr 2009 09:30:51 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
In-Reply-To:
References: <200904012311.51764.tristan.lefebure@gmail.com> <29E09DCE622643848EAFA8F1C6210711@NewLife> <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com>
Message-ID:

Thank you both,

To store the ST composition internally, so that I can reuse it in the same script, I made the following modifications to SimpleAlign.pm:

diff /usr/local/share/perl/5.10.0/Bio/SimpleAlign.pm /usr/local/share/perl/5.10.0/Bio/SimpleAlignMod.pm
590a591,592
> #modified to also returned an array of the ST composition
> my %st;
651a654
> push @{$st{$order{$str}}}, $_->id();
655c658
< return $aln;
---
> return ($aln, %st);

This is probably not really BioPerl compliant. Being an OBO ignorant, I wonder if we could add this information somewhere, either once in the $aln object or piece by piece in each Bio::LocatableSeq object?

Thks,

--Tristan

On Thu, Apr 2, 2009 at 12:15 AM, Mark A. Jensen wrote:

> Thanks Weigang-- I didn't look carefully enough--
> I'll make a change to the POD.
> so Tristan, in your code below, add
>
> $aln->verbose(1);
>
> before you invoke uniq_seq(). The ST's should
> then be sent to stderr (as "warns").
>
> MAJ
> ----- Original Message ----- From: "Weigang Qiu"
> To: "Mark A.
Jensen" > Cc: "BioPerl List" ; < > tristan.lefebure at gmail.com> > Sent: Wednesday, April 01, 2009 11:57 PM > Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq > > > > Mark and Tristan, >> >> I am the original instigator of the uniq_seq method. The STDERR >> implementation was used so that STDOUT could be piped. But it did not >> conform to bioperl convention of using the $self->debug() method. I think >> that's why these lines were commented out and re-implemented using the >> $self->debug method. So, turning on the debug option should give the >> intended ST mapping for each sequence in stderr. >> >> weigang >> >> On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen >> wrote: >> >> Tristan-- >>> Strange: it looks like the prints to stderr have been commented out in >>> the >>> source (back in revision 10242; 1.6 is rev 15582). The >>> two statements are easy to find in the SimpleAlign.pm uniq_seq() source; >>> you can >>> uncomment them to work around this. >>> You are right, this is rather an unconventional way to specify an output >>> option-- can Chris comment? >>> Mark >>> ----- Original Message ----- From: "Tristan Lefebure" < >>> tristan.lefebure at gmail.com> >>> To: "BioPerl List" >>> Sent: Wednesday, April 01, 2009 11:11 PM >>> Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq >>> >>> >>> >>> Hi there, >>> >>>> >>>> I'm trying to use the uniq_seq function from the Bio::SimpleAlign >>>> module. >>>> Here is the description: >>>> >>>> Title : uniq_seq >>>> Usage : $aln->uniq_seq(): Remove identical sequences in >>>> in the alignment. Ambiguous base ("N", "n") and >>>> leading and ending gaps ("-") are NOT counted as >>>> differences. >>>> Function : Make a new alignment of unique sequence types (STs) >>>> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >>>> "ST") >>>> 2. ST of each sequence in STDERR >>>> Argument : None >>>> >>>> What I'm trying to obtain is the ST composition (i.e. 
what is supposed >>>> to >>>> go >>>> to STDERR), but I see nothing... >>>> >>>> An example: >>>> >>>> --------test.fasta: >>>> >>>> seq1 >>>>> >>>>> AAATTTC >>>> >>>> seq2 >>>>> >>>>> CAATTTC >>>> >>>> seq3 >>>>> >>>>> AAATTTC >>>> ------- >>>> >>>> >>>> ----------test.pl: >>>> #! /usr/bin/perl >>>> >>>> use strict; >>>> use warnings; >>>> use Bio::AlignIO; >>>> use Bio::SimpleAlign; >>>> use Getopt::Long; >>>> >>>> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >>>> -format => 'fasta'); >>>> >>>> my $out = Bio::AlignIO->new(-file => ">test.out" , >>>> -format => 'fasta'); >>>> >>>> while ( my $aln = $in->next_aln() ) { >>>> my $red_aln = $aln->uniq_seq; >>>> $out->write_aln($red_aln); >>>> } >>>> ------------- >>>> >>>> If you run: >>>> >>>> ./test.pl &> log >>>> >>>> you will get nothing written into the log file... (but the test.out is >>>> OK) >>>> >>>> Am I missing something? >>>> By the way, wouldn't it be more convenient to have the ST composition >>>> returned >>>> in an array? 
>>>>
>>>> Thanks,
>>>>
>>>> --Tristan
>>>> (BioPerl 1.6)
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>>
>> --
>> Weigang Qiu
>> Department of Biological Sciences
>> Hunter College, City University of New York
>> 695 Park Avenue
>> New York, NY 10065
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>

From dereje1227 at yahoo.com Thu Apr 2 09:45:08 2009
From: dereje1227 at yahoo.com (demis001)
Date: Thu, 2 Apr 2009 06:45:08 -0700 (PDT)
Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15
Message-ID: <22816585.post@talk.nabble.com>

Hi,

I am new to BioPerl and to this forum, and I do not even know how to start a new post. I have one question for you.

Is there a BioPerl module that lets me extract sequences by chromosome name, seqStart and seqEnd from a formatted human genome database downloaded to my Linux desktop? I used to do this with a Perl $URI object, and it is really slow because the process depends on the network. To be more specific, I took chrName, seqStart and seqEnd and went to the Ensembl database to get the sequences one by one using the Perl $URI object. I thought it might be easier to process them locally against an indexed database with a BioPerl module, if there is one designed for this purpose.

Input: millions of rows in a tab-delimited (CSV) file containing chrName, seqStart and seqEnd, plus the locally formatted/indexed human genome.

Output: FASTA sequences whose headers contain the chromosome name and the parsed location.

Sorry if I posted in the wrong section of the forum; I am happy to get any recommendation.

Thanks


Govind Chandra wrote:
>
> Hi,
>
> The code below
>
>
> ====== code begins =======
> #use strict;
> use Bio::SeqIO;
>
> $infile='NC_000913.gbk';
> my $seqio=Bio::SeqIO->new(-file => $infile);
> my $seqobj=$seqio->next_seq();
> my @features=$seqobj->all_SeqFeatures();
> my $count=0;
> foreach my $feature (@features) {
> unless($feature->primary_tag() eq 'CDS') {next;}
> print($feature->start()," ", $feature->end(), "
> ",$feature->strand(),"\n");
> $ac=$feature->annotation();
> $temp1=$ac->get_Annotations("locus_tag");
> @temp2=$ac->get_Annotations();
> print("$temp1 $temp2[0] @temp2\n");
> if($count++ > 5) {last;}
> }
>
> print(ref($ac),"\n");
> exit;
>
> ======= code ends ========
>
> produces the output
>
> ========== output begins ========
>
> 190 255 1
> 0
> 337 2799 1
> 0
> 2801 3733 1
> 0
> 3734 5020 1
> 0
> 5234 5530 1
> 0
> 5683 6459 -1
> 0
> 6529 7959 -1
> 0
> Bio::Annotation::Collection
>
> =========== output ends ==========
>
> $ac is-a Bio::Annotation::Collection but does not actually contain any
> annotation from the feature. Is this how it should be? I cannot figure
> out what is wrong with the script. Earlier I used to use has_tag(),
> get_tag_values() etc. but the documentation says these are deprecated.
>
> Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of uname
> -a is
>
> Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008
> x86_64 x86_64 x86_64 GNU/Linux
>
> Thanks in advance for any help.
> > Govind > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Re%3A-Bioperl-l-Digest%2C-Vol-71%2C-Issue-15-tp22744119p22816585.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Thu Apr 2 09:46:36 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 2 Apr 2009 09:46:36 -0400 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: References: <200904012311.51764.tristan.lefebure@gmail.com><29E09DCE622643848EAFA8F1C6210711@NewLife><7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> Message-ID: Hi Tristan-- I think this is a good thought, Can you register this as an enhancement at http://bugzilla.bioperl.org ? Please go ahead and attach the diff as a patch to the 'bug' report-- thanks for *your* input- cheers, Mark ----- Original Message ----- From: "Tristan Lefebure" To: "Mark A. Jensen" Cc: "BioPerl List" ; "Weigang Qiu" Sent: Thursday, April 02, 2009 9:30 AM Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq > Thanks you both, > > To internally store the ST composition, so that I can reuse it in the same > script, I made the following modifications to SimpleAlign.pm: > > diff /usr/local/share/perl/5.10.0/Bio/SimpleAlign.pm > /usr/local/share/perl/5.10.0/Bio/SimpleAlignMod.pm > 590a591,592 >> #modified to also returned an array of the ST composition >> my %st; > 651a654 >> push @{$st{$order{$str}}}, $_->id(); > 655c658 > < return $aln; > --- >> return ($aln, %st); > > This is probably not really BioPerl compliant. Being an OBO ignorant, I > wonder if we could add this information somewhere either once in the $aln > object, or by little pieces in each Bio::LocatableSeq objects? > > Thks, > > --Tristan > > On Thu, Apr 2, 2009 at 12:15 AM, Mark A. 
Jensen wrote: > >> Thanks Weigang-- I didn't look carefully enough-- >> I'll make a change to the POD. >> so Tristan, in your code below, add >> >> $aln->verbose(1); >> >> before you invoke uniq_seq(). The ST's should >> then be sent to stderr (as "warns"). >> >> MAJ >> ----- Original Message ----- From: "Weigang Qiu" >> To: "Mark A. Jensen" >> Cc: "BioPerl List" ; < >> tristan.lefebure at gmail.com> >> Sent: Wednesday, April 01, 2009 11:57 PM >> Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq >> >> >> >> Mark and Tristan, >>> >>> I am the original instigator of the uniq_seq method. The STDERR >>> implementation was used so that STDOUT could be piped. But it did not >>> conform to bioperl convention of using the $self->debug() method. I think >>> that's why these lines were commented out and re-implemented using the >>> $self->debug method. So, turning on the debug option should give the >>> intended ST mapping for each sequence in stderr. >>> >>> weigang >>> >>> On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen >>> wrote: >>> >>> Tristan-- >>>> Strange: it looks like the prints to stderr have been commented out in >>>> the >>>> source (back in revision 10242; 1.6 is rev 15582). The >>>> two statements are easy to find in the SimpleAlign.pm uniq_seq() source; >>>> you can >>>> uncomment them to work around this. >>>> You are right, this is rather an unconventional way to specify an output >>>> option-- can Chris comment? >>>> Mark >>>> ----- Original Message ----- From: "Tristan Lefebure" < >>>> tristan.lefebure at gmail.com> >>>> To: "BioPerl List" >>>> Sent: Wednesday, April 01, 2009 11:11 PM >>>> Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq >>>> >>>> >>>> >>>> Hi there, >>>> >>>>> >>>>> I'm trying to use the uniq_seq function from the Bio::SimpleAlign >>>>> module. >>>>> Here is the description: >>>>> >>>>> Title : uniq_seq >>>>> Usage : $aln->uniq_seq(): Remove identical sequences in >>>>> in the alignment. 
Ambiguous base ("N", "n") and >>>>> leading and ending gaps ("-") are NOT counted as >>>>> differences. >>>>> Function : Make a new alignment of unique sequence types (STs) >>>>> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >>>>> "ST") >>>>> 2. ST of each sequence in STDERR >>>>> Argument : None >>>>> >>>>> What I'm trying to obtain is the ST composition (i.e. what is supposed >>>>> to >>>>> go >>>>> to STDERR), but I see nothing... >>>>> >>>>> An example: >>>>> >>>>> --------test.fasta: >>>>> >>>>> seq1 >>>>>> >>>>>> AAATTTC >>>>> >>>>> seq2 >>>>>> >>>>>> CAATTTC >>>>> >>>>> seq3 >>>>>> >>>>>> AAATTTC >>>>> ------- >>>>> >>>>> >>>>> ----------test.pl: >>>>> #! /usr/bin/perl >>>>> >>>>> use strict; >>>>> use warnings; >>>>> use Bio::AlignIO; >>>>> use Bio::SimpleAlign; >>>>> use Getopt::Long; >>>>> >>>>> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >>>>> -format => 'fasta'); >>>>> >>>>> my $out = Bio::AlignIO->new(-file => ">test.out" , >>>>> -format => 'fasta'); >>>>> >>>>> while ( my $aln = $in->next_aln() ) { >>>>> my $red_aln = $aln->uniq_seq; >>>>> $out->write_aln($red_aln); >>>>> } >>>>> ------------- >>>>> >>>>> If you run: >>>>> >>>>> ./test.pl &> log >>>>> >>>>> you will get nothing written into the log file... (but the test.out is >>>>> OK) >>>>> >>>>> Am I missing something? >>>>> By the way, wouldn't it be more convenient to have the ST composition >>>>> returned >>>>> in an array? 
>>>>> >>>>> Thanks, >>>>> >>>>> --Tristan >>>>> (BioPerl 1.6) >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> >>> -- >>> Weigang Qiu >>> Department of Biological Sciences >>> Hunter College, City University of New York >>> 695 Park Avenue >>> New York, NY 10065 >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bix at sendu.me.uk Wed Apr 1 08:00:59 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 01 Apr 2009 13:00:59 +0100 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> Message-ID: <49D3577B.1090409@sendu.me.uk> Smithies, Russell wrote: > The taxonomy information isn't in the blast output unless you created > custom fasta headers for your blast database. The easiest way to get > the tax_id for your accessions would be to download the gi->tax_id > list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. > If you load that file into a hash, parse the accessions out of the > blast hits then lookup the tax_id from that hash, I think it should > be fairly fast. 
> > Checking which are prokaryotes and which are eukaryotes based on > tax_id is a separate problem :-) If you grab the taxdump.tar.gz file > from the same site, the nodes.dmp file contained within lists what > division each tax_id belongs to (Bacteria, Invertebrates, Mammals, > Phages, Plants, etc) so you can probably work it out from that. Check out the synopsis for Bio::Taxon http://doc.bioperl.org/bioperl-live/Bio/Taxon.html If the division() function doesn't tell you what you need, you could use get_lineage_nodes() and check the oldest ancestors to see if its a pro or euk. From shalabh.sharma7 at gmail.com Thu Apr 2 15:50:58 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 2 Apr 2009 15:50:58 -0400 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <49D3577B.1090409@sendu.me.uk> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> Message-ID: <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> thanks a lot everyone, the information is really useful and it solved my purpose. Thanks Shalabh On Wed, Apr 1, 2009 at 8:00 AM, Sendu Bala wrote: > Smithies, Russell wrote: > >> The taxonomy information isn't in the blast output unless you created >> custom fasta headers for your blast database. The easiest way to get >> the tax_id for your accessions would be to download the gi->tax_id >> list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. If >> you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should >> be fairly fast. 
>> >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) If you grab the taxdump.tar.gz file >> from the same site, the nodes.dmp file contained within lists what >> division each tax_id belongs to (Bacteria, Invertebrates, Mammals, >> Phages, Plants, etc) so you can probably work it out from that. >> > > Check out the synopsis for Bio::Taxon > http://doc.bioperl.org/bioperl-live/Bio/Taxon.html > > If the division() function doesn't tell you what you need, you could use > get_lineage_nodes() and check the oldest ancestors to see if its a pro > or euk. > From Russell.Smithies at agresearch.co.nz Thu Apr 2 15:55:06 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 3 Apr 2009 08:55:06 +1300 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz> We're here to help - unless it's to do your homework ;-) --Russell From: shalabh sharma [mailto:shalabh.sharma7 at gmail.com] Sent: Friday, 3 April 2009 8:51 a.m. To: Sendu Bala Cc: Smithies, Russell; bioperl-l Subject: Re: [Bioperl-l] taxonomy ID thanks a lot everyone, the information is really useful and it solved my purpose. Thanks Shalabh On Wed, Apr 1, 2009 at 8:00 AM, Sendu Bala > wrote: Smithies, Russell wrote: The taxonomy information isn't in the blast output unless you created custom fasta headers for your blast database. The easiest way to get the tax_id for your accessions would be to download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. 
If you load that file into a hash, parse the accessions out of the blast hits then lookup the tax_id from that hash, I think it should be fairly fast. Checking which are prokaryotes and which are eukaryotes based on tax_id is a separate problem :-) If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file contained within lists what division each tax_id belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out from that. Check out the synopsis for Bio::Taxon http://doc.bioperl.org/bioperl-live/Bio/Taxon.html If the division() function doesn't tell you what you need, you could use get_lineage_nodes() and check the oldest ancestors to see if its a pro or euk. ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From Russell.Smithies at agresearch.co.nz Thu Apr 2 20:46:39 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 3 Apr 2009 13:46:39 +1300 Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ? 
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz>

I'm re-formatting some blast output into nice html webpages but am finding $self->end_report() and $self->footer() don't seem to be working.
The other methods ($self->start_report, $self->introduction, $self->title) all work fine.
Am I doing something wrong or is there a trick to it?

Here's some test code:
==================================

#!perl -w

use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;
use CGI qw(:standard);


my $in = Bio::SearchIO->new(-format => "blast",-file => shift @ARGV, );

my $index = Bio::SearchIO::Writer::HTMLResultWriter->new();

$index->start_report( \&my_start_report );
$index->title( \&my_title );
$index->footer(\&my_footer);
$index->end_report(\&my_end_report);

my $out = Bio::SearchIO->new(-writer => $index, -file => ">blast.htm");

$out->write_result($in->next_result);


sub my_start_report{
    return h1('this is my header');
}

sub my_title{
    return h1('this is my title');
}

sub my_footer{
    my ($self) = @_;
    return h2('this is a footer');
}

sub my_end_report {
    return h2('this is the end');
}

=================================

Thanx,


Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From jason at bioperl.org Thu Apr 2 21:09:20 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 2 Apr 2009 18:09:20 -0700 Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz> Message-ID: <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org> looking at the code - it doesn't seem to accept resetting the default value. sub end_report { return "\n\n"; } sub footer { my ($self) = @_; return "
Produced by Bioperl module ".ref($self)." on $DATE
Revision: $Revision
\n" } So just adjusting it to mirror what is happening for title and the rest would be necessary. -jason On Apr 2, 2009, at 5:46 PM, Smithies, Russell wrote: > I'm re-formatting some blast output into nice html webpages but am > finding $self->end_report() and $self->footer() don't seem to be > working. > The other methods ($self->start_report, $self->introduction, $self- > >title) all work fine. > Am I doing something wrong or is there a trick to it? > > Here's some test code: > ================================== > > #!perl -w > > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > use CGI qw(:standard); > > > my $in = Bio::SearchIO->new(-format => "blast",-file => shift > @ARGV, ); > > my $index = Bio::SearchIO::Writer::HTMLResultWriter->new(); > > $index->start_report( \&my_start_report ); > $index->title( \&my_title ); > $index->footer(\&my_footer); > $index->end_report(\&my_end_report); > > my $out = Bio::SearchIO->new(-writer => $index, -file => > ">blast.htm"); > > $out->write_result($in->next_result); > > > sub my_start_report{ > return h1('this is my header'); > } > > sub my_title{ > return h1('this is my title'); > } > > sub my_footer{ > my ($self) = @_; > return h2('this is a footer'); > } > > sub my_end_report { > return h2('this is the end'); > } > > ================================= > > Thanx, > > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. 
Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From Russell.Smithies at agresearch.co.nz Thu Apr 2 22:16:34 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 3 Apr 2009 15:16:34 +1300 Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ? In-Reply-To: <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz> <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABEE2E@exchsth.agresearch.co.nz> Not wanting to be picky... But $result_>database_name (for blast results) returns the description of the database rather than just the name. Eg. "hs.fna (Human mRNA Refseqs)" instead of "hs.fna" I've had a hunt but can't see where the code for getting the database_name is. Any ideas? Thanx, --Russell > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason > Stajich > Sent: Friday, 3 April 2009 2:09 p.m. > To: Smithies, Russell > Cc: 'bioperl-l' > Subject: Re: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ? 
> > looking at the code - it doesn't seem to accept resetting the default > value. > sub end_report { > return "\n\n"; > } > > sub footer { > my ($self) = @_; > return "
Produced by Bioperl module ".ref($self)." on > $DATE
Revision: $Revision
\n" > > } > > So just adjusting it to mirror what is happening for title and the > rest would be necessary. > > -jason > On Apr 2, 2009, at 5:46 PM, Smithies, Russell wrote: > > > I'm re-formatting some blast output into nice html webpages but am > > finding $self->end_report() and $self->footer() don't seem to be > > working. > > The other methods ($self->start_report, $self->introduction, $self- > > >title) all work fine. > > Am I doing something wrong or is there a trick to it? > > > > Here's some test code: > > ================================== > > > > #!perl -w > > > > use Bio::SearchIO; > > use Bio::SearchIO::Writer::HTMLResultWriter; > > use CGI qw(:standard); > > > > > > my $in = Bio::SearchIO->new(-format => "blast",-file => shift > > @ARGV, ); > > > > my $index = Bio::SearchIO::Writer::HTMLResultWriter->new(); > > > > $index->start_report( \&my_start_report ); > > $index->title( \&my_title ); > > $index->footer(\&my_footer); > > $index->end_report(\&my_end_report); > > > > my $out = Bio::SearchIO->new(-writer => $index, -file => > > ">blast.htm"); > > > > $out->write_result($in->next_result); > > > > > > sub my_start_report{ > > return h1('this is my header'); > > } > > > > sub my_title{ > > return h1('this is my title'); > > } > > > > sub my_footer{ > > my ($self) = @_; > > return h2('this is a footer'); > > } > > > > sub my_end_report { > > return h2('this is the end'); > > } > > > > ================================= > > > > Thanx, > > > > > > Russell Smithies > > > > Bioinformatics Applications Developer > > T +64 3 489 9085 > > E russell.smithies at agresearch.co.nz > > > > Invermay Research Centre > > Puddle Alley, > > Mosgiel, > > New Zealand > > T +64 3 489 3809 > > F +64 3 489 9174 > > www.agresearch.co.nz > > > > > > > > = > > ====================================================================== > > Attention: The information contained in this message and/or > > attachments > > from AgResearch Limited is intended only for the persons or 
entities > > to which it is addressed and may contain confidential and/or > > privileged > > material. Any review, retransmission, dissemination or other use of, > > or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by > > AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > = > > ====================================================================== > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > From bernd.web at gmail.com Fri Apr 3 09:47:23 2009 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 3 Apr 2009 15:47:23 +0200 Subject: [Bioperl-l] AlignIO clustal Message-ID: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com> Hi, Using Bioperl 1.5.2 and AlignIO, I now run into an issue with a clustalw alignment. At the moment, I cannot update to a newer version, so am not sure this problem still exists. The problem is that the $aln object does not exists when the last sequence in a block contains gaps only. Anybody has seen this or knows a fix? Code and example input follows below. 
Regards,
Bernd


use Bio::AlignIO;
my $in = Bio::AlignIO->new(-file => 'test.aln',
                           -format => 'clustalw');

my $out = Bio::AlignIO->new(-file => '>testerr.ALN',
                            -format => 'clustalw');

my $aln = $in->next_aln();
print $aln->length, "\n";

test.aln contains:

CLUSTAL W(1.81) multiple sequence alignment


QUERY/7-143      PETLE-ARINRATNPLNKEL--DWASI
7082547/1-128    ---------ERATNDMLIGP--DWAVN
1_3265048/1-0    ---------------------------
3265047/2-138    QTSLE-ALLLKATNSQNQNI--DTAAV
1_3265047/1-0    ---------------------------

From bernd.web at gmail.com Fri Apr 3 10:11:44 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 3 Apr 2009 16:11:44 +0200
Subject: [Bioperl-l] AlignIO clustal
In-Reply-To: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com>
References: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com>
Message-ID: <716af09c0904030711l8252943hff489ccb9f720920@mail.gmail.com>

Hi,

I noticed this issue is not specific to Clustal; it also occurs for Fasta.
The "problem" arises in a last check, which is only done on the last sequence; it is still present in the current code (webcvs) in the next_aln code.

In fasta.pm:
    # If $end <= 0, we have either reached the end of
    # file in <> or we have encountered some other error
    if ( $end <= 0 ) {
        undef $aln;
        return $aln;
    }

In clustalw.pm:
    # not sure if this should be a default option - or we can pass in
    # an option to do this in the future? --jason stajich
    # $aln->map_chars('\.','-');
    undef $aln if ( !defined $end || $end <= 0 );
    return $aln;

And the last sequence actually got a zero end. This was given in an $aln->slice where gap-only sequences are retained. It will also get a "0" in next_aln itself if no coordinates are present.
1_3265047/1-0    ---------------------------

For now, commenting out "undef $aln if ( !defined $end || $end <= 0 );" works.

Regards,
Bernd

On Fri, Apr 3, 2009 at 3:47 PM, Bernd Web wrote:
> Hi,
>
> Using Bioperl 1.5.2 and AlignIO, I now run into an issue with a
> clustalw alignment.
> At the moment, I cannot update to a newer version, so am not sure this
> problem still exists.
>
> The problem is that the $aln object does not exists when the last
> sequence in a block contains gaps only.
> Anybody has seen this or knows a fix? Code and example input follows below.
>
>
> Regards,
> Bernd
>
>
> use Bio::AlignIO;
> my $in = Bio::AlignIO->new(-file => 'test.aln',
>                            -format => 'clustalw');
>
> my $out = Bio::AlignIO->new(-file => '>testerr.ALN',
>                             -format => 'clustalw');
>
> my $aln = $in->next_aln();
> print $aln->length, "\n";
>
> test.aln contains:
>
> CLUSTAL W(1.81) multiple sequence alignment
>
>
> QUERY/7-143      PETLE-ARINRATNPLNKEL--DWASI
> 7082547/1-128    ---------ERATNDMLIGP--DWAVN
> 1_3265048/1-0    ---------------------------
> 3265047/2-138    QTSLE-ALLLKATNSQNQNI--DTAAV
> 1_3265047/1-0    ---------------------------
>

From hlapp at gmx.net Mon Apr 6 11:39:50 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 6 Apr 2009 11:39:50 -0400
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To:
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>

(Removing biosql-l from the cc list as this seems to be a problem with BioPerl.)

Hi Johann,

I don't know whether anyone has responded to you yet - if not I'm sorry, I've been inundated lately.

On Apr 1, 2009, at 6:14 AM, Johann PELLET wrote:

> With the latest version of BioPerl and BioSQL, I have tried to
> insert entry from a GenBank file, which I have downloaded from the
> NCBI website (648 937 records)

Could you be more specific? When you say the latest version of BioPerl, do you mean 1.6.1 or the current svn snapshot of the main trunk? And which Genbank file is it? Is it one with only viruses, i.e., are you specifically interested in the virus sequences that the parser is giving you trouble with?
> After successfully loading ncbi_taxonomy i am getting following > error message while loading sequences into database. > > perl load_seqdatabase.pl gb_03-2009 -format genbank -driver Pg - > dbname biosql > > > --------------------- WARNING --------------------- > MSG: The supplied lineage does not start near 'Human papillomavirus > type 2c' (I was supplied 'Human papillomavirus - 2 | > Alphapapillomavirus | Papillomaviridae') This is a problem in the BioPerl genbank parser, or more specifically, in the species parser. I thought though this was fixed in 1.6.1; are you sure you don't have an older version of BioPerl lying around that could accidentally have been used? That said, it only seems to be a warning; did you check how the record ended up in the database and found it to be incomplete or messed up? > the script is not stopped until this entry: S67864 This a later entry, not the same entry that causes the problem above, right? > --------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed, > values were ("1","19)","1","3") FKs (41914,) > ERROR: invalid input syntax for integer: "19)" Oops - that's a problem that must originate from the BioPerl feature location parser. The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 Does anyone see why the location parser should have a problem with the first gene feature? It's nested, and has remote location components, but at first sight nothing jumps out at me as extraordinary. Has someone recently changed the location parsing code? If no-one has an immediate idea what could be at work here, this needs investigating. 
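
The stray "19)" in the failed insert is the kind of value that appears when a nested location string is split on every comma without tracking parentheses. The following is a minimal sketch of that effect, not BioPerl's actual location parser, and whether the real parser failed this way is an assumption. The location string is the first gene feature of the record linked above; split_top_level() and unwrap_operator() are hypothetical helpers written only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a location argument
# list on commas that sit outside any parentheses.
sub split_top_level {
    my ($text) = @_;
    my @parts;
    my $depth   = 0;
    my $current = '';
    for my $char (split //, $text) {
        if    ($char eq '(') { $depth++ }
        elsif ($char eq ')') { $depth-- }
        if ($char eq ',' && $depth == 0) {
            push @parts, $current;
            $current = '';
            next;
        }
        $current .= $char;
    }
    push @parts, $current if length $current;
    return @parts;
}

# Hypothetical helper: strip one enclosing operator such as order(...)
# or join(...) and return the operator name and its argument string.
sub unwrap_operator {
    my ($text) = @_;
    if ($text =~ /^(\w+)\((.*)\)$/s) {
        return ($1, $2);
    }
    return (undef, $text);
}

my $location = 'order(S67862.1:72..75,join(S67863.1:1..788,1..19))';
my ($op, $args) = unwrap_operator($location);

# Naive split: cuts through the nested join() and leaves a stray ")".
my @naive = split /,/, $args;
print "naive: $_\n" for @naive;

# Depth-aware split: the nested join() stays in one piece.
my @aware = split_top_level($args);
print "aware: $_\n" for @aware;
```

With the naive split the last piece is "1..19)", whose end coordinate would be read as "19)"; the depth-aware split returns two pieces and keeps join(S67863.1:1..788,1..19) intact so it can be unwrapped and split again.
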
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From torsten.seemann at infotech.monash.edu.au Mon Apr 6 21:05:25 2009 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 7 Apr 2009 11:05:25 +1000 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> Message-ID: > The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) > Does anyone see why the location parser should have a problem with the first > gene feature? It's nested, and has remote location components, but at first > sight nothing jumps out at me as extraordinary. Has someone recently changed > the location parsing code? If no-one has an immediate idea what could be at > work here, this needs investigating. I'm not sure if Bioperl handles the order() operator? For those unfamilair with the order() operator: http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 order(location,location, ... location) The elements can be found in the specified order (5' to 3' direction), but nothing is implied about the reasonableness about joining them. --Torsten Seemann --Victorian Bioinformatics Consortium, Dept. 
Microbiology, Monash University, AUSTRALIA

From cjfields at illinois.edu Mon Apr 6 23:59:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 6 Apr 2009 22:59:14 -0500
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To:
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
Message-ID: <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu>

On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote:

>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772
>
> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19))
>
>> Does anyone see why the location parser should have a problem with
>> the first
>> gene feature? It's nested, and has remote location components, but
>> at first
>> sight nothing jumps out at me as extraordinary. Has someone
>> recently changed
>> the location parsing code? If no-one has an immediate idea what
>> could be at
>> work here, this needs investigating.

The location parsing code was refactored about 3-4 years ago w/o problems. This'll be the first one to crop up. I'll try taking a look at it.

> I'm not sure if Bioperl handles the order() operator?
>
> For those unfamilair with the order() operator:
>
> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2
>
> order(location,location, ... location)
> The elements can be found in the specified order (5' to 3' direction),
> but nothing is implied about the reasonableness about joining them.
>
>
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
> University, AUSTRALIA

It's interesting that the version from eutils differs significantly in the feature table when retrieving 'gb' or 'gbwithparts', the latter resolves the location (see below). Regardless we'll need to make sure this is parseable.

....
FEATURES Location/Qualifiers source 1..77 /organism="Ovine respiratory syncytial virus" /mol_type="genomic RNA" /db_xref="taxon:28869" gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) /gene="G" gene 55..>77 /gene="fusion glycoprotein F" chris From cjfields at illinois.edu Tue Apr 7 01:32:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 7 Apr 2009 00:32:52 -0500 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> Message-ID: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Fixed in svn now and have added this as a test case (passes all tests in bioperl-live). For some reason this wasn't catching some more complex combinations of operators, mainly those with mixes of order/ join. chris On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: > On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: > >>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 >> >> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >> >>> Does anyone see why the location parser should have a problem with >>> the first >>> gene feature? It's nested, and has remote location components, but >>> at first >>> sight nothing jumps out at me as extraordinary. Has someone >>> recently changed >>> the location parsing code? If no-one has an immediate idea what >>> could be at >>> work here, this needs investigating. > > The location parsing code was refactored above 3-4 years ago w/o > problems. This'll be the first one to crop up. I'll try taking a > look at it. > >> I'm not sure if Bioperl handles the order() operator? >> >> For those unfamilair with the order() operator: >> >> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >> >> order(location,location, ... 
location) >> The elements can be found in the specified order (5' to 3' >> direction), >> but nothing is implied about the reasonableness about joining them. >> >> >> --Torsten Seemann >> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >> University, AUSTRALIA > > It's interesting that the version from eutils differs significantly > in the feature table when retrieving 'gb' or 'gbwithparts', the > latter resolves the location (see below). Regardless we'll need to > make sure this is parseable. > > .... > > FEATURES Location/Qualifiers > source 1..77 > /organism="Ovine respiratory syncytial virus" > /mol_type="genomic RNA" > /db_xref="taxon:28869" > gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) > /gene="G" > gene 55..>77 > /gene="fusion glycoprotein F" > > > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From johann.pellet at inserm.fr Tue Apr 7 04:48:56 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Tue, 7 Apr 2009 10:48:56 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: <73508372-0C43-4693-8135-45C128A25959@inserm.fr> Thanks all, I will update bioperl-live using svn right now, and I will restart to load sequences into my biosql database. Hilmar, My GenBank file contains only virus sequences. I downloaded it using eutils, (db=nuccore, tool=ebot, rettype=gb ...). Thank you again -- -- Johann Pellet Le 7 avr. 09 ? 07:32, Chris Fields a ?crit : > Fixed in svn now and have added this as a test case (passes all > tests in bioperl-live). 
For some reason this wasn't catching some > more complex combinations of operators, mainly those with mixes of > order/join. > > chris > > On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: > >> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: >> >>>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 >>> >>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>> >>>> Does anyone see why the location parser should have a problem >>>> with the first >>>> gene feature? It's nested, and has remote location components, >>>> but at first >>>> sight nothing jumps out at me as extraordinary. Has someone >>>> recently changed >>>> the location parsing code? If no-one has an immediate idea what >>>> could be at >>>> work here, this needs investigating. >> >> The location parsing code was refactored above 3-4 years ago w/o >> problems. This'll be the first one to crop up. I'll try taking a >> look at it. >> >>> I'm not sure if Bioperl handles the order() operator? >>> >>> For those unfamilair with the order() operator: >>> >>> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >>> >>> order(location,location, ... location) >>> The elements can be found in the specified order (5' to 3' >>> direction), >>> but nothing is implied about the reasonableness about joining them. >>> >>> >>> --Torsten Seemann >>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>> University, AUSTRALIA >> >> It's interesting that the version from eutils differs significantly >> in the feature table when retrieving 'gb' or 'gbwithparts', the >> latter resolves the location (see below). Regardless we'll need to >> make sure this is parseable. >> >> .... 
>> >> FEATURES Location/Qualifiers >> source 1..77 >> /organism="Ovine respiratory syncytial virus" >> /mol_type="genomic RNA" >> /db_xref="taxon:28869" >> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >> /gene="G" >> gene 55..>77 >> /gene="fusion glycoprotein F" >> >> >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Tue Apr 7 13:56:27 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 7 Apr 2009 13:56:27 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: Awesome, thanks Chris! $beer_owed++; -hilmar On Apr 7, 2009, at 1:32 AM, Chris Fields wrote: > Fixed in svn now and have added this as a test case (passes all > tests in bioperl-live). For some reason this wasn't catching some > more complex combinations of operators, mainly those with mixes of > order/join. > > chris > > On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: > >> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: >> >>>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772 >>> >>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>> >>>> Does anyone see why the location parser should have a problem >>>> with the first >>>> gene feature? It's nested, and has remote location components, >>>> but at first >>>> sight nothing jumps out at me as extraordinary. Has someone >>>> recently changed >>>> the location parsing code? If no-one has an immediate idea what >>>> could be at >>>> work here, this needs investigating. >> >> The location parsing code was refactored above 3-4 years ago w/o >> problems. 
This'll be the first one to crop up. I'll try taking a >> look at it. >> >>> I'm not sure if Bioperl handles the order() operator? >>> >>> For those unfamilair with the order() operator: >>> >>> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >>> >>> order(location,location, ... location) >>> The elements can be found in the specified order (5' to 3' >>> direction), >>> but nothing is implied about the reasonableness about joining them. >>> >>> >>> --Torsten Seemann >>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>> University, AUSTRALIA >> >> It's interesting that the version from eutils differs significantly >> in the feature table when retrieving 'gb' or 'gbwithparts', the >> latter resolves the location (see below). Regardless we'll need to >> make sure this is parseable. >> >> .... >> >> FEATURES Location/Qualifiers >> source 1..77 >> /organism="Ovine respiratory syncytial virus" >> /mol_type="genomic RNA" >> /db_xref="taxon:28869" >> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >> /gene="G" >> gene 55..>77 >> /gene="fusion glycoprotein F" >> >> >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From juheymann at yahoo.com Tue Apr 7 14:20:04 2009 From: juheymann at yahoo.com (Jurgen Heymann) Date: Tue, 7 Apr 2009 11:20:04 -0700 (PDT) Subject: [Bioperl-l] restriction site map Message-ID: <237420.97841.qm@web54203.mail.re2.yahoo.com> Hi All: I would like to convert a table (restriction enzyme / position where it cuts in gene of interest) into a graphical representation. What avenues exists for that? Would appreciate your comments. 
Thank you,
Jurgen

From wenzhiwang1983 at yahoo.com.cn Tue Apr 7 21:39:59 2009
From: wenzhiwang1983 at yahoo.com.cn (Wen-Zhi WANG)
Date: Wed, 8 Apr 2009 09:39:59 +0800 (CST)
Subject: [Bioperl-l] Pasing Affymatrix Microarray output
Message-ID: <992233.10677.qm@web15208.mail.cnb.yahoo.com>

Dear all,

Recently, I focus on population genomics data outputed by affymatrix microarray system. However, softwares which designed by affy. inc only run in Windows 386 platform. Is there any application can used in Linux? Bio::Affymatrix was not strong enough to get the detailed informaton.

Thank you a lot.

Yours,
WWZ
___________________________________________________________________

Wen-Zhi WANG
State Key Laboratory of Genetic Resources and Evolution
Kunming Institute of Zoology, Chinese Academy of Sciences
Kunming, Yunnan 650223 P. R. China
Tel: (86) 871-5198993
Fax: (86) 871-5195430
Mobile: 13759114244
E-mail: wenzhiwang1983 at yahoo.com.cn

From Russell.Smithies at agresearch.co.nz Tue Apr 7 21:58:54 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 8 Apr 2009 13:58:54 +1200
Subject: [Bioperl-l] Pasing Affymatrix Microarray output
In-Reply-To: <992233.10677.qm@web15208.mail.cnb.yahoo.com>
References: <992233.10677.qm@web15208.mail.cnb.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABF94C@exchsth.agresearch.co.nz>

Have you had a look at Microarray-GeneXplorer
http://search.cpan.org/~sherlock/Microarray-GeneXplorer-0.11/

I haven't used it but I'd expect it to be pretty good being from Gavin Sherlock :-)

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Wen-Zhi WANG
> Sent: Wednesday, 8 April 2009 1:40 p.m.
> To: BioPerl List
> Subject: [Bioperl-l] Pasing Affymatrix Microarray output
>
> Dear all,
>
> Recently, I focus on population genomics data outputed by affymatrix
> microarray system. However, softwares which designed by affy. inc only run in
> Windows 386 platform. Is there any application can used in Linux?
> Bio::Affymatrix was not strong enough to get the detailed informaton.
>
> Thank you a lot.
>
> Yours,
> WWZ
> ___________________________________________________________________
>
> Wen-Zhi WANG
>
> State Key Laboratory of Genetic Resources and Evolution
> Kunming Institute of Zoology, Chinese Academy of Sciences
> Kunming, Yunnan 650223 P. R. China
> Tel: (86) 871-5198993
> Fax: (86) 871-5195430
> Mobile: 13759114244
> E-mail: wenzhiwang1983 at yahoo.com.cn
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
======================================================================= From sdavis2 at mail.nih.gov Tue Apr 7 22:10:17 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 7 Apr 2009 22:10:17 -0400 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <992233.10677.qm@web15208.mail.cnb.yahoo.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> Message-ID: <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG wrote: > Dear all, > > Recently, I focus on population genomics data outputed by affymatrix > microarray system. However, softwares which designed by affy. inc only run > in Windows 386 platform. Is there any application can used in Linux? > Bio::Affymatrix was not strong enough to get the detailed informaton. > You may want to look at a non-bioperl solution such as Bioconductor ( http://bioconductor.org). Sean From sac at bioperl.org Wed Apr 8 01:59:49 2009 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 7 Apr 2009 22:59:49 -0700 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> Message-ID: <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> Check out our Affymetrix Power Tools (APT) package: http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx We distribute binaries for Linux and Mac OSX, as well as source code so you can compile it yourself if you want. Note however that this is written in C++, not Perl. We don't provide SWIG or XS interfaces for direct access via Perl, though this would definitely be doable, if anyone is interested. 
Probably the easiest approach from Perl would be to simply call the appropriate APT executable through the shell as in: system("/path/to/apt --args ..."); The Perl code can parse the output files and take it from there. Steve On Tue, Apr 7, 2009 at 7:10 PM, Sean Davis wrote: > On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG wrote: > >> Dear all, >> >> Recently, I focus on population genomics data outputed by affymatrix >> microarray system. However, softwares which designed by affy. inc only run >> in Windows 386 platform. Is there any application can used in Linux? >> Bio::Affymatrix was not strong enough to get the detailed informaton. >> > > You may want to look at a non-bioperl solution such as Bioconductor ( > http://bioconductor.org). > > Sean > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From markus.liebscher at gmx.de Wed Apr 8 10:07:17 2009 From: markus.liebscher at gmx.de (manni122) Date: Wed, 8 Apr 2009 07:07:17 -0700 (PDT) Subject: [Bioperl-l] Access Uniprot detailed information Message-ID: <22951210.post@talk.nabble.com> Hi there, maybe I am not able to read careful enough through the Howto section. But is there a function in BioPerl that retrieves for a given Uniprot Access Code or ID from the Uniprot Database some general annotations like enzymatic activity or literature references? I appreciate any help! -- View this message in context: http://www.nabble.com/Access-Uniprot-detailed-information-tp22951210p22951210.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From johann.pellet at inserm.fr Wed Apr 8 11:29:29 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Wed, 8 Apr 2009 17:29:29 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: Hie all, I confirm that now it's ok for the LOCUS S67862S3 since Chris update. Thanks again. However I still have Warning message with other entries like: ######################################################################################################################### --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Hantaanvirus CGRn93MP8' (I was supplied 'Hantaan virus | Hantavirus | Bunyaviridae') --------------------------------------------------- --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Hantaanvirus CGRn93P8' (I was supplied 'Hantaan virus | Hantavirus | Bunyaviridae') --------------------------------------------------- ######################################################################################################################### but entries are inserted in the biosql database: ######################################################################################################################### biosql=# select * from bioentry where description like 'Hantaanvirus CGRn93P8%'; bioentry_id | biodatabase_id | taxon_id | name | accession | identifier | division | description | version -------------+----------------+----------+----------+----------- +------------+---------- + ----------------------------------------------------------------------- +--------- 156282 | 84 | 395824 | EF990932 | EF990932 | 156144486 | VRL | Hantaanvirus CGRn93P8 RNA-dependent RNA polymerase gene, partial cds. 
| 1 156288 | 84 | 395824 | EF990918 | EF990918 | 154623008 | VRL | Hantaanvirus CGRn93P8 segment M, complete sequence. | 1 156294 | 84 | 395824 | EF990904 | EF990904 | 154622980 | VRL | Hantaanvirus CGRn93P8 segment S, complete sequence. | 1 (3 rows) ######################################################################################################################### and finally EU608407 and EU608559 made a crash: ######################################################################################################################### --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Fowl adenovirus 8' (I was supplied 'Fowl adenovirus E | Aviadenovirus | Adenoviridae') --------------------------------------------------- --------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- #######...14 times ...############ --------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,LOCUS EU608407 1212 bp DNA linear VRL 20-APR-2008","","","CRC- D35248959C54B9F2","1","1212","") FKs () ERROR: null value in column "location" violates not-null constraint --------------------------------------------------- Could not store EU608559: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key STACK: Error::throw STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 
5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children / Library/Perl/5.8.8/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:230 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::SeqAdaptor::store_children /Library/Perl/5.8.8/ Bio/DB/BioSQL/SeqAdaptor.pm:237 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: load_seqdatabase.pl:630 ----------------------------------------------------------- at load_seqdatabase.pl line 643 ######################################################################################################################### If I check in the biosql database if some part of this records are inserted: ######################################################################################################################### select * from reference where title='Evidence for positive epistasis in HIV-1'; reference_id | dbxref_id | location | title | authors | crc --------------+-----------+-------------------------------------- +------------------------------------------ + ----------------------------------------------------------------------------+ ---------------------- 16443 | 4179 | Science 306 (5701), 1547-1550 (2004) | Evidence for positive epistasis in HIV-1 | Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,J.M. and Petropoulos,C.J. 
| CRC-19E7AA4FB7A5D4AF (1 row) select * from dbxref where dbxref_id=4179; dbxref_id | dbname | accession | version -----------+--------+-----------+--------- 4179 | PUBMED | 15567861 | 0 select * from bioentry where accession=15567861; bioentry_id | biodatabase_id | taxon_id | name | accession | identifier | division | description | version -------------+----------------+----------+------+----------- +------------+----------+-------------+--------- (0 rows) ######################################################################################################################### I don't have records with name='EU608407' or 'EU608559' in the bioentry table. Thanks for your help Johann -- -- Johann Pellet Le 7 avr. 09 ? 19:56, Hilmar Lapp a ?crit : > Awesome, thanks Chris! $beer_owed++; > > -hilmar > > On Apr 7, 2009, at 1:32 AM, Chris Fields wrote: > >> Fixed in svn now and have added this as a test case (passes all >> tests in bioperl-live). For some reason this wasn't catching some >> more complex combinations of operators, mainly those with mixes of >> order/join. >> >> chris >> >> On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: >> >>> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: >>> >>>>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/ >>>>> 544772 >>>> >>>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>>> >>>>> Does anyone see why the location parser should have a problem >>>>> with the first >>>>> gene feature? It's nested, and has remote location components, >>>>> but at first >>>>> sight nothing jumps out at me as extraordinary. Has someone >>>>> recently changed >>>>> the location parsing code? If no-one has an immediate idea what >>>>> could be at >>>>> work here, this needs investigating. >>> >>> The location parsing code was refactored above 3-4 years ago w/o >>> problems. This'll be the first one to crop up. I'll try taking a >>> look at it. >>> >>>> I'm not sure if Bioperl handles the order() operator? 
>>>> >>>> For those unfamilair with the order() operator: >>>> >>>> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >>>> >>>> order(location,location, ... location) >>>> The elements can be found in the specified order (5' to 3' >>>> direction), >>>> but nothing is implied about the reasonableness about joining them. >>>> >>>> >>>> --Torsten Seemann >>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>>> University, AUSTRALIA >>> >>> It's interesting that the version from eutils differs >>> significantly in the feature table when retrieving 'gb' or >>> 'gbwithparts', the latter resolves the location (see below). >>> Regardless we'll need to make sure this is parseable. >>> >>> .... >>> >>> FEATURES Location/Qualifiers >>> source 1..77 >>> /organism="Ovine respiratory syncytial virus" >>> /mol_type="genomic RNA" >>> /db_xref="taxon:28869" >>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>> /gene="G" >>> gene 55..>77 >>> /gene="fusion glycoprotein F" >>> >>> >>> >>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cgoddard at flmnh.ufl.edu Wed Apr 8 11:25:37 2009 From: cgoddard at flmnh.ufl.edu (Chris Goddard) Date: Wed, 08 Apr 2009 11:25:37 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db Message-ID: <49DCC1F1.6080601@flmnh.ufl.edu> I am running into problems when trying to insert a sequence object retrieved from GenBank into a BioSQL schema running in a Postgres database. Whenever I use the 'create()' method on the sequence that has been made into a persistent object, the sequence isn't saved into the database properly. 
No error messages are given, and the corresponding Postgres primary key
sequences are incremented as if the data had been saved properly: the
appropriate tables themselves remain empty though.

I am completely new to using the biosql-db modules, and so am probably
missing something pretty simple. Below you will see the basic code that
causes the problem.

    my $genbank_id = 'AYXXXXXX';

    my $genDB = Bio::DB::GenBank->new;
    my $sequence = $genDB->get_Seq_by_id($genbank_id);

    my $db = Bio::DB::BioDB->new(-database => 'biosql',
                                 -user     => 'username',
                                 -dbname   => 'dbname',
                                 -host     => 'localhost',
                                 -driver   => 'Pg');

    my $pobj = $db->create_persistent($sequence);
    $pobj->create();

I am running the latest svn trunk versions of bioperl and bioperl-db (as
of yesterday) and Postgres 8.3.7. I also downloaded the NCBI taxonomy
info using the script included in the BioSQL package, and that data
seemed to install without error. Any help or advice would be greatly
appreciated.

Thanks,
Chris Goddard

From hlapp at gmx.net Wed Apr 8 12:21:11 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 8 Apr 2009 12:21:11 -0400
Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db
In-Reply-To: <49DCC1F1.6080601@flmnh.ufl.edu>
References: <49DCC1F1.6080601@flmnh.ufl.edu>
Message-ID: <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net>

This all sounds like you aren't issuing a commit. Are you sure your code
contains $pobj->commit() and that what you are looking at is *after*
that is executed?

Bioperl-db is transactional, so you decide when to commit (or rollback).

-hilmar

On Apr 8, 2009, at 11:25 AM, Chris Goddard wrote:

> I am running into problems when trying to insert a sequence object
> retrieved from GenBank into a BioSQL schema running in a Postgres
> database. Whenever I use the 'create()' method on the sequence that
> has been made into a persistent object, the sequence isn't saved
> into the database properly.
No error messages are given, and the > corresponding Postgres primary key sequences are incremented as if > the data had been saved properly: the appropriate tables themselves > remain empty though. > > I am completely new to using the biosql-db modules, and so am > probably missing something pretty simple. Below you will see the > basic code that causes the problem. > > my $genbank_id = 'AYXXXXXX' > > my $genDB = new Bio::DB::GenBank; > $sequence = $genDB->get_Seq_by_id($genbank_id); > > my $db = Bio::DB::BioDB->new(-database => 'biosql', > -user => 'username', > -dbname => 'dbname', > -host => 'localhost', > -driver => 'Pg'); > > my $pobj = $db->create_persistent($sequence); > $pobj->create(); > > I am running the latest svn trunk versions of bioperl and bioperl-db > (as of yesterday) and Postgres 8.3.7. I also downloaded the NCBI > taxonomy info using the script included in the BioSQL package, and > that data seemed to install without error. Any help or advice would > be greatly appreciated. > > Thanks, > Chris Goddard > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Apr 8 12:40:53 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 12:40:53 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db In-Reply-To: <49DCD120.8020302@flmnh.ufl.edu> References: <49DCC1F1.6080601@flmnh.ufl.edu> <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> <49DCD120.8020302@flmnh.ufl.edu> Message-ID: <4A6EA2F3-BA88-474E-A9D9-C1A7444CA755@gmx.net> On Apr 8, 2009, at 12:30 PM, Chris Goddard wrote: > That was it. I guess I just incorrectly assumed that create() did > an auto-commit. That was simple to fix. 
Thank you! > No problem, I'm glad I could be helpful! -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cgoddard at flmnh.ufl.edu Wed Apr 8 12:30:24 2009 From: cgoddard at flmnh.ufl.edu (Chris Goddard) Date: Wed, 08 Apr 2009 12:30:24 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db In-Reply-To: <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> References: <49DCC1F1.6080601@flmnh.ufl.edu> <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> Message-ID: <49DCD120.8020302@flmnh.ufl.edu> That was it. I guess I just incorrectly assumed that create() did an auto-commit. That was simple to fix. Thank you! Chris Hilmar Lapp wrote: > This all sounds like you aren't issuing commit. Are you sure your code > contains $popj->commit() and what you are looking at is *after* that > is executed? > > Bioperl-db is transactional, so you decide when to commit (or rollback). > > -hilmar > > On Apr 8, 2009, at 11:25 AM, Chris Goddard wrote: > >> I am running into problems when trying to insert a sequence object >> retrieved from GenBank into a BioSQL schema running in a Postgres >> database. Whenever I use the 'create()' method on the sequence that >> has been made into a persistent object, the sequence isn't saved into >> the database properly. No error messages are given, and the >> corresponding Postgres primary key sequences are incremented as if >> the data had been saved properly: the appropriate tables themselves >> remain empty though. >> >> I am completely new to using the biosql-db modules, and so am >> probably missing something pretty simple. Below you will see the >> basic code that causes the problem. 
>> >> my $genbank_id = 'AYXXXXXX' >> >> my $genDB = new Bio::DB::GenBank; >> $sequence = $genDB->get_Seq_by_id($genbank_id); >> >> my $db = Bio::DB::BioDB->new(-database => 'biosql', >> -user => 'username', >> -dbname => 'dbname', >> -host => 'localhost', >> -driver => 'Pg'); >> >> my $pobj = $db->create_persistent($sequence); >> $pobj->create(); >> >> I am running the latest svn trunk versions of bioperl and bioperl-db >> (as of yesterday) and Postgres 8.3.7. I also downloaded the NCBI >> taxonomy info using the script included in the BioSQL package, and >> that data seemed to install without error. Any help or advice would >> be greatly appreciated. >> >> Thanks, >> Chris Goddard >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From sanjay.harke at gmail.com Wed Apr 8 23:24:45 2009 From: sanjay.harke at gmail.com (Sanjay Harke) Date: Thu, 9 Apr 2009 08:54:45 +0530 Subject: [Bioperl-l] Help in basics of Bioperl Message-ID: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Dear friend, I need help in following problem.I am beginer in bioperl i have sequence data. i install perl-bioperl on my computer. Now i want analyse sequences with blast, tree and multiple sequence analysis. so kindly guide me from basic. sanjay From abhishek.vit at gmail.com Wed Apr 8 23:31:26 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 8 Apr 2009 23:31:26 -0400 Subject: [Bioperl-l] Help in basics of Bioperl In-Reply-To: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> References: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Message-ID: Dear Sanjay As much as people on this love to help out. I would definitely put in some efforts to atleast go through the basic bio perl tutorial before asking this question. Atleast that would have helped you frame the question legitimately. 
I think we should put diligent effort before trying to take other people's help. Here is the link to bio perl tutorial please try to go through the relevant sections. I am sure you will get your answer there. http://www.bioperl.org/Core/Latest/bptutorial.html Thanks, -Abhi On Wed, Apr 8, 2009 at 11:24 PM, Sanjay Harke wrote: > Dear friend, > > I need help in following problem.I am beginer in bioperl > > i have sequence data. > i install perl-bioperl on my computer. > Now i want analyse sequences with blast, tree and multiple sequence > analysis. > so kindly guide me from basic. > > sanjay > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Wed Apr 8 23:35:12 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 23:35:12 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote: > [...] > and finally EU608407 and EU608559 made a crash: > > [...] > --------------------- WARNING --------------------- > MSG: Unexpected error in feature table for Skipping feature, > attempting to recover > --------------------------------------------------- > #######...14 times ...############ I would assume that you figured out that this was triggered by or affected EU608407? Would you mind sharing how? 
> --------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, > values were ("Bonhoeffer,S., Chappey,C., Parkin,N.T., > Whitcomb,LOCUS EU608407 > 1212 bp DNA linear VRL 20-APR-2008","","","CRC- > D35248959C54B9F2","1","1212","") FKs () > ERROR: null value in column "location" violates not-null constraint Is this really the verbatim copy of the error message you saw on the screen? What's really puzzling about this is how the genbank SeqIO parser could mess up parsing the reference entry to badly. Here's the reference from the version online at NCBI: REFERENCE 1 (bases 1 to 1212) AUTHORS Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,J.M. and Petropoulos,C.J. TITLE Evidence for positive epistasis in HIV-1 JOURNAL Science 306 (5701), 1547-1550 (2004) PUBMED 15567861 How the first author line would be chopped off at the end and the LOCUS line would have gotten inserted there is a mystery to me. The location is "Science 306 (5701), 1547-1550 (2004)", and according to the error message the parser failed to extract that and the TITLE. Could you confirm that the file you are parsing is not corrupted in any way, specifically for this record? > --------------------------------------------------- > Could not store EU608559: > ------------- EXCEPTION: Bio::Root::Exception ------------- > [...] > > If I check in the biosql database if some part of this records are > inserted: So are there other sequences associated with that PubMed ID? Can you do a grep on the PubMed ID and see whether it occurs already before the one that trips up the load? > [...] > select * from dbxref where dbxref_id=4179; > dbxref_id | dbname | accession | version > -----------+--------+-----------+--------- > 4179 | PUBMED | 15567861 | 0 > > select * from bioentry where accession=15567861; Note that 15567861 is the accession (PubMed ID) for the referenced article, not the sequence. 
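
To find the sequences that cite a given article, the lookup has to start from the PubMed entry in dbxref and go through reference and the bioentry_reference link table to bioentry. A minimal sketch, assuming the standard BioSQL column names (bioentry_reference is assumed to carry bioentry_id and reference_id) and an already connected DBI handle; the helper name `accessions_for_pubmed` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return an array reference with the accessions of
# all bioentries that cite the article with the given PubMed ID.
# $dbh is expected to be a connected DBI database handle for the BioSQL
# database (table and column names are ASSUMED to follow the standard
# BioSQL schema).
sub accessions_for_pubmed {
    my ($dbh, $pubmed_id) = @_;
    my $sql = <<'SQL';
SELECT b.accession
  FROM dbxref d
  JOIN reference r           ON r.dbxref_id     = d.dbxref_id
  JOIN bioentry_reference br ON br.reference_id = r.reference_id
  JOIN bioentry b            ON b.bioentry_id   = br.bioentry_id
 WHERE d.dbname    = 'PUBMED'
   AND d.accession = ?
 ORDER BY b.accession
SQL
    # Bind the ID as a string, since dbxref.accession is a text column.
    return $dbh->selectcol_arrayref($sql, undef, "$pubmed_id");
}

# Intended use (not executed here):
#   my $accs = accessions_for_pubmed($dbh, '15567861');
#   print "$_\n" for @$accs;
```

An empty result for a PubMed ID that does exist in dbxref would mean that the reference was stored but no sequence is linked to it.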
Which bioentries are associated with a reference would be in the bioentry_reference table. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Apr 8 23:51:52 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 23:51:52 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net> On Apr 8, 2009, at 11:35 PM, Hilmar Lapp wrote: > > On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote: > >> [...] >> and finally EU608407 and EU608559 made a crash: >> >> [...] >> --------------------- WARNING --------------------- >> MSG: Unexpected error in feature table for Skipping feature, >> attempting to recover >> --------------------------------------------------- >> #######...14 times ...############ > > I would assume that you figured out that this was triggered by or > affected EU608407? Would you mind sharing how? Looking at EU608407, it most likely wasn't the culprit or stumbling stone. It must have been triggered before that. > [...] > So are there other sequences associated with that PubMed ID? To answer my own question, it's indeed EU608407 that's from the same PubMed ID, and so am I correct in assuming that you didn't get the exception for that record, which would mean that the reference was properly inserted when that sequence was loaded. The second occurrence of the same PubMed ID should have actually triggered a successful lookup of the previously inserted record, which would then have skipped the insert. 
The fact that that didn't happen suggests that the PubMed ID also wasn't properly extracted from the Genbank record. So my first suspicion remains that your file is corrupted. Otherwise, if you download this record: http://www.ncbi.nlm.nih.gov/nuccore/183191257 in GenBank format and try to load it alone, it should yield the same error. Can you indeed reproduce the problem in that way? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Wed Apr 8 23:55:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 8 Apr 2009 23:55:12 -0400 Subject: [Bioperl-l] Help in basics of Bioperl In-Reply-To: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> References: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Message-ID: <4FAA64AA47534B98874AB16622D184BA@NewLife> Hi Sanjay, Judging from your posts to the list this month, I see you have an appreciation of the power of Bioperl to help you get all kinds of analysis jobs done, and that you have a real desire to learn a lot about it. I want to encourage that attitude. I also want to remind you that the absolutely best way to really understand anything is to dive into your project and try to understand the basics *on your own*. Your posts to this are honestly much too general for this list. People here are really generous with their time, but they don't have enough of it to walk you through every step. When I have an issue with my Bioperl programming (and believe me, I have had and do have many), I do at least three things before I consider posting on this list: * I read the documentation for the module I'm working with. * I go to the wiki (www.bioperl.org) and look for HOWTOs or tutorials. There is a search facility there, and many many MANY introductory links. * I go to the source code directly, and try to figure out what it is really doing. 
So, it turns out I rarely post questions to the list, because I've figured out my dumb mistake, or how to do that new thing. PLUS, I've become that much closer to true Bioperl independence. Please go to the page http://www.bioperl.org/wiki/Getting_Started and *read it*. Please follow the links. You may even find that your work has already been done for you. One hint that works here on the list and elsewhere is: the more work you can show you have done by yourself, the more willing an expert is to help you over the hard parts. Conversely, the less work you do, the greater the chance that your questions will go unheard. cheers, Mark ----- Original Message ----- From: "Sanjay Harke" To: Sent: Wednesday, April 08, 2009 11:24 PM Subject: [Bioperl-l] Help in basics of Bioperl > Dear friend, > > I need help in following problem.I am beginer in bioperl > > i have sequence data. > i install perl-bioperl on my computer. > Now i want analyse sequences with blast, tree and multiple sequence > analysis. > so kindly guide me from basic. 
> > sanjay > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From johann.pellet at inserm.fr Thu Apr 9 05:48:43 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Thu, 9 Apr 2009 11:48:43 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net> Message-ID: <2FDD67FF-5DBA-4987-A04D-231AF8B1E93B@inserm.fr> Hie Hilmar, I am very sorry, I checked my GenBank file, and you are right It's corrupted :-( grep EU608407 genbankFile AUTHORS Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,LOCUS EU608407 1212 bp DNA linear VRL 20-APR-2008 ACCESSION EU608407 VERSION EU608407.1 GI:183190953 So I have downloaded EU608407 and I have loaded it alone with load_sequence.pl without problems. Same for EU608559. Thanks again Johann Le 9 avr. 09 ? 05:51, Hilmar Lapp a ?crit : > > On Apr 8, 2009, at 11:35 PM, Hilmar Lapp wrote: > >> >> On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote: >> >>> [...] >>> and finally EU608407 and EU608559 made a crash: >>> >>> [...] >>> --------------------- WARNING --------------------- >>> MSG: Unexpected error in feature table for Skipping feature, >>> attempting to recover >>> --------------------------------------------------- >>> #######...14 times ...############ >> >> I would assume that you figured out that this was triggered by or >> affected EU608407? Would you mind sharing how? > > Looking at EU608407, it most likely wasn't the culprit or stumbling > stone. It must have been triggered before that. > >> [...] >> So are there other sequences associated with that PubMed ID? 
> > To answer my own question, it's indeed EU608407 that's from the same > PubMed ID, and so am I correct in assuming that you didn't get the > exception for that record, which would mean that the reference was > properly inserted when that sequence was loaded. > > The second occurrence of the same PubMed ID should have actually > triggered a successful lookup of the previously inserted record, > which would then have skipped the insert. The fact that that didn't > happen suggests that the PubMed ID also wasn't properly extracted > from the Genbank record. So my first suspicion remains that your > file is corrupted. > > Otherwise, if you download this record: > http://www.ncbi.nlm.nih.gov/nuccore/183191257 > > in GenBank format and try to load it alone, it should yield the same > error. Can you indeed reproduce the problem in that way? > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From montalen at moulon.inra.fr Thu Apr 9 06:49:22 2009 From: montalen at moulon.inra.fr (montalent) Date: Thu, 9 Apr 2009 12:49:22 +0200 Subject: [Bioperl-l] Bioperl add_object_condition Message-ID: <6D76CE64E5E744C7B571F3BA31670F9D@bioinfo2> Dear colleague, I try to use add_object_condition() function, to get a subset of sequences. I try this : # 1. 
# STORE SELECTED BAC IN A HASH TABLE: key = bac_name, value = sequence
# 1.1 STORE SELECTED BAC NAMES IN AN ARRAY
my @selected_bac_list = ();
open(SELECTION, $bac_selection_file)
    or die "can not open $bac_selection_file :$!\n";
while (my $line = <SELECTION>) {
    my ($bac_name) = ($line =~ /^(.+?);.+/);
    # print $bac_name."\n";
    push @selected_bac_list, $bac_name;
}

# 1.2 READ FASTA FILE WITH BIOPERL TO STORE IN A HASH TABLE
my $bac_fasta = Bio::SeqIO->new(-file   => $maize_sequence_bac_file,
                                -format => "Fasta");
my $builder = $bac_fasta->sequence_builder();
if ($builder->add_object_condition(sub {
        print " check \n";
        my $seq_ref = shift;
        if ($seq_ref->{'-length'} > 5000) { return 0; }
        else                              { return 1; }
    })) {
    print "add_object_condition returns true\n";
}
else {
    print "add_object_condition returns false\n";
}

# for each sequence in fasta file, check if it is a selected bac
while (my $seq = $bac_fasta->next_seq()) {
    print $seq->id."\n";   # PROBLEM: IT PRINTS ALL THE SEQUENCES, NOT THE SUBSET
}

I don't get the subset of sequences, but all the sequences. So I put a
print() in the closure passed to add_object_condition(), but nothing is
printed. It seems that the sub given to add_object_condition() is never
executed, although add_object_condition() returns a true value.

add_object_condition() seems to be a powerful method, but I have not
managed to make it work. May I ask for some advice on how to use it? Am
I forgetting something?

Best regards
Pierre Montalent
INRA - Ferme du moulon
France

From jarodpardon at yahoo.com.cn Thu Apr 9 20:27:29 2009
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Fri, 10 Apr 2009 08:27:29 +0800 (CST)
Subject: [Bioperl-l] bioperl translate() function for seq obj
Message-ID: <221543.32779.qm@web15003.mail.cnb.yahoo.com>

Hi, all,

I want to know whether Bio::PrimarySeqI::translate() uses the same
method and codon table as NCBI Blast/blastx does. Thanks.

Jarod
From csembry at ualr.edu Thu Apr 9 20:54:21 2009
From: csembry at ualr.edu (Charles Embry)
Date: Thu, 09 Apr 2009 19:54:21 -0500
Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1
Message-ID:

Hello, I am a graduate student at UALR and I am trying to install the ext
package (1.5.1) on bioperl 1.5.1. I get this error when I run the
Makefile.PL:

"[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL
Writing Makefile for Bio::Ext::Align
ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/Bio/SeqIO/staden/Makefile.PL: Invalid version '' for Bio::SeqIO::staden::read.
Must be of the form '#.##'. (For instance '1.23')
 at ./Makefile.PL line 4"

These are the first 11 lines of the Makefile.PL for the ext package:

use Inline::MakeMaker;
use Config;

WriteInlineMakefile(
            'NAME'         => 'Bio::SeqIO::staden::read',
            'VERSION_FROM' => './read.pm', # finds $VERSION,
            'PREREQ_PM'    => { 'Inline::C' => 0.0,
                                'Bio::SeqIO::abi' => 0.0,
                              }, # e.g., Module::Name => 1.1,
            test           => { TESTS => 'test.pl' },
           );

What does the error mean? And what version does it refer to? Of what?
(staden?) What version of Staden should this be if I am using
io_lib-1.8.11, following the INSTALL instructions with bioperl-ext?

Thank you
C. Stephen Embry

From maj at fortinbras.us Thu Apr 9 21:16:18 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 9 Apr 2009 21:16:18 -0400
Subject: [Bioperl-l] bioperl translate() function for seq obj
In-Reply-To: <221543.32779.qm@web15003.mail.cnb.yahoo.com>
References: <221543.32779.qm@web15003.mail.cnb.yahoo.com>
Message-ID:

Hi Jarod-

translate() uses NCBI "Standard" table by default. Check out the POD for
PrimarySeqI.pm (where translate is defined). You can specify others by
setting -CODONTABLE_ID => $n as an argument to translate().
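
As an illustration of the -CODONTABLE_ID argument, here is a minimal sketch that translates the same short DNA string with table 1 (Standard) and table 2 (Vertebrate Mitochondrial). The helper name `translate_with_table` and the test sequence are made up for this example; the codon TGA is a stop in the Standard table but codes for tryptophan in the vertebrate mitochondrial table, so the two results should differ. The demonstration only runs when BioPerl is installed.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a DNA string using the NCBI codon table
# with the given ID (1 = Standard, 2 = Vertebrate Mitochondrial, ...).
sub translate_with_table {
    my ($dna, $table_id) = @_;
    require Bio::Seq;   # BioPerl, loaded lazily so this file compiles without it
    my $seq = Bio::Seq->new(-seq => $dna, -alphabet => 'dna');
    # translate() returns a new sequence object holding the protein
    return $seq->translate(-codontable_id => $table_id)->seq;
}

# Only run the demonstration when BioPerl is actually available.
my $have_bioperl = eval { require Bio::Seq; 1 };
if ($have_bioperl) {
    for my $id (1, 2) {
        printf "table %2d: %s\n", $id, translate_with_table('ATGTGAATA', $id);
    }
}
else {
    warn "BioPerl (Bio::Seq) not found; skipping the demonstration\n";
}
```

Whether this matches what blastx does depends on the genetic code blastx is told to use for the query, so the two should be compared with the same table ID.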
The codon tables are in Bio::Tools::CodonTable, where the following are defined: @NAMES = #id ( 'Standard', #1 'Vertebrate Mitochondrial',#2 'Yeast Mitochondrial',# 3 'Mold, Protozoan, and CoelenterateMitochondrial and Mycoplasma/Spiroplasma',#4 'Invertebrate Mitochondrial',#5 'Ciliate, Dasycladacean and Hexamita Nuclear',# 6 '', '', 'Echinoderm Mitochondrial',#9 'Euplotid Nuclear',#10 '"Bacterial"',# 11 'Alternative Yeast Nuclear',# 12 'Ascidian Mitochondrial',# 13 'Flatworm Mitochondrial',# 14 'Blepharisma Nuclear',# 15 'Chlorophycean Mitochondrial',# 16 '', '', '', '', 'Trematode Mitochondrial',# 21 'Scenedesmus obliquus Mitochondrial', #22 'Thraustochytrium Mitochondrial' #23 ); Can others (Scott M?) chime in on blast? Mark ----- Original Message ----- From: "?? ??" To: "'bioperl-l'" Sent: Thursday, April 09, 2009 8:27 PM Subject: [Bioperl-l] bioperl translate() function for seq obj > > > Hi, all, > I want to know whether Bio::PrimarySeqI::translate() uses identical method and > codon table with NCBI Blast/blastx does. Thanks. > > Jarod > > > ___________________________________________________________ > ?????????????????????????????????? > http://card.mail.cn.yahoo.com/ > > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rrfreimuth2 at yahoo.com Thu Apr 9 22:10:21 2009 From: rrfreimuth2 at yahoo.com (Robert Freimuth) Date: Thu, 9 Apr 2009 19:10:21 -0700 (PDT) Subject: [Bioperl-l] Mentors needed for bioperl projects for Google's Summer of Code Message-ID: <38796.60680.qm@web65611.mail.ac4.yahoo.com> The Perl Foundation is looking for mentors for several projects for Google's Summer of Code.? Two of the projects are directly applicable to bioperl. 
In particular they're looking for mentors for these projects:

  * Bio::Restriction::* - Improve reading and writing of RE collection in
    different formats; add support for multicut/multisite enzymes.
  * A bioperl parser module for repeats/transposons.
  * "CPAN OS Installer", integrate CPAN packages into Unix package managers
    like rpm and apt/dpkg
  * Cross-platform Perl Bindings for wxWebKit

If you're interested please see the full announcement, posted on PerlMonks:
http://www.perlmonks.org/?node_id=755872.

Thanks,
Bob

From j_martin at lbl.gov Thu Apr 9 23:18:28 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Thu, 9 Apr 2009 20:18:28 -0700
Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1
In-Reply-To:
References:
Message-ID: <20090410031827.GE6535@eniac.jgi-psf.org>

Hello,
I found 1.5.1 a pain to install; I recommend the code from

http://www.bioperl.org/wiki/Ext_package#The_latest_code

anywho, the read is read.pm, the message is something from
inline::c I think. There's an old bug report about it; if
you can't use the newer code maybe it will help.
http://bugzilla.open-bio.org/show_bug.cgi?id=2074

joel

On Thu, Apr 09, 2009 at 07:54:21PM -0500, Charles Embry wrote:
> Hello I am a graduate student at UALR and I am trying to install the ext package(1.5.1) on bioperl 1.5.1.
> I get this error when i run the make file.
>
> "[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL
> Writing Makefile for Bio::Ext::Align
> ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/Bio/SeqIO/staden/Makefile.PL: Invalid version '' for Bio::SeqIO::staden::read.
> Must be of the form '#.##'. (For instance '1.23')
>  at ./Makefile.PL line 4"
>
> This is the first 11 lines of the Makefile.PL for ext package
>
> use Inline::MakeMaker;
> use Config;
>
> WriteInlineMakefile(
>             'NAME'         => 'Bio::SeqIO::staden::read',
>             'VERSION_FROM' => './read.pm', # finds $VERSION,
>             'PREREQ_PM'    => { 'Inline::C' => 0.0,
>                                 'Bio::SeqIO::abi' => 0.0,
>                               }, # e.g., Module::Name => 1.1,
>             test           => { TESTS => 'test.pl' },
>            );
>
> What does the error mean?
>
> And what version does it refer to? Of what? (staden?)
> What version of Staden should this be if i am using the io_lib-1.8.11 , following the INSTALL instructions with bioperl-ext?
>
>
> Thanks you
> C. Stephen Embry
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hsa_rim at yahoo.co.in Thu Apr 9 23:43:53 2009
From: hsa_rim at yahoo.co.in (shafeeq rim)
Date: Fri, 10 Apr 2009 09:13:53 +0530 (IST)
Subject: [Bioperl-l] Creating Cytoband Ideogram images
Message-ID: <824645.66937.qm@web94611.mail.in2.yahoo.com>

Hi,

I want to create CytoBand ideogram images from the CytoBand data at NCBI. Is
there any module in BioPerl, or any other way, to do it? I want to create
chromosome cytoband ideograms for each chromosome.

Thanks in advance
Shafeeq

From hlapp at gmx.net Fri Apr 10 00:00:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 10 Apr 2009 00:00:54 -0400
Subject: [Bioperl-l] Mentors needed for bioperl projects for Google's Summer of Code
In-Reply-To: <38796.60680.qm@web65611.mail.ac4.yahoo.com>
References: <38796.60680.qm@web65611.mail.ac4.yahoo.com>
Message-ID: <0C80FD8F-78F6-493E-94C3-AE5D845577C5@gmx.net>

Hi Robert - thanks for putting us into the loop!

On Apr 9, 2009, at 10:10 PM, Robert Freimuth wrote:

> The Perl Foundation is looking for mentors for several projects for
> Google's Summer of Code. Two of the projects are directly applicable
> to bioperl.
> > In particular they're looking for mentors for these projects: > > Bio::Restriction::* - Improve reading and writing of RE collection in > different formats; add support for multicut/multisite enzymes.A > bioperl parser module for repeats/transposons. I don't want to dampen any enthusiasm and the project may indeed be worthwhile, but it's also worth noting that we haven't ever seen the student applicant here (assuming it's the same who contacted Heikki a while ago). Having said that, the fact that there hasn't been any community interaction from the student yet obviously doesn't have to mean that there can't be any in the future. But in the Google Summer of Code spirit of recruiting new contributors into FLOSS communities, it's a less than ideal start. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Fri Apr 10 00:15:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 9 Apr 2009 23:15:45 -0500 Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1 In-Reply-To: <20090410031827.GE6535@eniac.jgi-psf.org> References: <20090410031827.GE6535@eniac.jgi-psf.org> Message-ID: <327D2C1C-A61A-473A-B85D-7A249856CC85@illinois.edu> Just to note, we're not actively supporting much of the bioperl-ext code, in favor of the BioLib initiative: http://biolib.open-bio.org/wiki/Main_Page If you do use bioperl-ext I suggest only using the latest code from svn (and that in combination with bioperl-live). chris On Apr 9, 2009, at 10:18 PM, Joel Martin wrote: > Hello, > I found that 1.5.1 a pain to install, I recommend the code from > > http://www.bioperl.org/wiki/Ext_package#The_latest_code > > anywho, the read is read.pm, the message is something from > inline::c I think, there's an old bug report about it, if > you can't use the newer code maybe it will help. 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > > joel > > > On Thu, Apr 09, 2009 at 07:54:21PM -0500, Charles Embry wrote: >> Hello I am a graduate student at UALR and I am trying to install >> the ext package(1.5.1) on bioperl 1.5.1. >> I get this error when i run the make file. >> >> "[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL >> Writing Makefile for Bio::Ext::Align >> ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/ >> Bio/SeqIO/staden/Makefile.PL: Invalid version '' for >> Bio::SeqIO::staden::read. >> Must be of the form '#.##'. (For instance '1.23') >> at ./Makefile.PL line 4" >> >> This is the first 11 lines of the Makefile.PL for ext package >> >> use Inline::MakeMaker; >> use Config; >> >> WriteInlineMakefile( >> 'NAME' => 'Bio::SeqIO::staden::read', >> 'VERSION_FROM' => './read.pm', # finds $VERSION, >> 'PREREQ_PM' => { 'Inline::C' => 0.0, >> 'Bio::SeqIO::abi' => 0.0, >> }, # e.g., Module::Name => 1.1, >> test => { TESTS => 'test.pl' }, >> ); >> >> What does the error mean? >> >> And what version does it refer to? Of what? (staden?) >> What version of Staden should this be if i am using the >> io_lib-1.8.11 , following the INSTALL instructions with bioperl-ext? >> >> >> Thanks you >> C. 
Stephen Embry >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Apr 10 00:32:59 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 9 Apr 2009 23:32:59 -0500 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> Message-ID: <0340305E-EAB3-4A08-9B41-5E706F4A5A16@illinois.edu> Would definitely be worth testing out interactivity with these. chris On Apr 8, 2009, at 12:59 AM, Steve Chervitz wrote: > Check out our Affymetrix Power Tools (APT) package: > > http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx > > We distribute binaries for Linux and Mac OSX, as well as source code > so you can compile it yourself if you want. Note however that this is > written in C++, not Perl. We don't provide SWIG or XS interfaces for > direct access via Perl, though this would definitely be doable, if > anyone is interested. > > Probably the easiest approach from Perl would be to simply call the > appropriate APT executable through the shell as in: > > system("/path/to/apt --args ..."); > > The Perl code can parse the output files and take it from there. > > Steve > > > On Tue, Apr 7, 2009 at 7:10 PM, Sean Davis > wrote: >> On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG > >wrote: >> >>> Dear all, >>> >>> Recently, I focus on population genomics data outputed by affymatrix >>> microarray system. However, softwares which designed by affy. 
inc only run in Windows 386 platform. Is there any application can used in Linux?
>>> Bio::Affymatrix was not strong enough to get the detailed
>>> information.
>>>
>>
>> You may want to look at a non-bioperl solution such as Bioconductor
>> (http://bioconductor.org).
>>
>> Sean
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From miguel.pignatelli at uv.es Wed Apr 1 17:56:36 2009
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Wed, 1 Apr 2009 23:56:36 +0200
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <49D39E60.1020103@gmail.com>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID:

You may find the attached Perl module useful. It solves the difficult
parts of getting the taxonomy given a GI identifier or a taxID. It is
designed to process a large number of GIs very fast and with low
memory usage.

An example of usage would be:

use taxbuild;
# Build the taxonomyDB
my $taxDB = taxbuild->new(
                nodes    => $nodes_file_from_taxonomyDB,
                names    => $names_file_from_taxonomyDB,
                dict     => $dictFile,
                save_mem => 1
            );

# Get the taxonomy given a GI identifier
my @tax = $taxDB->get_taxonomy_from_gi("35961124");

# Get the taxonomy term of a GI identifier at a given level
my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");

# Get the taxid of a GI identifier
my $taxid = $taxDB->get_taxid("35961124");

# Get the taxonomy given a taxid
my @tax = $taxDB->get_taxonomy($taxid);

# Get the taxonomy at a given level given a taxid
my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");

# Get the level of a given taxonomical name
my $level = $taxDB->get_level_from_name("Proteobacteria");

The "dict file" is a processed version of the gi_taxid file from
taxonomyDB. You can get this file by running the tax2bin2.pl script,
also attached:

$ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin

or, if you are working with genes instead of proteins:

$ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin

A possible solution to the original post using this module would be
something like:

# Initialize the taxonomyDB once.
my $taxDB = taxbuild->new(
                nodes    => $nodes_file_from_taxonomyDB,
                names    => $names_file_from_taxonomyDB,
                dict     => $dictFile,
                save_mem => 1
            );

# For each blast result
# Extract the GI
my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
if ($superkingdom eq "Bacteria") {
    # Do whatever you want
} elsif ($superkingdom eq "Eukaryota") {
    # Do whatever you want
}

The module has been tested mainly on Linux systems, but should run
without problems on Windows and Mac too. If you encounter any problem
with it, don't hesitate to contact me.

Hope this helps,

M;

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tax2bin2.pl
Type: text/x-perl-script
Size: 400 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: taxbuild.pm
Type: text/x-perl-script
Size: 10599 bytes
Desc: not available
URL:
-------------- next part --------------

On 01/04/2009, at 19:03, Florent Angly wrote:

> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
> you won't be able to put its information in a hash (unless you have
> a lot of memory).
> Florent > > Smithies, Russell wrote: >> The taxonomy information isn't in the blast output unless you >> created custom fasta headers for your blast database. >> The easiest way to get the tax_id for your accessions would be to >> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >> . >> If you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should >> be fairly fast. >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) >> If you grab the taxdump.tar.gz file from the same site, the >> nodes.dmp file contained within lists what division each tax_id >> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) >> so you can probably work it out from that. >> >> It's not a very BioPerly solution but sometimes just looking up the >> answer from a file/table/hash is the simplest way. >> Hope this helps, >> >> Russell Smithies >> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 >> 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >> >> >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] taxonomy ID >>> >>> Hi All, >>> I am writing a script, for one of its part i have to >>> parse a blast >>> report (refseq blast) and check how may organisms are eukaryotes >>> and how >>> namy of them are prokaryotes. >>> I am using BIO::DB::taxinomy module: >>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>> >>> But for this i need a taxonomyid (like '33090') given in the >>> example. >>> So is it possible to get a taxonomyid from refseq balst report? 
>>> If not then how i can deal with this problem? >>> >>> i would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use >> of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. >> = >> = >> ===================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields1 at gmail.com Fri Apr 10 00:34:03 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Thu, 9 Apr 2009 23:34:03 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneNCBIBlast - blastpgp In-Reply-To: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> References: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> Message-ID: Estelle, Always direct your questions to the bioperl mail list (I'm cc'ing them now). I'm not sure about using that option, maybe someone else can answer? 
chris

On Apr 7, 2009, at 5:08 AM, Estelle Proux wrote:

> Dear Mr Fields,
>
> I would like to use the module Bio::Tools::Run::StandAloneNCBIBlast
> to run blastpgp.
> However, the -C option (save a checkpoint in ASN.x) seems not
> available in this module (options are -j, -h, -c, -B, and -Q).
> Is there another way to save the checkpoint?
>
> I thank you by advance (and apologize for my English).
>
> Estelle

From jaleto at gmail.com Fri Apr 10 03:50:46 2009
From: jaleto at gmail.com (Jonathan Leto)
Date: Fri, 10 Apr 2009 00:50:46 -0700
Subject: [Bioperl-l] Google Summer of Code 2009 BioPerl Student Applications
Message-ID: <9aaadf9c0904100050g7f82f925s2e9bae9646da6cd5@mail.gmail.com>

Howdy,

There are two student applications for The Perl Foundation this year
which are BioPerl-related, and I would very much like them to
succeed, but most of the current mentors do not have the background to
judge whether they are feasible in the time given, or what most of
the words mean, for that matter.

We really need some feedback from BioPerl people on the viability
of these applications, as well as comments and suggestions on
implementation issues.

Please sign up at the GSoC web app [1], then apply to be a mentor for
The Perl Foundation. I have to accept you manually; once that is done
you will be able to view the 19 applications we received this
year. Please also join the private mentor list [2] and the
students+mentors list [3] if you would like to keep up to date and get
involved. Welcome!
Cheers, [1] http://socghop.appspot.com/ [2] http://groups.google.com/group/tpf-gsoc [3] http://groups.google.com/group/tpf-gsoc-students -- [---------------------] Jonathan Leto jaleto at gmail.com From scott at scottcain.net Fri Apr 10 09:08:53 2009 From: scott at scottcain.net (Scott Cain) Date: Fri, 10 Apr 2009 09:08:53 -0400 Subject: [Bioperl-l] Creating Cytoband Ideogram images In-Reply-To: <824645.66937.qm@web94611.mail.in2.yahoo.com> References: <824645.66937.qm@web94611.mail.in2.yahoo.com> Message-ID: <536f21b00904100608w23484c5bi3765da39b6b4d946@mail.gmail.com> Hello Shafeeq, You need Bio::Graphics::Glyph::ideogram, which is part of Bio::Graphics. You can install it from cpan and it will install BioPerl 1.6 as a prereq. The perldoc for ideogram.pm has example code and data, since the format of the data is important. Scott On Thu, Apr 9, 2009 at 11:43 PM, shafeeq rim wrote: > Hi, > > I want to create CytoBand ideogram images from CytoBand data in NCBI data. Is there any module in BioPerl or any other way to do it ? I want to create chromosome cytoband ideograms for each chromosome. > > Thanks in advance > Shafeeq > > > > ? ? ?Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Fri Apr 10 09:32:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 10 Apr 2009 08:32:00 -0500
Subject: [Bioperl-l] taxonomy ID
In-Reply-To:
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID:

I don't know if this has been pointed out, but Bio::DB::Taxonomy is
also capable of indexing and using the NCBI tax flat files.

use Bio::DB::Taxonomy;

my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => $nodesfile,
                                -namesfile => $namefile);

# use other Bio::DB::Taxonomy methods

chris

On Apr 1, 2009, at 4:56 PM, Miguel Pignatelli wrote:

> You may find the attached Perl module useful. It solves the
> difficult parts of getting the taxonomy given a GI identifier or a
> taxID. It is designed to be able to process a high number of GIs
> very fast and with low memory usage.
>
> An example of usage would be:
>
> use taxbuild;
> #Build the taxonomyDB
> my $taxDB = taxbuild->new(
>                 nodes    => $nodes_file_from_taxonomyDB,
>                 names    => $names_file_from_taxonomyDB,
>                 dict     => $dictFile,
>                 save_mem => 1
>             );
>
> # Get the taxonomy given a GI identifier
> my @tax = $taxDB->get_taxonomy_from_gi("35961124");
>
> # Get the taxonomy term of a GI identifier at a given level
> my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");
>
> # Get the taxid of a GI identifier
> my $taxid = $taxDB->get_taxid("35961124");
>
> # Get the taxonomy given a taxid
> my @tax = $taxDB->get_taxonomy($taxid);
>
> # Get the taxonomy at a given level given a taxid
> my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");
>
> # Get the level of a given taxonomical name
> my $level = $taxDB->get_level_from_name("Proteobacteria");
>
> The "dict file" is a processed version of the gi_taxid file from
> taxonomyDB. You can get this file by running the tax2bin2.pl script
> also attached:
>
> $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin
> or, if you are working with genes instead of proteins:
> $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin
>
> A possible solution to the original post using this module would be
> something like:
>
> # Initialize the taxonomyDB once.
> my $taxDB = taxbuild->new(
>                 nodes    => $nodes_file_from_taxonomyDB,
>                 names    => $names_file_from_taxonomyDB,
>                 dict     => $dictFile,
>                 save_mem => 1
>             );
>
> #For each blast result
> #Extract the GI
> my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
> if ($superkingdom eq "Bacteria") {
>     # Do whatever you want
> } elsif ($superkingdom eq "Eukaryota") {
>     # Do whatever you want
> }
>
>
> The module has been tested mainly in Linux systems, but should run
> without problems in Windows and Mac too. If you encounter any
> problem with it don't hesitate to contact me.
>
> Hope this helps,
>
> M;
>
>
>
>
>
> On 01/04/2009, at 19:03, Florent Angly wrote:
>
>> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
>> you won't be able to put its information in a hash (unless you have
>> a lot of memory).
>> Florent
>>
>> Smithies, Russell wrote:
>>> The taxonomy information isn't in the blast output unless you
>>> created custom fasta headers for your blast database.
>>> The easiest way to get the tax_id for your accessions would be to >>> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >>> . >>> If you load that file into a hash, parse the accessions out of the >>> blast hits then lookup the tax_id from that hash, I think it >>> should be fairly fast. >>> Checking which are prokaryotes and which are eukaryotes based on >>> tax_id is a separate problem :-) >>> If you grab the taxdump.tar.gz file from the same site, the >>> nodes.dmp file contained within lists what division each tax_id >>> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) >>> so you can probably work it out from that. >>> >>> It's not a very BioPerly solution but sometimes just looking up >>> the answer from a file/table/hash is the simplest way. >>> Hope this helps, >>> >>> Russell Smithies >>> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >>> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T >>> +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >>> >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>>> To: bioperl-l >>>> Subject: [Bioperl-l] taxonomy ID >>>> >>>> Hi All, >>>> I am writing a script, for one of its part i have to >>>> parse a blast >>>> report (refseq blast) and check how may organisms are eukaryotes >>>> and how >>>> namy of them are prokaryotes. >>>> I am using BIO::DB::taxinomy module: >>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>>> >>>> But for this i need a taxonomyid (like '33090') given in the >>>> example. >>>> So is it possible to get a taxonomyid from refseq balst report? >>>> If not then how i can deal with this problem? >>>> >>>> i would really appreciate if anyone can help me out. 
>>>> >>>> Thanks >>>> Shalabh >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> = >>> = >>> = >>> ==================================================================== >>> Attention: The information contained in this message and/or >>> attachments >>> from AgResearch Limited is intended only for the persons or entities >>> to which it is addressed and may contain confidential and/or >>> privileged >>> material. Any review, retransmission, dissemination or other use >>> of, or >>> taking of any action in reliance upon, this information by persons >>> or >>> entities other than the intended recipients is prohibited by >>> AgResearch >>> Limited. If you have received this message in error, please notify >>> the >>> sender immediately. >>> = >>> = >>> = >>> ==================================================================== >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Fri Apr 10 09:42:15 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 10 Apr 2009 09:42:15 -0400 Subject: [Bioperl-l] Query about Bioperl and Mysql In-Reply-To: <31bb4380903280541r232ebbe4kbb0ccd84f996da1f@mail.gmail.com> References: <31bb4380903280541r232ebbe4kbb0ccd84f996da1f@mail.gmail.com> Message-ID: <264855a00904100642l482deebend6be66b140933c2c@mail.gmail.com> On Sat, Mar 28, 2009 at 8:41 AM, Sanjay Harke wrote: > Dear friends, > > anybody nows about my 
following problem. > > !) I want to use my own database mysql with Bioperl > > kindly guide for it. > You'll want to look at the perl DBI and DBD::mysql modules. Sean From bosborne11 at verizon.net Fri Apr 10 09:55:00 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 10 Apr 2009 09:55:00 -0400 Subject: [Bioperl-l] Access Uniprot detailed information In-Reply-To: <22951210.post@talk.nabble.com> References: <22951210.post@talk.nabble.com> Message-ID: <4C3C5234-31F7-4EEF-BBA0-9B912D21F210@verizon.net> Markus, There is some discussion of the structure of "swiss" format files in the Feature-Annotation HOW TO. Have you taken a look at this? http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Other_Sequence_File_Formats This section does not explain all the fields in each different format, but it shows you code that you can run that will print out all the annotations and features. You're really asking 2 questions, I think. Have you figured out how to retrieve a sequence? See if this helps you: http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database Brian O. On Apr 8, 2009, at 10:07 AM, manni122 wrote: > > Hi there, > maybe I am not able to read careful enough through the Howto section. > But is there a function in BioPerl that retrieves for a given > Uniprot Access > Code or ID from the Uniprot Database some general annotations like > enzymatic > activity or literature references? > I appreciate any help! > -- > View this message in context: http://www.nabble.com/Access-Uniprot-detailed-information-tp22951210p22951210.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Fri Apr 10 10:05:06 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 10 Apr 2009 10:05:06 -0400 Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 In-Reply-To: <22816585.post@talk.nabble.com> References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <22816585.post@talk.nabble.com> Message-ID: Dereje, There's a HOW TO that discusses an approach similar to this (Using local Genbank and Entrez Gene files): http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences But the provided script uses Gene ids, not chromosome names. The more general suggestion would be to look at the module Bio::DB::Fasta. Brian O. On Mar 31, 2009, at 6:59 PM, demis001 wrote: > > Hi , > > I am new to BioPerl and this forum and even do not know how to post > the new > post. I have one question for you guys. > > Is there any BioPerl module that allows me to download sequence > based on > chromosome name, seqStart and SeqEnd given the formatted human genome > database downloaded on my Linux desktop? > > I used to do this using Perl $URI object and it is really slow as the > process depend on the network. To be more specific, I took chrName, > seqStart > and seqEnd and go to Ensembl database to get the sequence one by one > using > Perl $URI object. > > I thought it might be easier if I process locally using indexed > database > using BioPerl module if there is any designed for this purpose. > > Input, millions rows of tab delimited (CSV) file contain > information about > chrName, seqStart, seqEnd. Locally formatted/indexed human genome. > Output > should be the fasta sequence contain the sequence and with the header > contain chr name and location persed > > Sorry if I posted in the wrong section of the forum and happy to > get any > recommendation. 
> Thanks > > Govind Chandra wrote: >> >> Hi, >> >> The code below >> >> >> ====== code begins ======= >> #use strict; >> use Bio::SeqIO; >> >> $infile='NC_000913.gbk'; >> my $seqio=Bio::SeqIO->new(-file => $infile); >> my $seqobj=$seqio->next_seq(); >> my @features=$seqobj->all_SeqFeatures(); >> my $count=0; >> foreach my $feature (@features) { >> unless($feature->primary_tag() eq 'CDS') {next;} >> print($feature->start()," ", $feature->end(), " >> ",$feature->strand(),"\n"); >> $ac=$feature->annotation(); >> $temp1=$ac->get_Annotations("locus_tag"); >> @temp2=$ac->get_Annotations(); >> print("$temp1 $temp2[0] @temp2\n"); >> if($count++ > 5) {last;} >> } >> >> print(ref($ac),"\n"); >> exit; >> >> ======= code ends ======== >> >> produces the output >> >> ========== output begins ======== >> >> 190 255 1 >> 0 >> 337 2799 1 >> 0 >> 2801 3733 1 >> 0 >> 3734 5020 1 >> 0 >> 5234 5530 1 >> 0 >> 5683 6459 -1 >> 0 >> 6529 7959 -1 >> 0 >> Bio::Annotation::Collection >> >> =========== output ends ========== >> >> $ac is-a Bio::Annotation::Collection but does not actually contain >> any >> annotation from the feature. Is this how it should be? I cannot >> figure >> out what is wrong with the script. Earlier I used to use has_tag(), >> get_tag_values() etc. but the documentation says these are >> deprecated. >> >> Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of >> uname >> -a is >> >> Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 >> x86_64 x86_64 x86_64 GNU/Linux >> >> Thanks in advance for any help. >> >> Govind >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/Re%3A-Bioperl-l-Digest%2C-Vol-71%2C-Issue-15-tp22744119p22816585.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
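A minimal sketch of the Bio::DB::Fasta route suggested above for pulling chrName/seqStart/seqEnd ranges out of a locally indexed genome. The toy FASTA file, the `chrToy` identifier and the coordinates are invented so that the snippet is self-contained; with a real genome, `Bio::DB::Fasta->new` would presumably be pointed at the downloaded FASTA file or directory instead.

```perl
use strict;
use warnings;
use Bio::DB::Fasta;
use File::Temp qw(tempdir);

# Build a tiny stand-in "genome" (made-up data) so the sketch runs on its own.
my $dir   = tempdir(CLEANUP => 1);
my $fasta = "$dir/toy.fa";
open my $out, '>', $fasta or die "Cannot write $fasta: $!";
print {$out} ">chrToy\nACGTACGTAC\nGGGTTTAAAC\n";
close $out;

# The index is built on first use and reused on later runs.
my $db = Bio::DB::Fasta->new($fasta);

# One row of the tab-delimited input: chromosome, start, end
# (1-based, inclusive coordinates).
my ($chr, $start, $end) = ('chrToy', 9, 13);
my $subseq = $db->seq($chr, $start => $end);

# FASTA output with the chromosome name and location in the header.
print ">$chr:$start-$end\n$subseq\n";
```

For millions of rows, the same `$db` object can be reused inside the loop that reads the tab-delimited file, so the genome is only indexed once.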
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org Fri Apr 10 11:51:45 2009
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 10 Apr 2009 08:51:45 -0700
Subject: [Bioperl-l] taxonomy ID
In-Reply-To:
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID: <6B951DED-0632-451C-86A4-2A215B1CAE6C@bioperl.org>

The only difference to the DB::Taxonomy module I can see is we don't specifically have the dictionary part -- for gi -> taxid, but I just do a local DBHash index of that when I need it.

-jason

On Apr 10, 2009, at 6:32 AM, Chris Fields wrote:

> I don't know if this has been pointed out, but Bio::DB::Taxonomy is
> also capable of indexing and using the NCBI tax flat files.
>
> use Bio::DB::Taxonomy;
>
> my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
>                                 -nodesfile => $nodesfile,
>                                 -namesfile => $namefile);
>
> # use other Bio::DB::Taxonomy methods
>
> chris
>
> On Apr 1, 2009, at 4:56 PM, Miguel Pignatelli wrote:
>
>> You may find the attached Perl module useful. It solves the
>> difficult parts of getting the taxonomy given a GI identifier or a
>> taxID. It is designed to be able to process a high number of GIs
>> very fast and with low memory usage.
>>
>> An example of usage would be:
>>
>> use taxbuild;
>> #Build the taxonomyDB
>> my $taxDB = taxbuild->new(
>>     nodes    => $nodes_file_from_taxonomyDB,
>>     names    => $names_file_from_taxonomyDB,
>>     dict     => $dictFile,
>>     save_mem => 1
>> );
>>
>> # Get the taxonomy given a GI identifier
>> my @tax = $taxDB->get_taxonomy_from_gi("35961124");
>>
>> # Get the taxonomy term of a GI identifier at a given level
>> my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");
>>
>> # Get the taxid of a GI identifier
>> my $taxid = $taxDB->get_taxid("35961124");
>>
>> # Get the taxonomy given a taxid
>> my @tax = $taxDB->get_taxonomy($taxid);
>>
>> # Get the taxonomy at a given level given a taxid
>> my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");
>>
>> # Get the level of a given taxonomical name
>> my $level = $taxDB->get_level_from_name("Proteobacteria");
>>
>> The "dict file" is a processed version of the gi_taxid file from
>> taxonomyDB. You can get this file by running the tax2bin2.pl script
>> also attached:
>>
>> $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin
>> or, if you are working with genes instead of proteins:
>> $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin
>>
>> A possible solution to the original post using this module would be
>> something like:
>>
>> # Initialize the taxonomyDB once.
>> my $taxDB = taxbuild->new(
>>     nodes    => $nodes_file_from_taxonomyDB,
>>     names    => $names_file_from_taxonomyDB,
>>     dict     => $dictFile,
>>     save_mem => 1
>> );
>>
>> #For each blast result
>> #Extract the GI
>> my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
>> if ($superkingdom eq "Bacteria") {
>>     # Do whatever you want
>> } elsif ($superkingdom eq "Eukaryota") {
>>     # Do whatever you want
>> }
>>
>>
>> The module has been tested mainly in Linux systems, but should run
>> without problems in Windows and Mac too. If you encounter any
>> problem with it don't hesitate to contact me.
>>
>> Hope this helps,
>>
>> M;
>>
>>
>>
>>
>>
>> On 01/04/2009, at 19:03, Florent Angly wrote:
>>
>>> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
>>> you won't be able to put its information in a hash (unless you
>>> have a lot of memory).
>>> Florent
>>>
>>> Smithies, Russell wrote:
>>>> The taxonomy information isn't in the blast output unless you
>>>> created custom fasta headers for your blast database.
>>>> The easiest way to get the tax_id for your accessions would be to >>>> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >>>> . >>>> If you load that file into a hash, parse the accessions out of >>>> the blast hits then lookup the tax_id from that hash, I think it >>>> should be fairly fast. >>>> Checking which are prokaryotes and which are eukaryotes based on >>>> tax_id is a separate problem :-) >>>> If you grab the taxdump.tar.gz file from the same site, the >>>> nodes.dmp file contained within lists what division each tax_id >>>> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, >>>> etc) so you can probably work it out from that. >>>> >>>> It's not a very BioPerly solution but sometimes just looking up >>>> the answer from a file/table/hash is the simplest way. >>>> Hope this helps, >>>> >>>> Russell Smithies >>>> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >>>> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T >>>> +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >>>> >>>> >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>>>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>>>> To: bioperl-l >>>>> Subject: [Bioperl-l] taxonomy ID >>>>> >>>>> Hi All, >>>>> I am writing a script, for one of its part i have to >>>>> parse a blast >>>>> report (refseq blast) and check how may organisms are eukaryotes >>>>> and how >>>>> namy of them are prokaryotes. >>>>> I am using BIO::DB::taxinomy module: >>>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>>>> >>>>> But for this i need a taxonomyid (like '33090') given in the >>>>> example. >>>>> So is it possible to get a taxonomyid from refseq balst report? >>>>> If not then how i can deal with this problem? 
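Russell's lookup idea, combined with Florent's caveat about memory, can be sketched by keeping only the GIs that actually occur in the BLAST hits and streaming the dump once. This assumes the dump has already been decompressed and holds one tab-separated gi/taxid pair per line. The sub name `taxids_for_gis`, the toy file and its three rows are placeholders, not real NCBI records.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return a hash ref of gi => taxid for just the GIs in
# @$wanted, reading the two-column dump line by line instead of loading it all.
sub taxids_for_gis {
    my ($dump_path, $wanted) = @_;
    my %want = map { $_ => 1 } @$wanted;
    my %taxid_of;
    open my $fh, '<', $dump_path or die "Cannot open $dump_path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && $want{$gi};
        $taxid_of{$gi} = $taxid;
        # Stop early once every wanted GI has been seen.
        last if keys(%taxid_of) == keys(%want);
    }
    close $fh;
    return \%taxid_of;
}

# Toy stand-in for the decompressed gi_taxid_nucl.dmp (made-up rows).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "101\t9606\n102\t562\n103\t4932\n";
close $tmp;

my $taxid_of = taxids_for_gis($path, [101, 103]);
print "$_ => $taxid_of->{$_}\n" for sort { $a <=> $b } keys %$taxid_of;
```

Memory use then scales with the number of BLAST hits rather than with the size of the dump; the price is one full pass over the file per run, which is where an on-disk index of the kind Jason mentions would pay off.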
>>>>> >>>>> i would really appreciate if anyone can help me out. >>>>> >>>>> Thanks >>>>> Shalabh >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> = >>>> = >>>> = >>>> = >>>> =================================================================== >>>> Attention: The information contained in this message and/or >>>> attachments >>>> from AgResearch Limited is intended only for the persons or >>>> entities >>>> to which it is addressed and may contain confidential and/or >>>> privileged >>>> material. Any review, retransmission, dissemination or other use >>>> of, or >>>> taking of any action in reliance upon, this information by >>>> persons or >>>> entities other than the intended recipients is prohibited by >>>> AgResearch >>>> Limited. If you have received this message in error, please >>>> notify the >>>> sender immediately. >>>> = >>>> = >>>> = >>>> = >>>> =================================================================== >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From SMarkel at accelrys.com Fri Apr 10 12:01:25 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 10 Apr 2009 12:01:25 -0400 Subject: [Bioperl-l] 
Bio::Tools::Run::StandAloneNCBIBlast - blastpgp In-Reply-To: References: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74729E04A77@exch1-hi.accelrys.net> Estelle, Are you using the most recent version of Bio::Tools::Run::StandAloneNCBIBlast? The available blastpgp parameters are our @BLASTPGP_PARAMS = qw(A B C E F G H I J K L M N O P Q R S T U W X Y Z a b c e f h j k l m q s t u v y z); See line 94. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, 09 April 2009 9:34 PM > To: Estelle Proux > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneNCBIBlast - blastpgp > > Estelle, > > Always direct your questions to the bioperl mail list (I'm cc'ing them > now). I'm not sure about using that option, maybe someone else can > answer? > > chris > > On Apr 7, 2009, at 5:08 AM, Estelle Proux wrote: > > > Dear Mr Fields, > > > > I would like to use the module Bio::Tools::Run::StandAloneNCBIBlast > > to run > > blastpgp. > > However, the -C option (save a checkpoint in ASN.x) seems not > > available in > > this module (options are -j, -h, -c, -B, and -Q). Is there another > > way to > > save the checkpoint? > > > > I thank you by advance (and apologize for my English). 
> >
> > Estelle
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jarodpardon at yahoo.com.cn Sat Apr 11 09:50:20 2009
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Sat, 11 Apr 2009 21:50:20 +0800 (CST)
Subject: [Bioperl-l] how to suppress Bioperl exceptions
Message-ID: <936515.8386.qm@web15007.mail.cnb.yahoo.com>

Hi, all,

I use Bio::SeqIO driver to parse data files. The input data is somewhat buggy, and some of entries are not strict in format. The parser will throw exceptions and halt when meeting these bad entries. However, I want to just skip these entries, not stop there. So how to suppress exceptions?

Thanks.

Jarod

From maj at fortinbras.us Sat Apr 11 11:32:16 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 11 Apr 2009 11:32:16 -0400
Subject: [Bioperl-l] how to suppress Bioperl exceptions
Message-ID:

missed the list.

----- Original Message -----
From: "Mark A. Jensen"
To: "?? ??"
Sent: Saturday, April 11, 2009 10:52 AM
Subject: Re: [Bioperl-l] how to suppress Bioperl exceptions

> Hey Jarod-
> You can try setting the verbosity of the object negative, as
>
> $seqio->verbose(-1);
>
> I've found, though, that the warning messages still come through
> sometimes. I've gotten control of these using the Error package:
>
> use Error qw(:try);
>
> try {
>     $seqio = Bio::SeqIO->new(-file => 'my.fas');
> }
> catch Error with {
>     my $e = shift;
>     # $e->text will contain the message
> };
>
> Note the lack of ; after the try block, and the
> presence thereof after the catch block.
>
> cheers -Mark
> ----- Original Message -----
> From: "?? ??"
> To: > Sent: Saturday, April 11, 2009 9:50 AM > Subject: [Bioperl-l] how to suppress Bioperl exceptions > > >> >> Hi, all, >> I use Bio::SeqIO driver to parse data files. The input data is somewhat >> buggy, and some of entries are not strict in format. The parser will throw >> exceptions and halt when meeting these bad entries. However, I want to just >> skip these entries, not stop there. So how to suppress exceptions? >> Thanks. >> >> Jarod >> >> >> >> ___________________________________________________________ >> ?????????????????????????????????? >> http://card.mail.cn.yahoo.com/ >> >> > > > -------------------------------------------------------------------------------- > > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Sat Apr 11 11:56:35 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 11 Apr 2009 11:56:35 -0400 Subject: [Bioperl-l] how to suppress Bioperl exceptions In-Reply-To: <936515.8386.qm@web15007.mail.cnb.yahoo.com> References: <936515.8386.qm@web15007.mail.cnb.yahoo.com> Message-ID: Hi Jarod, in addition to Mark's response, what you say in your message would mean that corruption is in specific entries of a file and you want to skip those, rather than entire files. If this is true, then you'd have to put the $seq=$seqio->next_seq() call into the try {} block as that'd be the one that would raise the exception. The SeqIO parsers don't generally guarantee though that they will gracefully recover from a parsing error and advance to the next record; I think the genbank parser will do that, but you will definitely want to check that. -hilmar On Apr 11, 2009, at 9:50 AM, ?? ?? wrote: > > Hi, all, > I use Bio::SeqIO driver to parse data files. The input data is > somewhat buggy, and some of entries are not strict in format. The > parser will throw exceptions and halt when meeting these bad > entries. 
However, I want to just skip these entries, not stop there. > So how to suppress exceptions? > Thanks. > > Jarod > > > > ___________________________________________________________ > ?????????????????????????????????? > http://card.mail.cn.yahoo.com/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From oleksii.nikolaienko at gmail.com Sun Apr 12 07:10:47 2009 From: oleksii.nikolaienko at gmail.com (Oleksii Nikolaienko) Date: Sun, 12 Apr 2009 14:10:47 +0300 Subject: [Bioperl-l] GSoC proposal Message-ID: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> Hi all! My name is Oleksii, I`m PhD student and I`d like to receive your comments on my proposal for Google summer of code. It`s called "bioperl-live::Bio::Restriction::* - implementing missing features" and I`m going to: 1) add support for reading and writing different file formats for module Bio::Restriction::IO 2) add support for multicut/multisite enzymes 3) add information on recommended buffers, restriction efficiency, sensitivity to methylation, etc and corresponding new methods 4) update documentation for Bio::Restriction::* modules Thanks in advance for your suggestions. notch From roy.chaudhuri at gmail.com Tue Apr 14 10:54:21 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 14 Apr 2009 15:54:21 +0100 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error Message-ID: <49E4A39D.2020909@gmail.com> Hi Mike. I did get that problem solved in the end, thanks to lots of help from Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped cc-ing the mailing list at some point. 
The last archived message in the thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had the correct solution - the code change was incorporated into the bioperl-ext CVS, and is in the latest version that you can get from SVN (see http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the problem you must be experiencing a different issue.

You should also bear in mind the message Chris Fields sent to the list a few days ago, and have a look at using BioLib instead:

> Just to note, we're not actively supporting much of the bioperl-ext
> code, in favor of the BioLib initiative:
>
> http://biolib.open-bio.org/wiki/Main_Page
>
> If you do use bioperl-ext I suggest only using the latest code from
> svn (and that in combination with bioperl-live).
>
> chris

Hope this helps.
Roy.

Michael Stubbington wrote:
> Dear Dr. Chaudhuri,
>
> I am currently trying to write a bioperl script that parses .abi
> sequence files. I am having exactly the same problem as you did when
> you posted this enquiry to the bioperl mailing list
> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
> wondering if you ever solved the problem and, if so, can you remember
> what you did? I'd be very grateful for any help you can provide. I
> can't find this problem mentioned anywhere else online.
>
> Thank you for your time.
>
>
>
> Mike

--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.

From cjfields at illinois.edu Tue Apr 14 11:20:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 14 Apr 2009 10:20:00 -0500
Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error
In-Reply-To: <49E4A39D.2020909@gmail.com>
References: <49E4A39D.2020909@gmail.com>
Message-ID:

For ABI files you'll need an older version of io_lib that supports ABI or the io_lib that comes with the full staden package. Recent versions of io_lib don't have ABI support built-in anymore.
chris

On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote:

> Hi Mike.
>
> I did get that problem solved in the end, thanks to lots of help
> from Aaron Mackey. Looking at the bioperl-l archives it seems like
> we stopped cc-ing the mailing list at some point. The last archived
> message in the thread
> (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html)
> had the correct solution - the code change was incorporated into
> the bioperl-ext CVS, and is in the latest version that you can get
> from SVN (see http://www.bioperl.org/wiki/Ext_package). If that
> doesn't solve the problem you must be experiencing a different issue.
>
> You should also bear in mind the message Chris Fields sent to the
> list a few days ago, and have a look at using BioLib instead:
>
>> Just to note, we're not actively supporting much of the bioperl-ext
>> code, in favor of the BioLib initiative:
>> http://biolib.open-bio.org/wiki/Main_Page
>> If you do use bioperl-ext I suggest only using the latest code
>> from svn (and that in combination with bioperl-live).
>>
>> chris
>
> Hope this helps.
> Roy.
>
>
>
> Michael Stubbington wrote:
>> Dear Dr. Chaudhuri,
>> I am currently trying to write a bioperl script that parses .abi
>> sequence files. I am having exactly the same problem as you did when
>> you posted this enquiry to the bioperl mailing list
>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html.
>> I was wondering if you ever solved the problem and, if so, can
>> you remember what you did? I'd be very grateful for any help you
>> can provide. I can't find this problem mentioned anywhere else online.
>> Thank you for your time.
>> Mike
>
> --
> Dr. Roy Chaudhuri
> Department of Veterinary Medicine
> University of Cambridge, U.K.
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Apr 14 14:21:43 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 14 Apr 2009 13:21:43 -0500 Subject: [Bioperl-l] GSoC proposal In-Reply-To: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> References: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> Message-ID: On Apr 12, 2009, at 6:10 AM, Oleksii Nikolaienko wrote: > Hi all! > My name is Oleksii, I`m PhD student and I`d like to receive your > comments on > my proposal for Google summer of code. It`s called > "bioperl-live::Bio::Restriction::* - implementing missing features" > and I`m > going to: > > 1) add support for reading and writing different file formats for > module Bio::Restriction::IO You should specify which formats you intend on working with. It's known that several formats don't carry all data, for instance prototype information, vendors, etc. so that should be documented for end-users. You should probably suggest a workaround for getting at missing data (i.e. a format that carries all info, retrieving prototype data separately, etc). > 2) add support for multicut/multisite enzymes Agreed, though you should be more specific on how you intend to implement this. From the Bio::Restriction::Enzyme documentation the sequence site is supposed to be a Bio::PrimarySeq (though I would probably change that internally so it creates these on the fly from the stored string). Multicut/multisite implies list context return, so it may become an API issue (and using wantarray as a workaround is fraught with problematic API traps that I suggest avoiding if at all possible). 
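To make the list-context concern concrete, here is a small pure-Perl sketch of the kind of trap that wantarray invites. The subs `cut_positions` and `cut_positions_ref`, the enzyme name and the positions are all invented for illustration; none of them is part of Bio::Restriction::Enzyme.

```perl
use strict;
use warnings;

# Made-up cut positions for an imaginary multicut enzyme.
my @POSITIONS = (3, 17, 42);

# Context-sensitive version: whole list in list context, first element otherwise.
sub cut_positions {
    return wantarray ? @POSITIONS : $POSITIONS[0];
}

# Context-free alternative: always hand back an array reference.
sub cut_positions_ref {
    return [@POSITIONS];
}

my $first = cut_positions();    # scalar context
my @all   = cut_positions();    # list context

# The trap: a hash constructor imposes list context, so the extra
# positions spill over and become a bogus key/value pair.
my %record = (name => 'HypI', cut => cut_positions());

# With the array reference the record keeps the shape the caller expects.
my %safe = (name => 'HypI', cut => cut_positions_ref());

printf "scalar call: %d\n", $first;
printf "list call:   %s\n", join(',', @all);
printf "trap keys:   %s\n", join(',', sort keys %record);
printf "safe keys:   %s\n", join(',', sort keys %safe);
```

Older single-site callers keep working with the wantarray version, but every call that happens to sit in list context silently changes meaning, which is the API trap described above. A separate plural accessor or an array reference avoids that at the cost of a slightly wider interface.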
> 3) add information on recommended buffers, restriction > efficiency, > sensitivity to methylation, etc and corresponding new methods Much of this should probably be outlined in the corresponding interface class prior to implementation. > 4) update documentation for Bio::Restriction::* modules Yes, completely agree. This should be bumped closer to the top of the priority list (and outlined in the interface classes). > Thanks in advance for your suggestions. > > notch > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l chris From j_martin at lbl.gov Wed Apr 15 02:50:37 2009 From: j_martin at lbl.gov (Joel Martin) Date: Tue, 14 Apr 2009 23:50:37 -0700 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: References: <49E4A39D.2020909@gmail.com> Message-ID: <20090415065037.GB1175@eniac.jgi-psf.org> Hello, Do you know where it says io_lib will stop supporting ABI? We use the latest ( 1.11.6 ) for this so I know it does read them and I just checked with one fresh off a sequencer. But I couldn't find an active forum for staden. Thanks, Joel On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: > For ABI files you'll need an older version of io_lib that supports ABI or > the io_lib that comes with the full staden package. Recent versions of > io_lib don't have ABI support built-in anymore. > > chris > > On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: > >> Hi Mike. >> >> I did get that problem solved in the end, thanks to lots of help from >> Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped >> cc-ing the mailing list at some point. 
>> The last archived message in the
>> thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had
>> the correct solution - the code change was incorporated into the
>> bioperl-ext CVS, and is in the latest version that you can get from SVN
>> (see http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the
>> problem you must be experiencing a different issue.
>>
>> You should also bear in mind the message Chris Fields sent to the list a
>> few days ago, and have a look at using BioLib instead:
>>
>>> Just to note, we're not actively supporting much of the bioperl-ext
>>> code, in favor of the BioLib initiative:
>>> http://biolib.open-bio.org/wiki/Main_Page
>>> If you do use bioperl-ext I suggest only using the latest code from svn
>>> (and that in combination with bioperl-live).
>>>
>>> chris
>>
>> Hope this helps.
>> Roy.
>>
>>
>>
>> Michael Stubbington wrote:
>>> Dear Dr. Chaudhuri,
>>> I am currently trying to write a bioperl script that parses .abi sequence
>>> files. I am having exactly the same problem as you did when
>>> you posted this enquiry to the bioperl mailing list
>>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
>>> wondering if you ever solved the problem and, if so, can you remember
>>> what you did? I'd be very grateful for any help you can provide. I
>>> can't find this problem mentioned anywhere else online.
>>> Thank you for your time.
>>> Mike
>>
>> --
>> Dr. Roy Chaudhuri
>> Department of Veterinary Medicine
>> University of Cambridge, U.K.
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Apr 15 08:26:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 15 Apr 2009 07:26:15 -0500 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <20090415065037.GB1175@eniac.jgi-psf.org> References: <49E4A39D.2020909@gmail.com> <20090415065037.GB1175@eniac.jgi-psf.org> Message-ID: <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> Joel, They haven't stopped supporting it. IIRC the separate io_lib distribution no longer has the ABI headers, but the io_lib with the full staden package does (a little confusing, yes). I have 1.11.6 and ABI-related tests for bioperl and bioperl-ext don't pass, but compiling with an earlier version does work. It may be as simple as including the header files from an old version, but I haven't tried that. chris On Apr 15, 2009, at 1:50 AM, Joel Martin wrote: > Hello, > Do you know where it says io_lib will stop supporting ABI? We use > the latest ( 1.11.6 ) for this so I know it does read them and I just > checked with one fresh off a sequencer. But I couldn't find an active > forum for staden. > > Thanks, > Joel > > On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: >> For ABI files you'll need an older version of io_lib that supports >> ABI or >> the io_lib that comes with the full staden package. Recent >> versions of >> io_lib don't have ABI support built-in anymore. >> >> chris >> >> On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: >> >>> Hi Mike. >>> >>> I did get that problem solved in the end, thanks to lots of help >>> from >>> Aaron Mackey. 
>>> Looking at the bioperl-l archives it seems like we stopped
>>> cc-ing the mailing list at some point. The last archived message
>>> in the thread
>>> (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had
>>> the correct solution - the code change was incorporated into the
>>> bioperl-ext CVS, and is in the latest version that you can get
>>> from SVN (see http://www.bioperl.org/wiki/Ext_package). If that
>>> doesn't solve the problem you must be experiencing a different
>>> issue.
>>>
>>> You should also bear in mind the message Chris Fields sent to the
>>> list a few days ago, and have a look at using BioLib instead:
>>>
>>>> Just to note, we're not actively supporting much of the bioperl-ext
>>>> code, in favor of the BioLib initiative:
>>>> http://biolib.open-bio.org/wiki/Main_Page
>>>> If you do use bioperl-ext I suggest only using the latest code
>>>> from svn (and that in combination with bioperl-live).
>>>>
>>>> chris
>>>
>>> Hope this helps.
>>> Roy.
>>>
>>>
>>>
>>> Michael Stubbington wrote:
>>>> Dear Dr. Chaudhuri,
>>>> I am currently trying to write a bioperl script that parses .abi
>>>> sequence files. I am having exactly the same problem as you did when
>>>> you posted this enquiry to the bioperl mailing list
>>>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
>>>> wondering if you ever solved the problem and, if so, can you
>>>> remember what you did? I'd be very grateful for any help you can
>>>> provide. I can't find this problem mentioned anywhere else online.
>>>> Thank you for your time.
>>>> Mike
>>>
>>> --
>>> Dr. Roy Chaudhuri
>>> Department of Veterinary Medicine
>>> University of Cambridge, U.K.
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Michael.Stubbington at hpa.org.uk Wed Apr 15 03:43:39 2009 From: Michael.Stubbington at hpa.org.uk (Michael Stubbington) Date: Wed, 15 Apr 2009 08:43:39 +0100 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: References: <49E4A39D.2020909@gmail.com> Message-ID: <335635A922FA2B43B35B6ADD7929CC590171550C@porhpaexc001.HPA.org.uk> Thanks a lot for your help. I finally solved the problem with a combination of: 1) Checking out the latest bioperl-ext from svn. 2) A fresh install of an earlier version of io_lib (8.12) 3) Changing to "config.h" in os.h Everything seems to be working now. Best wishes, Mike -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: 14 April 2009 16:20 To: Roy Chaudhuri Cc: Michael Stubbington; bioperl-l at bioperl.org Subject: Re: [Bioperl-l] Bio::SeqIO::staden::read make test error For ABI files you'll need an older version of io_lib that supports ABI or the io_lib that comes with the full staden package. Recent versions of io_lib don't have ABI support built-in anymore. chris On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: > Hi Mike. > > I did get that problem solved in the end, thanks to lots of help > from Aaron Mackey. Looking at the bioperl-l archives it seems like > we stopped cc-ing the mailing list at some point. 
The last archived > message in the thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html > ) had the correct solution - the code change was incorporated into > the bioperl-ext CVS, and is in the latest version that you can get > from SVN (see http://www.bioperl.org/wiki/Ext_package). If that > doesn't solve the problem you must be experiencing a different issue. > > You should also bear in mind the message Chris Fields sent to the > list a few days ago, and have a look at using BioLib instead: > >> Just to note, we're not actively supporting much of the bioperl- >> ext code, in favor of the BioLib initiative: >> http://biolib.open-bio.org/wiki/Main_Page >> If you do use bioperl-ext I suggest only using the latest code >> from svn (and that in combination with bioperl-live). > > >> chris > > Hope this helps. > Roy. > > > > Michael Stubbington wrote: >> Dear Dr. Chaudhuri, >> I am currently trying to write a bioperl script that parses .abi >> sequence files. I am having exactly the same problem as you did when >> you posted this enquiry to the bioperl mailing list http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html >> . I was wondering if you ever solved the problem and, if so, can >> you remember >> what you did? I'd be very grateful for any help you can provide. I >> can't find this problem mentioned anywhere else online. >> Thank you for your time. >> Mike > > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------- ************************************************************************** The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). 
It may not be disclosed to any other person without the express authority of the HPA, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses, but please re-sweep any attachments before opening or saving. HTTP://www.HPA.org.uk
**************************************************************************

From cjfields1 at gmail.com Mon Apr 20 12:12:10 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Mon, 20 Apr 2009 11:12:10 -0500
Subject: [Bioperl-l] BioPerl 1.6.1 slate
Message-ID: <58CCB0F1-9BC8-4437-8870-3D6CAA7BB1ED@gmail.com>

All,

Just to note, the bioperl 1.6.1 release will probably be delayed until mid-May (just been too busy to work on it, end-of-semester crunch and all). I'll probably release an alpha prior to that (maybe first week of May) for testing some bug fixes across platforms.

cheers!

chris

From nagel at moldiag.de Tue Apr 21 10:31:29 2009
From: nagel at moldiag.de (Mato Nagel)
Date: Tue, 21 Apr 2009 16:31:29 +0200
Subject: [Bioperl-l] Exact codon numbering
Message-ID: <49EDD8C1.7000101@moldiag.de>

Dear colleagues,

I spent this evening browsing all your information but didn't succeed in finding a module that translates feature data (CDS and mRNA) into codon numbering. I developed a routine that creates the following structure from an NCBI XML file:

$exonstructure =[
    splice_variant_1->[
        exon_1->{mRNA_from ->'1',
                 mRNA_to->'something',
                 cDNA_from->'something',
                 cDNA_to->'something',
                 CDS_from->'something',
                 CDS_to->'something',
                }
        exon_2->{...}
        ...
    ]
    splice_variant_2 [...
    ]
]

I wonder if it is worth publishing this routine in BioPerl. Looking forward to receiving an answer.

Sincerely Yours
Mato Nagel

From dan.bolser at gmail.com Wed Apr 22 06:49:42 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 22 Apr 2009 11:49:42 +0100
Subject: [Bioperl-l] Creating a fastq format file?
Message-ID: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Creating a fastq format file from fasta and 'fasta quality file'? Hi, I'm sure this is easy, but I'm still not able to 'think bioperl'... I have a 'fasta quality file' and a fasta file, and I would like to output a fastq file. I followed the discussion on the previous thread here: http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html With the conclusion seeming to be 'just do it'. Could someone point me at a way to do this, or was that suggestion an error? i.e. the poster was working out a way to create a fastq the only way possible... I get the feeling that this should be a one-liner, but perhaps the above thread was demonstrating the code I need to copy. Thanks for any suggestions, Dan. From drummike at gmail.com Wed Apr 22 08:28:08 2009 From: drummike at gmail.com (Mike Williams) Date: Wed, 22 Apr 2009 08:28:08 -0400 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Message-ID: On Wed, Apr 22, 2009 at 6:49 AM, Dan Bolser wrote: > Creating a fastq format file from fasta and 'fasta quality file'? > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? Hi there. You should take a look at the documentation for formatdb, that will get you there. http://www.ncbi.nlm.nih.gov/BLAST/docs/formatdb.html Mike From dan.bolser at gmail.com Wed Apr 22 09:10:14 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 22 Apr 2009 14:10:14 +0100 Subject: [Bioperl-l] Creating a fastq format file? 
In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Message-ID: <2c8757af0904220610m7ef63a63m8590956d32d57d17@mail.gmail.com> 2009/4/22 Mike Williams : > On Wed, Apr 22, 2009 at 6:49 AM, Dan Bolser wrote: > >> Creating a fastq format file from fasta and 'fasta quality file'? >> >> I have a 'fasta quality file' and a fasta file, and I would like to >> output a fastq file. I followed the discussion on the previous thread >> here: >> >> With the conclusion seeming to be 'just do it'. Could someone point me >> at a way to do this, or was that suggestion an error? > > > Hi there. You should take a look at the documentation for formatdb, that > will get you there. > > http://www.ncbi.nlm.nih.gov/BLAST/docs/formatdb.html Really? I don't find the word fastq anywhere in that file... I know the fastq format isn't that complex, but why write my own custom conversion utility if one already exists right? Bioperl is so good at converting between other formats, I just assumed there should be a couple of lines to get this done. Cheers, Dan. -- Talk live to HOT bioperl developers in your area NOW!! irc://irc.freenode.net/#bioperl > Mike > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.bolser at gmail.com Wed Apr 22 09:32:15 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 22 Apr 2009 14:32:15 +0100 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Message-ID: <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> In the Bio::SeqIO::fastq page: http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq I read: "This object can transform Bio::Seq and Bio::Seq::Quality objects to and from fastq flat file databases."
I'm not sure how to code the link between the fastq IO object and the qual object that I have created using the code from the previous thread... Any suggestions? What am I missing? 2009/4/22 Dan Bolser : > Creating a fastq format file from fasta and 'fasta quality file'? > > > Hi, > > I'm sure this is easy, but I'm still not able to 'think bioperl'... > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html > > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? i.e. the poster > was working out a way to create a fastq the only way possible... > > I get the feeling that this should be a one-liner, but perhaps the > above thread was demonstrating the code I need to copy. > > > Thanks for any suggestions, > > Dan. > From dan.bolser at gmail.com Wed Apr 22 09:36:03 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 22 Apr 2009 14:36:03 +0100 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <892884AD17FA42DA96BA586AEAE2170E@NewLife> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <892884AD17FA42DA96BA586AEAE2170E@NewLife> Message-ID: <2c8757af0904220636q6ad96152p63405e03bbe85e6f@mail.gmail.com> Cheers Mark - I was having difficulty understanding that module... I should read more and post less ;-) I got it figured out now... 
Here is my working code, based on the example kindly posted by Phillip San Miguel

#!/usr/bin/perl -w

use warnings;
use strict;

use Bio::SeqIO;
use Bio::Seq::Quality;

# With a single argument, assume the qual file is "<fasta file>.qual"
my ($seq_infile,$qual_infile)
  =(scalar @ARGV == 1)
  ?($ARGV[0] ,"$ARGV[0].qual")
  :@ARGV;

#Create input objects for both a seq (fasta) and qual file
my $in_seq_obj =
  Bio::SeqIO->new( -file   => $seq_infile,
                   -format => 'fasta',
                 );

my $in_qual_obj =
  Bio::SeqIO->new( -file   => $qual_infile,
                   -format => 'qual',
                 );

# No -file is given, so the fastq records are written to STDOUT
my $out_fastq_obj =
  Bio::SeqIO->new( -format => 'fastq'
                 );

while (1){
  ## create objects for both a seq and its associated qual
  my $seq_obj  = $in_seq_obj->next_seq || last;
  my $qual_obj = $in_qual_obj->next_seq;

  ## use seq and qual object methods to feed info to the new BSQ object;
  ## the id is carried over so each fastq record keeps its read name
  my $bsq_obj =
    Bio::Seq::Quality->new( -id   => $seq_obj->display_id(),
                            -seq  => $seq_obj->seq(),
                            -qual => $qual_obj->qual(),
                          );

  $out_fastq_obj->write_fastq($bsq_obj);
}

## exit only after every record has been written; inside the loop it
## would stop the script after the first sequence
exit;


2009/4/22 Mark A. Jensen :
> Dan- There is a fastq module under Bio::SeqIO. Do something like
>
>         use Bio::Seq::Quality;
>         use Bio::SeqIO;
>         # from Bio::Seq::Quality synopsis...
>         my $qual = '0 1 2 3 4 5 6 7 8 9 11 12';
>         my $trace = '0 5 10 15 20 25 30 35 40 45 50 55';
>
>         my $seq = Bio::Seq::Quality->new
>             ( -qual => $qual,
>               -trace_indices => $trace,
>               -seq => 'atcgatcgatcg',
>               -id  => 'human_id',
>               -accession_number => 'S000012',
>               -verbose => -1   # to silence deprecated methods
>         );
>         # typical Bio::SeqIO call
>         my $seqio = Bio::SeqIO->new( -file => ">your_file", -format=>'fastq');
>         $seqio->write_seq($seq);
>
> Mark
> ----- Original Message ----- From: "Dan Bolser"
> To:
> Sent: Wednesday, April 22, 2009 6:49 AM
> Subject: [Bioperl-l] Creating a fastq format file?
>
>
>> Creating a fastq format file from fasta and 'fasta quality file'?
>>
>>
>> Hi,
>>
>> I'm sure this is easy, but I'm still not able to 'think bioperl'...
>>
>> I have a 'fasta quality file' and a fasta file, and I would like to
>> output a fastq file. I followed the discussion on the previous thread
>> here:
>>
>> http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html
>>
>>
>> With the conclusion seeming to be 'just do it'. Could someone point me
>> at a way to do this, or was that suggestion an error? i.e. the poster
>> was working out a way to create a fastq the only way possible...
>>
>> I get the feeling that this should be a one-liner, but perhaps the
>> above thread was demonstrating the code I need to copy.
>>
>>
>> Thanks for any suggestions,
>>
>> Dan.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>

From maj at fortinbras.us Wed Apr 22 09:33:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 22 Apr 2009 09:33:08 -0400
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
Message-ID: <892884AD17FA42DA96BA586AEAE2170E@NewLife>

Dan- There is a fastq module under Bio::SeqIO. Do something like

        use Bio::Seq::Quality;
        use Bio::SeqIO;
        # from Bio::Seq::Quality synopsis...
        my $qual = '0 1 2 3 4 5 6 7 8 9 11 12';
        my $trace = '0 5 10 15 20 25 30 35 40 45 50 55';

        my $seq = Bio::Seq::Quality->new
            ( -qual => $qual,
              -trace_indices => $trace,
              -seq => 'atcgatcgatcg',
              -id  => 'human_id',
              -accession_number => 'S000012',
              -verbose => -1   # to silence deprecated methods
        );
        # typical Bio::SeqIO call; note the ->new constructor
        my $seqio = Bio::SeqIO->new( -file => ">your_file", -format=>'fastq');
        $seqio->write_seq($seq);

Mark
----- Original Message ----- From: "Dan Bolser"
To:
Sent: Wednesday, April 22, 2009 6:49 AM
Subject: [Bioperl-l] Creating a fastq format file?

> Creating a fastq format file from fasta and 'fasta quality file'?
> > > Hi, > > I'm sure this is easy, but I'm still not able to 'think bioperl'... > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html > > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? i.e. the poster > was working out a way to create a fastq the only way possible... > > I get the feeling that this should be a one-liner, but perhaps the > above thread was demonstrating the code I need to copy. > > > Thanks for any suggestions, > > Dan. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From mmuratet at hudsonalpha.org Wed Apr 22 10:03:57 2009 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Wed, 22 Apr 2009 09:03:57 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> Message-ID: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > In the Bio::SeqIO::fastq page: > > http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq > > > I read: > > "This object can transform Bio::Seq and Bio::Seq::Quality objects to > and from fastq flat file databases." > > I'm not sure how to code the link between the fastq IO object and the > qual object that I have created using the code from the previous > thread... > > Any suggestions? What am I missing? 
Howdy This might be a good place to ask the question: having looked at the fastq.pm page, is the fastq format defined (only) by a "@'" followed by a sequence line and a "+" header followed by a quality line and the two headers have to agree? Now that Illumina is using phred scaling, are 'Sanger' and 'Illumina' versions the same? Thanks Mike > > > > 2009/4/22 Dan Bolser : >> Creating a fastq format file from fasta and 'fasta quality file'? >> >> >> Hi, >> >> I'm sure this is easy, but I'm still not able to 'think bioperl'... >> >> I have a 'fasta quality file' and a fasta file, and I would like to >> output a fastq file. I followed the discussion on the previous thread >> here: >> >> http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html >> >> >> With the conclusion seeming to be 'just do it'. Could someone point >> me >> at a way to do this, or was that suggestion an error? i.e. the poster >> was working out a way to create a fastq the only way possible... >> >> I get the feeling that this should be a one-liner, but perhaps the >> above thread was demonstrating the code I need to copy. >> >> >> Thanks for any suggestions, >> >> Dan. >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Apr 22 09:38:53 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 22 Apr 2009 09:38:53 -0400 Subject: [Bioperl-l] Can I load ontologies into BioSQL? In-Reply-To: References: Message-ID: Hi Carlos, I am moving your inquiry to the BioPerl list, as the tool is a part of Bioperl-db and uses BioPerl for parsing the ontologies. In your case, the goflat parser in BioPerl seems to balk at the second one of the input files. It may be that the input file is (was?) corrupted, that does happen every once in a while. More likely though is that the goflat parser hasn't kept up with some format changes. 
Have you tried using the obo format version instead? -hilmar On Apr 20, 2009, at 11:44 AM, Carlos A. Canchaya wrote: > Hi guys > > I'm working with biosql and I try to figure out how to load > ontologies into biosql. > > I've tried > > load_ontology.pl --driver mysql --dbuser carlos --dbpass xxx -- > host localhost --dbname biosql --namespace "Gene Ontology" --format > goflat --fmtargs "-defs_file,GO.defs" function.ontology > process.ontology component.ontology > > as in the script info but I have an error, > > > ------------------- WARNING --------------------- > MSG: DBLink exists in the dblink of _default > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: format error (file process.ontology) offending line: > -negative regulation of angiogenesis ; GO:0016525 ; synonym:down > regulation of angiogenesis ; synonym:down\-regulation of > angiogenesis ; synonym:downregulation of angiogenesis ; > synonym:inhibition of angiogenesis % negative regulation of > developmental process ; GO:0051093 % regulation of angiogenesis ; GO: > 0045765 > > STACK Bio::OntologyIO::dagflat::_parse_flat_file /usr/local/share/ > perl/5.10.0/Bio/OntologyIO/dagflat.pm:627 > STACK Bio::OntologyIO::dagflat::parse /usr/local/share/perl/5.10.0/ > Bio/OntologyIO/dagflat.pm:284 > STACK Bio::OntologyIO::dagflat::next_ontology /usr/local/share/perl/ > 5.10.0/Bio/OntologyIO/dagflat.pm:317 > STACK toplevel /usr/local/share/biosql/bioperl-db/scripts/biosql/ > load_ontology.pl:604 > ------------------------------------- > > Any suggestion? 
> > Cheers, > > Carlos > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Wed Apr 22 10:50:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 22 Apr 2009 09:50:47 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: On Apr 22, 2009, at 9:03 AM, Michael Muratet wrote: > > On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > >> In the Bio::SeqIO::fastq page: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq >> >> >> I read: >> >> "This object can transform Bio::Seq and Bio::Seq::Quality objects to >> and from fastq flat file databases." >> >> I'm not sure how to code the link between the fastq IO object and the >> qual object that I have created using the code from the previous >> thread... >> >> Any suggestions? What am I missing? > > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed > by a sequence line and a "+" header followed by a quality line and > the two headers have to agree? Now that Illumina is using phred > scaling, are 'Sanger' and 'Illumina' versions the same? > > Thanks > > Mike I think that's how it is defined, but I remember a while ago finding a formal definition of the format was a bit difficult. 
Looks like that has been rectified: http://maq.sourceforge.net/fastq.shtml If the parser doesn't read Illumina FASTQ format feel free to post a bug report with some example data. I'm sure this will be needed functionality in the future (and it shouldn't be too hard to add in). chris From hans-rudolf.hotz at fmi.ch Wed Apr 22 10:58:21 2009 From: hans-rudolf.hotz at fmi.ch (Hotz, Hans-Rudolf) Date: Wed, 22 Apr 2009 16:58:21 +0200 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed > by a sequence line and a "+" header followed by a quality line and the > two headers have to agree? Now that Illumina is using phred scaling, > are 'Sanger' and 'Illumina' versions the same? No, see: http://maq.sourceforge.net/fastq.shtml Regards, Hans > > Thanks > > Mike From j_martin at lbl.gov Wed Apr 22 11:58:15 2009 From: j_martin at lbl.gov (Joel Martin) Date: Wed, 22 Apr 2009 08:58:15 -0700 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: <20090422155815.GA14402@eniac.jgi-psf.org> On Wed, Apr 22, 2009 at 09:03:57AM -0500, Michael Muratet wrote: > > On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > >> In the Bio::SeqIO::fastq page: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq >> >> >> I read: >> >> "This object can transform Bio::Seq and Bio::Seq::Quality objects to >> and from fastq flat file databases." 
>> >> I'm not sure how to code the link between the fastq IO object and the >> qual object that I have created using the code from the previous >> thread... >> >> Any suggestions? What am I missing? > > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed by a > sequence line and a "+" header followed by a quality line and the two > headers have to agree? Now that Illumina is using phred scaling, are > 'Sanger' and 'Illumina' versions the same? > > Thanks > > Mike No they aren't the same, Illumina still encodes the ascii as value + 64 and Sanger as value + 33. Joel From j_martin at lbl.gov Thu Apr 23 05:32:08 2009 From: j_martin at lbl.gov (Joel Martin) Date: Thu, 23 Apr 2009 02:32:08 -0700 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> References: <49E4A39D.2020909@gmail.com> <20090415065037.GB1175@eniac.jgi-psf.org> <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> Message-ID: <20090423093208.GB22615@eniac.jgi-psf.org> Hello, Maybe they put the headers back in the separate distribution, they seem to be there now. ls -l io_lib-1.11.6/io_lib/abi.h 4 -rw-r--r-- 1 me mypeeps 793 Dec 10 06:54 io_lib-1.11.6/io_lib/abi.h And I can get the ABI-tests to pass with the bioperl-ext on linux, though it takes some odd contortions of the Makefile to get it to compile here. [snip] # Expected: (Can't write valid ctf files until we have a trace object) t/staden_read....ok All tests successful. Files=1, Tests=94, 1 wallclock secs ( 0.95 cusr + 0.06 csys = 1.01 CPU) I might find time to take a shot at getting it to compile cleanerly for linux and solaris, unless you think that's pointless as the BioLib conversion might happen before summer? Joel On Wed, Apr 15, 2009 at 07:26:15AM -0500, Chris Fields wrote: > Joel, > > They haven't stopped supporting it. 
IIRC the separate io_lib distribution > no longer has the ABI headers, but the io_lib with the full staden package > does (a little confusing, yes). I have 1.11.6 and ABI-related tests for > bioperl and bioperl-ext don't pass, but compiling with an earlier version > does work. It may be as simple as including the header files from an old > version, but I haven't tried that. > > chris > > On Apr 15, 2009, at 1:50 AM, Joel Martin wrote: > >> Hello, >> Do you know where it says io_lib will stop supporting ABI? We use >> the latest ( 1.11.6 ) for this so I know it does read them and I just >> checked with one fresh off a sequencer. But I couldn't find an active >> forum for staden. >> >> Thanks, >> Joel >> >> On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: >>> For ABI files you'll need an older version of io_lib that supports ABI or >>> the io_lib that comes with the full staden package. Recent versions of >>> io_lib don't have ABI support built-in anymore. >>> >>> chris >>> >>> On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: >>> >>>> Hi Mike. >>>> >>>> I did get that problem solved in the end, thanks to lots of help from >>>> Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped >>>> cc-ing the mailing list at some point. The last archived message in the >>>> thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had >>>> the correct solution - the code change was incorporated into the >>>> bioperl-ext CVS, and is in the latest version that you can get from SVN >>>> (see http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the >>>> problem you must be experiencing a different issue. 
>>>> >>>> You should also bear in mind the message Chris Fields sent to the list a >>>> few days ago, and have a look at using BioLib instead: >>>> >>>>> Just to note, we're not actively supporting much of the bioperl-ext >>>>> code, in favor of the BioLib initiative: >>>>> http://biolib.open-bio.org/wiki/Main_Page >>>>> If you do use bioperl-ext I suggest only using the latest code from >>>>> svn >>>>> (and that in combination with bioperl-live). >>>>> >>>>> chris >>>> >>>> Hope this helps. >>>> Roy. >>>> >>>> >>>> >>>> Michael Stubbington wrote: >>>>> Dear Dr. Chaudhuri, >>>>> I am currently trying to write a bioperl script that parses .abi >>>>> sequence >>>>> files. I am having exactly the same problem as you did when >>>>> you posted this enquiry to the bioperl mailing list >>>>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was >>>>> wondering if you ever solved the problem and, if so, can you remember >>>>> what you did? I'd be very grateful for any help you can provide. I >>>>> can't find this problem mentioned anywhere else online. >>>>> Thank you for your time. >>>>> Mike >>>> >>>> -- >>>> Dr. Roy Chaudhuri >>>> Department of Veterinary Medicine >>>> University of Cambridge, U.K.
>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Thu Apr 23 11:45:34 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 23 Apr 2009 08:45:34 -0700 Subject: [Bioperl-l] Request concerning BioPerl In-Reply-To: <49F0300C.2060700@moldiag.de> References: <49F0300C.2060700@moldiag.de> Message-ID: Mato- Please ask on the mailing list - there is documentation in the perldoc for starters and the rest depends on how you are querying for accessions or using Entrez queries. -jason On Apr 23, 2009, at 2:08 AM, Mato Nagel wrote: > Dear colleagues, > where are the options documented? > > $gb = Bio::DB::GenBank->new(@options) > > Sincerely Yours > Mato Nagel Jason Stajich jason at bioperl.org From dan.bolser at gmail.com Fri Apr 24 11:24:17 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 24 Apr 2009 16:24:17 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? Message-ID: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> Hi all, I couldn't find out how to get the 'clear range' from a Bio::Seq::Quality object... Am I looking in the wrong place, or should this method be a part of the Bio::Seq::Quality class? In the latter case I'm on my way to an implementation, but I am not good at navigating the bioperl docs, so I thought I should ask before I take the time to finish that off. Cheers, Dan.
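
A minimal sketch for the clear-range question above, assuming that 'clear range' means the longest contiguous run of bases whose quality is at or above a cutoff. The helper name longest_clear_range, the cutoff of 20 and the sample scores are illustrative assumptions and not part of BioPerl. The input is a plain array reference of integer quality values, such as the one the qual() method of Bio::Seq::Quality returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): find the longest contiguous
# run of positions whose quality is >= $min_qual. Returns 1-based,
# inclusive (start, end) coordinates, or an empty list if nothing qualifies.
sub longest_clear_range {
    my ($qual, $min_qual) = @_;
    $min_qual = 20 unless defined $min_qual;    # assumed default cutoff

    my ($best_start, $best_len) = (undef, 0);
    my ($run_start,  $run_len)  = (undef, 0);

    for my $i (0 .. $#{$qual}) {
        if ($qual->[$i] >= $min_qual) {
            $run_start = $i unless $run_len;    # a new run begins here
            $run_len++;
            # strictly greater, so the earliest run wins on ties
            if ($run_len > $best_len) {
                ($best_start, $best_len) = ($run_start, $run_len);
            }
        }
        else {
            $run_len = 0;                       # run interrupted
        }
    }

    return unless $best_len;
    return ($best_start + 1, $best_start + $best_len);
}

# Made-up quality scores, for illustration only
my @qual = (5, 8, 25, 30, 31, 9, 22, 40, 41, 38, 35, 7);
my ($start, $end) = longest_clear_range(\@qual, 20);
print "clear range: $start..$end\n";
```

The coordinates are 1-based and inclusive, the convention BioPerl's subseq-style methods use, so they could be handed on to such a method to cut out the matching sequence and quality slice.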
From dan.bolser at gmail.com Fri Apr 24 12:20:23 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Fri, 24 Apr 2009 17:20:23 +0100
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
Message-ID: <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>

It's a bit rough and ready, but it does what I need...




=head2 get_clear_range

 Title    : get_clear_range
 Usage    : $subobj = $obj->get_clear_range();
            $subobj = $obj->get_clear_range(20);
 Function : Get the clear range using the given quality score as a
            cutoff or a default value of 13.

 Returns  : a new Bio::Seq::Quality object
 Args     : a minimum quality value, optional, default = 13

=cut

sub get_clear_range
{
    my $self    = shift;
    my $qual    = $self->qual;
    my $minQual = shift || 13;

    my (@ranges, $rangeFlag);

    for(my $i=0; $i<@$qual; $i++){
        ## Are we currently within a clear range or not?
        if(defined($rangeFlag)){
            ## Did we just leave the clear range?
            if($qual->[$i]<$minQual){
                ## Log the range
                push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
                ## and reset the range flag.
                $rangeFlag = undef;
            }
            ## else nothing changes
        }
        else{
            ## Did we just enter a clear range?
            if($qual->[$i]>=$minQual){
                ## Better set the range flag!
                $rangeFlag = $i;
            }
            ## else nothing changes
        }
    }
    ## Did we exit the last clear range?
    if(defined($rangeFlag)){
        my $i = scalar(@$qual);
        ## Log the range
        push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
    }

    unless(@ranges){
        die "There is no clear range... I don't know what to do here!\n";
    }

    print "there are ", scalar(@ranges), " clear ranges\n";

    my $sum; map {$sum += $_->[2]} @ranges;

    print "of ", scalar(@$qual), " bases, there are $sum with ".
        "quality scores above the given threshold\n";

    ## Ranges are sorted longest first, so the first one is returned
    for (sort {$b->[2] <=> $a->[2]} @ranges){
        if($_->[2]/$sum < 0.5){
            warn "not so much a clear range as a clear chunk...\n";
        }
        print $_->[2], "\t", $_->[2]/$sum, "\n";

        return Bio::Seq::QualityDB->new( -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
                                         -qual => $self->subqual($_->[0]+1, $_->[1]+1)
                                       );
    }
}




Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
is a copy of Bio/Seq/Quality.pm that just has the above method added).
That is why the 'new Bio::Seq::Quality object' is actually a
Bio::Seq::QualityDB object, but other than that it should slot right
in (apart from all the debugging output that I spit out).


Cheers,
Dan.


2009/4/24 Dan Bolser :
> Hi all,
>
> I couldn't find out how to get the 'clear range' from a
> Bio::Seq::Quality object... Am I looking in the wrong place, or should
> this method be a part of the Bio::Seq::Quality class?
>
> In the latter case I'm on my way to an implementation, but I am not
> good at navigating the bioperl docs, so I thought I should ask before
> I take the time to finish that off.
>
>
> Cheers,
> Dan.
>

From cjfields at illinois.edu Fri Apr 24 14:56:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 24 Apr 2009 13:56:34 -0500
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
Message-ID: <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>

You could submit this as a diff against Bio::Seq::Quality to bugzilla. If possible, tests don't hurt either!

chris

On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:

> It's a bit rough and ready, but it does what I need...
>
>
>
>
> =head2 get_clear_range
>
>  Title    : get_clear_range
>  Usage    : $subobj = $obj->get_clear_range();
>             $subobj = $obj->get_clear_range(20);
>  Function : Get the clear range using the given quality score as a
>             cutoff or a default value of 13.
>
>  Returns  : a new Bio::Seq::Quality object
>  Args     : a minimum quality value, optional, default = 13
>
> =cut
>
> sub get_clear_range
> {
>     my $self    = shift;
>     my $qual    = $self->qual;
>     my $minQual = shift || 13;
>
>     my (@ranges, $rangeFlag);
>
>     for(my $i=0; $i<@$qual; $i++){
>         ## Are we currently within a clear range or not?
>         if(defined($rangeFlag)){
>             ## Did we just leave the clear range?
>             if($qual->[$i]<$minQual){
>                 ## Log the range
>                 push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>                 ## and reset the range flag.
>                 $rangeFlag = undef;
>             }
>             ## else nothing changes
>         }
>         else{
>             ## Did we just enter a clear range?
>             if($qual->[$i]>=$minQual){
>                 ## Better set the range flag!
>                 $rangeFlag = $i;
>             }
>             ## else nothing changes
>         }
>     }
>     ## Did we exit the last clear range?
>     if(defined($rangeFlag)){
>         my $i = scalar(@$qual);
>         ## Log the range
>         push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>     }
>
>     unless(@ranges){
>         die "There is no clear range... I don't know what to do here!\n";
>     }
>
>     print "there are ", scalar(@ranges), " clear ranges\n";
>
>     my $sum; map {$sum += $_->[2]} @ranges;
>
>     print "of ", scalar(@$qual), " bases, there are $sum with ".
>         "quality scores above the given threshold\n";
>
>     for (sort {$b->[2] <=> $a->[2]} @ranges){
>         if($_->[2]/$sum < 0.5){
>             warn "not so much a clear range as a clear chunk...\n";
>         }
>         print $_->[2], "\t", $_->[2]/$sum, "\n";
>
>         return Bio::Seq::QualityDB->new( -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
>                                          -qual => $self->subqual($_->[0]+1, $_->[1]+1)
>                                        );
>     }
> }
>
>
>
>
> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
> is a copy of Bio/Seq/Quality.pm that just has the above method added).
> That is why the 'new Bio::Seq::Quality object' is actually a > Bio::Seq::QualityDB object, but other than that it should slot right > in (apart from all the debugging output that I spit out). > > > Cheers, > Dan. > > > 2009/4/24 Dan Bolser : >> Hi all, >> >> I couldn't find out how to get the 'clear range' from a >> Bio::Seq::Quality object... Am I looking in the wrong place, or >> should >> this method be a part of the Bio::Seq::Quality class? >> >> In the latter case I'm on my way to an implementation, but I am not >> good at navigating the bioperl docs, so I thought I should ask before >> I take the time to finish that off. >> >> >> Cheers, >> Dan. >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Fri Apr 24 15:39:53 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 24 Apr 2009 12:39:53 -0700 Subject: [Bioperl-l] cvs server still up? Message-ID: <49F21589.6060707@cornell.edu> The old bioperl CVS repository is still up: cvs -d :pserver:cvs:cvs\@cvs.bioperl.org:/home/repository/bioperl export -rHEAD bioperl-live I had an old script that was cvs exporting a copy of bioperl, and it has been fetching really old copies for a while now. Maybe somebody might want to deactivate that? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Fri Apr 24 16:29:22 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 24 Apr 2009 15:29:22 -0500 Subject: [Bioperl-l] cvs server still up? In-Reply-To: <49F21589.6060707@cornell.edu> References: <49F21589.6060707@cornell.edu> Message-ID: <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu> Not sure what the plans were for the CVS server beyond having it available for all older bioperl releases (pre-1.6). 
Everything has been moved into the svn server though, so really the cvs
server is redundant. Shutting it down might serve the purpose of
alerting users to the fact that we no longer use it!

Thinking some more about it: it might be present simply b/c other
open-bio projects are still using cvs. I can't recall if biopython
switched over or not...

chris

From jay at jays.net  Fri Apr 24 17:03:27 2009
From: jay at jays.net (Jay Hannah)
Date: Fri, 24 Apr 2009 16:03:27 -0500
Subject: [Bioperl-l] cvs server still up?
In-Reply-To: <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu>
References: <49F21589.6060707@cornell.edu>
	<2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu>
Message-ID: <49F2291F.7020704@jays.net>

Chris Fields wrote:
> I can't recall if biopython switched over or not...

http://github.com/biopython
"Official git mirror of the Biopython CVS repository"

Ponder,

j

From cjfields at illinois.edu  Fri Apr 24 18:50:12 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 24 Apr 2009 17:50:12 -0500
Subject: [Bioperl-l] cvs server still up?
In-Reply-To: <49F2291F.7020704@jays.net>
References: <49F21589.6060707@cornell.edu>
	<2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu>
	<49F2291F.7020704@jays.net>
Message-ID: <9AC3AF4D-E9FF-4593-A53A-B59438EC2BA4@illinois.edu>

Which makes me wonder, is the CVS version actually updated with git
commits (and vice versa) or is git the only thing being used? It is
listed as a 'mirror', so I'm assuming they somehow sync to/from CVS
(ugh).

chris

From torsten.seemann at infotech.monash.edu.au  Sun Apr 26 01:50:14 2009
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 26 Apr 2009 15:50:14 +1000
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <20090422155815.GA14402@eniac.jgi-psf.org>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
	<2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
	<1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
	<20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

> > This might be a good place to ask the question: having looked at the
> > fastq.pm page, is the fastq format defined (only) by a "@" followed by a
> > sequence line and a "+" header followed by a quality line and the two
> > headers have to agree? Now that Illumina is using phred scaling, are
> > 'Sanger' and 'Illumina' versions the same?
>
> No they aren't the same, Illumina still encodes the ascii as value + 64
> and Sanger as value + 33.

Illumina have now CHANGED how they calculate the quality value however
in the last month or so... Their Q range used to be -5..40 mapped to
ASCII 64+, but now they produce Q >= 0 and it is unclear if they start
at 69 or 64 now...

I have tried to summarise this in a central place:

http://en.wikipedia.org/wiki/FASTQ_format

Corrections welcome!

--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA

From heikki.lehvaslaiho at gmail.com  Mon Apr 27 01:42:03 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 07:42:03 +0200
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
	<2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
	<1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
	<20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

> I have tried to summarise this in a central place:
> http://en.wikipedia.org/wiki/FASTQ_format

Torsten,

Thanks for putting this together. Very helpful.

Do you have a plan of action? Let me propose one for BioPerl. It is
based on the following assumptions:

1. There is a multitude of different ways of coding quality values out there.
2. Bio::Seq::Quality is agnostic of any quality value range rules
3. The emerging open standard is the Sanger fastq specification
4. Open source programs use the Sanger fastq specs

From these it follows that:

1. BioPerl should support the Sanger fastq standard

1.1. It already does, and there are other SeqIO modules for dealing
with other non-fastq formats.

2. BioPerl should offer simple ways of converting between quality range rules

2.1. Have a generic method accessible from Bio::Seq::Quality with
preset versions of the method for converting between known variants
(Sanger fastq and the two Illumina versions)

For example:

range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value)
  throw if $value < $from_lower or $value > $from_upper
  return $newvalue

range_convert_illumina2fastq(), range_convert_fastq2illumina(),
range_convert_fastq2phred(), range_convert_phred2fastq()....

(assuming that illumina 1.3 eq phred)

2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina
qualities into Sanger fastq on the fly

2.2.1. Bio::SeqIO::Fastq::next_seq should detect the quality value
range of the incoming stream either automatically or be given a keyword
parameter indicating the range.

2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it detects
a quality value out of range.

2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it
detects a quality value out of range.

2.2.4. It would be useful but not absolutely necessary for
Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina
ranges

What do you think?

    -Heikki

--
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From heikki.lehvaslaiho at gmail.com  Mon Apr 27 02:42:08 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 08:42:08 +0200
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
	<2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
	<90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
Message-ID:

Dan,

It looks like your method does two different things:

1. Returns the longest subsequence above the threshold
2. Analyses the sequence for the number of ranges the current
threshold creates.

Why not separate these functions?

Let's add a method that sets the threshold and stores it internally as
$self->_threshold. Setting it to a new value should trigger emptying
all the caches (see below.)

Let's have two more public methods:

1. get_clean_range() - optional argument 'threshold'

It returns the longest clean subseq.

2. count_clean_ranges() - again optional argument 'threshold'

This returns the number of ranges detected.

Both methods call first the public method threshold if the argument
has been given and then an internal method _find_clean_ranges(). That
method calculates all the ranges and stores them internally (as
$self->_clean_ranges->[...]). The number of ranges is also stored
(e.g. $self->_number_of_ranges). These internal values form the cache
that needs to be emptied whenever any of the critical values of the
object changes: threshold, quality or seq.
Create an internal method $self->_clear_cache that does that.

Now the new quality object does not get created until you call
get_clean_range(), which accesses the cached values (or creates them if
they are not there).

This design allows you to have no extra penalty for adding more
methods that act on cached values. For example, it might be a sensible
thing to do at some point to look at all the ranges that are longer
than some length. Then you could write in your program:

$qual->threshold(10);
if ($qual->count_clean_ranges == 1) {
    my $newqual = $qual->get_clean_range();
    # do your analysis
} elsif ($qual->count_clean_ranges == 0) {
    # do some reporting and logging
} else {  # more than one range
    my @quals = $qual->get_all_clean_ranges($min_length);
    # do some more work and possibly select the best one(s)
}

Yours,

    -Heikki

From dan.bolser at gmail.com  Mon Apr 27 04:31:39 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 27 Apr 2009 09:31:39 +0100
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To:
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
	<2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
	<90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
Message-ID: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>

Hi Heikki,

Thanks very much for the advice on how to better implement the clear
range method within the Bio::Seq::Quality object. I can understand the
logic of what you have written, and it all sounds reasonable. The only
problem is that I am very inexperienced with working on object
oriented Perl (my 'one man' projects to date have never really
required me to think beyond scripts, and it's been years since I
actually tried to code objects in Perl).

To be specific, when you say, "Let's add a method that sets the
threshold and stores it internally as $self->_threshold", ignoring any
other functionality, what would that method look like? In particular,
how would $self->_threshold be implemented?
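
For readers following the thread, here is a minimal sketch of the kind of accessor being asked about, written as plain hash-based Perl OO. It is not the code that went into BioPerl. The package name My::QualitySketch, the hash keys, and the default threshold of 13 (borrowed from the get_clear_range code earlier in the thread) are assumptions made for illustration only.

```perl
package My::QualitySketch;    # hypothetical stand-in for Bio::Seq::Quality
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{_qual}      = $args{-qual} || [];
    $self->{_threshold} = 13;    # assumed default, as in get_clear_range
    return $self;
}

# Public getter/setter. Setting a value empties the cached ranges.
sub threshold {
    my ($self, $value) = @_;
    if (defined $value) {
        $self->{_threshold} = $value;
        $self->_clear_cache;
    }
    return $self->{_threshold};
}

# Forget anything that was computed from the old threshold.
sub _clear_cache {
    my $self = shift;
    delete $self->{_clean_ranges};
    return;
}

# Compute (or reuse) the list of [start, end, length] ranges, 0-based.
sub _find_clean_ranges {
    my $self = shift;
    return $self->{_clean_ranges} if $self->{_clean_ranges};
    my $qual = $self->{_qual};
    my (@ranges, $start);
    for my $i (0 .. $#$qual) {
        if ($qual->[$i] >= $self->{_threshold}) {
            $start = $i unless defined $start;    # entering a range
        }
        elsif (defined $start) {                  # just left a range
            push @ranges, [$start, $i - 1, $i - $start];
            undef $start;
        }
    }
    # A range that runs to the end of the read is still open here.
    push @ranges, [$start, $#$qual, @$qual - $start] if defined $start;
    return $self->{_clean_ranges} = \@ranges;
}

sub count_clean_ranges {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    return scalar @{ $self->_find_clean_ranges };
}

package main;

my $q = My::QualitySketch->new(-qual => [5, 20, 30, 8, 25, 40, 40, 2]);
print $q->count_clean_ranges, "\n";        # uses the default threshold
print $q->count_clean_ranges(35), "\n";    # new threshold, cache rebuilt
```

The point of the design is that the value lives in the object's own hash and every write goes through one method, so there is a single place to invalidate cached results.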

I think once I see that detail, I can go ahead and try to code what
you suggested.

Similarly (Chris), where would I put the tests / how would they be implemented?

Thanks again for the feedback.

All the best,
Dan.

From heikki.lehvaslaiho at gmail.com  Mon Apr 27 05:38:40 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 11:38:40 +0200
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
	<2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
	<1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
	<20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

I convinced at least myself to the degree that I wrote the
range_convert() method - with plenty of tests. I mention this now so
that no-one else needs to start thinking through all the edge values.
:)

I'll contribute it to the code base once there is a consensus on the
best way forward.

    -Heikki

From heikki.lehvaslaiho at gmail.com  Mon Apr 27 05:41:52 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 11:41:52 +0200
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
	<2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
	<90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
	<2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
Message-ID:

Dan,

I'll take your code and put it into bioperl-live, rewritten the way I
suggested, and add a few tests. That should get you started,

    -Heikki

From cjfields at illinois.edu  Mon Apr 27 09:10:04 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Apr 2009 08:10:04 -0500
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
	<2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
	<1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
	<20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

This is going within Bio::Seq::Quality, correct?
Does Bio::Seq::Quality have a method that indicates what format the quality scores are actually in (sanger/illumina/illumina1.3/phred/foo)? The reason I worry about this is quality scores appear inseparable from their quality format (ranges vary in length, for instance). For instance, if I picked a Bio::Seq::Quality out of the blue, could I tell which quality format it originated from w/o guessing, and similarly could I accurately convert it to another qual format? To me it seems we need something in Bio::Seq::Quality akin to the alphabet() method used for sequence data. chris On Apr 27, 2009, at 4:38 AM, Heikki Lehvaslaiho wrote: > I convinced at least myself to the degree that I wrote the > range_convert() method - with plenty of tests. I mention this now so > that no-one else need to start thinking through all the edge values. > :) > > I'll contribute it to the code base once there is a consensus of best > way forward. > > -Heikki > > 2009/4/27 Heikki Lehvaslaiho : >>> I have tried to summarise this in a central place: >>> http://en.wikipedia.org/wiki/FASTQ_format >> >> Torsten, >> >> Thanks for putting this together. Very helpful. >> >> Do you have a plan of action? Let me propose one for BioPerl. It >> based on following assumptions: >> >> 1. There is multitude of different ways of coding quality values >> out there. >> 2. Bio::Seq::Quality is agnostic of any quality value range rules >> 3. The emerging open standard is the Sanger fastq specification >> 4. Open source programs use the Sanger fastq specs >> >> >> From these it follows that: >> >> >> 1. BioPerl should support Sanger fastq standard >> >> 1.1. it already does and there are other SeqIO modules for dealing >> with other non-fastq formats. >> >> 2. BioPerl should offer simple ways of converting between quality >> range rules >> >> 2.1. 
Have a generic method accessible from Bio::Seq::Quality with >> preset versions of the method for converting between known variants >> (Sanger fastq and the two Illumina versions) >> >> For example: >> >> range_convert ($from_lower, $from_upper, $to_lower, $to_upper, >> $value) >> throw if $value < $from_lower or $value > $from_upper >> return $newvalue >> >> range_convert_illumina2fastq(), range_convert_fastq2illumina(), >> range_convert_fastq2phred(), range_convert_phred2fastq().... >> >> (assuming that illumina 1.3 eq phred) >> >> 2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina >> qualities into Sanger fastq on the fly >> >> 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the incoming stream >> of >> quality value range either automatically or be given a keyword >> parameter indicating the range. >> >> 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it >> detects >> a quality value out of range. >> >> 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it >> detects a quality value out of range. >> >> 2.2.4. It would be useful but not absolutely necessary for >> Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina >> ranges >> >> >> What do you think? >> >> -Heikki >> >> 2009/4/26 Torsten Seemann : >>>>> This might be a good place to ask the question: having looked at >>>>> the >>>>> fastq.pm page, is the fastq format defined (only) by a "@'" >>>>> followed by >>>> a >>>>> sequence line and a "+" header followed by a quality line and >>>>> the two >>>>> headers have to agree? Now that Illumina is using phred scaling, >>>>> are >>>>> 'Sanger' and 'Illumina' versions the same? >>>> >>>> No they aren't the same, Illumina still encodes the ascii as >>>> value + 64 >>>> and Sanger as value + 33. >>>> >>> >>> Illumina have now CHANGED how they calculate the quality value >>> however in >>> the last month or so... 
Their Q range used to be -5..40 mapped to
>>> ASCII 64+,
>>> but now they produce Q >= 0 and it is unclear if they start at 69
>>> or 64
>>> now...
>>>
>>> I have tried to summarise this in a central place:
>>>
>>> http://en.wikipedia.org/wiki/FASTQ_format
>>>
>>> Corrections welcome!
>>>
>>>
>>> --Torsten Seemann
>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
>>> University, AUSTRALIA
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>>
>> --
>>    -Heikki
>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>> cell: +27 (0)714328090
>> Sent from Claremont, WC, South Africa
>>
>
>
>
> --
>    -Heikki
> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> cell: +27 (0)714328090
> Sent from Claremont, WC, South Africa
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From markus.liebscher at gmx.de Mon Apr 27 09:51:09 2009
From: markus.liebscher at gmx.de (manni122)
Date: Mon, 27 Apr 2009 06:51:09 -0700 (PDT)
Subject: [Bioperl-l] Remoteblast using Swissprot
Message-ID: <23256705.post@talk.nabble.com>

Hi,

I want to retrieve the sequence identifier from the RemoteBlast
interface (Bio::Tools::Run::RemoteBlast). With this ID I want to look up
annotations stored in Bio::DB::SwissProt. I am using the example code
from the RemoteBlast documentation. If I use a known sequence as input
I get

  Can't call method "next_hsp" on an undefined value

This happens only with swissprot as the database; the nr database works
fine. The accession code from nr is not accepted by Bio::DB::SwissProt.
Is there something wrong with the database?
Here is the code I am using:

my $v = 1;
my @params = ('-prog'   => 'blastp',
              '-data'   => 'nr',
              '-expect' => '1e-10' );  #swissprot is not working
$Bio::Tools::Run::RemoteBlast::HEADER{'MATRIX_NAME'} = 'BLOSUM62';
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
$v = 1;
my $r = $factory->submit_blast($proteinaa);
print STDERR "Need BLAST Analysis, waiting..." if( $v > 0 );
while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {
            $factory->remove_rid($rid);
            $result = $rc->next_result;
            $hit = $result->next_hit;
            $hsp = $hit->next_hsp;
            $idneu = $hit->accession;
        }
    }
}

--
View this message in context: http://www.nabble.com/Remoteblast-using-Swissprot-tp23256705p23256705.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From heikki.lehvaslaiho at gmail.com Mon Apr 27 11:44:40 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 17:44:40 +0200
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To:
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
Message-ID:

Dan,

Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live.

Test and extend,

   -Heikki

2009/4/27 Heikki Lehvaslaiho :
> Dan,
>
> I'll take your code and put it into bioperl-live rewritten the way I
> suggested and add few tests.
>
> That should get you started,
>
>    -Heikki
>
> 2009/4/27 Dan Bolser :
>> Hi Heikki,
>>
>> Thanks very much for the advice on how to better implement the clear
>> range method within the Bio::Seq::Quality object. I can understand the
>> logic of what you have written, and it all sounds reasonable.
The only >> problem is that I am very inexperienced with working on object >> oriented Perl (my 'one man' projects to date have never really >> required me to think beyond scripts, and its been years since I >> actually tried to code objects in Perl). >> >> To be specific, when you say, "Lets add a method that sets the >> threshold and stores it internally as $self->_threshold", ignoring any >> other functionality, what would that method look like? in particular, >> how would $self->_threshold be implemented? >> >> I think once I see that detail, I can go ahead and try to code what >> you suggested. >> >> >> Similarly (Chris), where would I put the tests / how would they be implemented? >> >> >> Thanks again for the feedback. >> >> All the best, >> Dan. >> >> >> >> 2009/4/27 Heikki Lehvaslaiho : >>> Dan, >>> >>> It looks like your method does two different things: >>> >>> 1. Returns the longest subsequence above the threshold >>> 2. Analyses the the sequence for the number of ranges the current >>> threshold creates. >>> >>> Why not separate these functions? >>> >>> Lets add a method that sets the threshold and stores it internally as >>> $self->_threshold. Setting it to a new values should trigger emptying >>> all the caches (see below.) >>> >>> Lets have two more public methods: >>> >>> 1. get_clean_range() - optional argument 'threshold' >>> >>> It returns the longest clean subseq. >>> >>> 2. count_clean_ranges() -again optional argument 'threshold' >>> >>> This returns the number of ranges detected. >>> >>> Both methods call first the public method threshold if the argument >>> has been given and then an internal method ?_find_clean_ranges(). That >>> method calculates all the ranges and stores them internally ?(as >>> $self->_clean_ranges-> [...]). The number of ranges is also stored >>> (e.g. 
$self->_number_of ranges).These internal values form ?the cache >>> that needs to be emptied whenever any of the critical values of the >>> object changes: threshold, quality or seq. Create an internal method >>> $self->_clear_cache, that does that. >>> >>> Now the quality new object does not get created until you call >>> get_clean_range() which accesses the cached values (or creates them if >>> they are not there). >>> >>> This design allows you to have no extra penalty for adding more >>> methods that act on cached values. For example, it might be sensible >>> thing to do ?at some point to look at all the ranges that are longer >>> than some length. Then you could write in your program: >>> >>> >>> $qual->threshold(10); >>> if ($qual->count_clean_ranges = 1) { >>> ?my $newqual = $qual->get_clean_range() >>> ?# do your analysis >>> } elsif ($qual->count_clean_ranges = 0) { >>> ? # do some reporting and logging >>> } else { ?# more than one ranges >>> ? my @quals = $qual->get_all_clean_ranges($min_lenght); >>> ? # do some more work and possibly select the best one(s) >>> } >>> >>> >>> >>> Yours, >>> >>> ? -Heikki >>> >>> 2009/4/24 Chris Fields : >>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. ?If >>>> possible, tests don't hurt either! >>>> >>>> chris >>>> >>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote: >>>> >>>>> Its a bit rough and ready, but it does what I need... >>>>> >>>>> >>>>> >>>>> >>>>> =head2 get_clear_range >>>>> >>>>> Title ? ?: get_clear_range >>>>> >>>>> Title ? ?: subqual >>>>> Usage ? ?: $subobj = $obj->get_clear_range(); >>>>> ? ? ? ? ? $subobj = $obj->get_clear_range(20); >>>>> Function : Get the clear range using the given quality score as a >>>>> ? ? ? ? ? cutoff or a default value of 13. >>>>> >>>>> Returns ?: a new Bio::Seq::Quality object >>>>> Args ? ? : a minimum quality value, optional, devault = 13 >>>>> >>>>> =cut >>>>> >>>>> sub get_clear_range >>>>> { >>>>> ? my $self = shift; >>>>> ? 
my $qual = $self->qual; >>>>> ? my $minQual = shift || 13; >>>>> >>>>> ? my (@ranges, $rangeFlag); >>>>> >>>>> ? for(my $i=0; $i<@$qual; $i++){ >>>>> ? ? ? ?## Are we currently within a clear range or not? >>>>> ? ? ? ?if(defined($rangeFlag)){ >>>>> ? ? ? ? ? ?## Did we just leave the clear range? >>>>> ? ? ? ? ? ?if($qual->[$i]<$minQual){ >>>>> ? ? ? ? ? ? ? ?## Log the range >>>>> ? ? ? ? ? ? ? ?push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>>>> ? ? ? ? ? ? ? ?## and reset the range flag. >>>>> ? ? ? ? ? ? ? ?$rangeFlag = undef; >>>>> ? ? ? ? ? ?} >>>>> ? ? ? ? ? ?## else nothing changes >>>>> ? ? ? ?} >>>>> ? ? ? ?else{ >>>>> ? ? ? ? ? ?## Did we just enter a clear range? >>>>> ? ? ? ? ? ?if($qual->[$i]>=$minQual){ >>>>> ? ? ? ? ? ? ? ?## Better set the range flag! >>>>> ? ? ? ? ? ? ? ?$rangeFlag = $i; >>>>> ? ? ? ? ? ?} >>>>> ? ? ? ? ? ?## else nothing changes >>>>> ? ? ? ?} >>>>> ? } >>>>> ? ## Did we exit the last clear range? >>>>> ? if(defined($rangeFlag)){ >>>>> ? ? ? ?my $i = scalar(@$qual); >>>>> ? ? ? ?## Log the range >>>>> ? ? ? ?push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>>>> ? } >>>>> >>>>> ? unless(@ranges){ >>>>> ? ? ? ?die "There is no clear range... I don't know what to do here!\n"; >>>>> ? } >>>>> >>>>> ? print "there are ", scalar(@ranges), " clear ranges\n"; >>>>> >>>>> ? my $sum; map {$sum += $_->[2]} @ranges; >>>>> >>>>> ? print "of ", scalar(@$qual), " bases, there are $sum with ". >>>>> ? ? ? ?"quality scores above the given threshold\n"; >>>>> >>>>> ? for (sort {$b->[2] <=> $a->[2]} @ranges){ >>>>> ? ? ? ?if($_->[2]/$sum < 0.5){ >>>>> ? ? ? ? ? ?warn "not so much a clear range as a clear chunk...\n"; >>>>> ? ? ? ?} >>>>> ? ? ? ?print $_->[2], "\t", $_->[2]/$sum, "\n"; >>>>> >>>>> ? ? ? ?return Bio::Seq::QualityDB->new( -seq => $self->subseq( ?$_->[0]+1, >>>>> $_->[1]+1), >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -qual => $self->subqual($_->[0]+1, >>>>> $_->[1]+1) >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ); >>>>> ? 
} >>>>> } >>>>> >>>>> >>>>> >>>>> >>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which >>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added). >>>>> That is why the 'new Bio::Seq::Quality object' is actually a >>>>> Bio::Seq::QualityDB object, but other than that it should slot right >>>>> in (apart from all the debugging output that I spit out). >>>>> >>>>> >>>>> Cheers, >>>>> Dan. >>>>> >>>>> >>>>> 2009/4/24 Dan Bolser : >>>>>> >>>>>> Hi all, >>>>>> >>>>>> I couldn't find out how to get the 'clear range' from a >>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should >>>>>> this method be a part of the Bio::Seq::Quality class? >>>>>> >>>>>> In the latter case I'm on my way to an implementation, but I am not >>>>>> good at navigating the bioperl docs, so I thought I should ask before >>>>>> I take the time to finish that off. >>>>>> >>>>>> >>>>>> Cheers, >>>>>> Dan. >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >>> >>> -- >>> ? ?-Heikki >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>> cell: +27 (0)714328090 >>> Sent from Claremont, WC, South Africa >>> >> > > > > -- > ? ?-Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa From heikki.lehvaslaiho at gmail.com Mon Apr 27 11:53:12 2009 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 27 Apr 2009 17:53:12 +0200 Subject: [Bioperl-l] Creating a fastq format file? 
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

2009/4/27 Chris Fields :
> This is going within Bio::Seq::Quality, correct?

Yes.

> Does Bio::Seq::Quality
> have a method that indicates what format the quality scores are actually in
> (sanger/illumina/illumina1.3/phred/foo)?  The reason I worry about this is
> quality scores appear inseparable from their quality format (ranges vary in
> length, for instance).

No method.

> For instance, if I picked a Bio::Seq::Quality out of the blue, could I tell
> which quality format it originated from w/o guessing, and similarly could I
> accurately convert it to another qual format?  To me it seems we need
> something in Bio::Seq::Quality akin to the alphabet() method used for
> sequence data.

The text formats encode the quality values in different ways, but they
are all stored as integer arrays in the object. Converting between
them is relatively easy.

You are right: quality_format() or even plain format() is needed. The
SeqIO methods creating the objects should be setting it. Warnings for
unset format values should be added to appropriate places.

   -Heikki

> chris
>
> On Apr 27, 2009, at 4:38 AM, Heikki Lehvaslaiho wrote:
>
>> I convinced at least myself to the degree that I wrote the
>> range_convert() method - with plenty of tests. I mention this now so
>> that no-one else need to start thinking through all the edge values.
>> :)
>>
>> I'll contribute it to the code base once there is a consensus of best
>> way forward.
>>
>>    -Heikki
>>
>> 2009/4/27 Heikki Lehvaslaiho :
>>>>
>>>> I have tried to summarise this in a central place:
>>>> http://en.wikipedia.org/wiki/FASTQ_format
>>>
>>> Torsten,
>>>
>>> Thanks for putting this together. Very helpful.
>>>
>>> Do you have a plan of action?  Let me propose one for BioPerl.
It >>> based on following assumptions: >>> >>> 1. There is multitude of different ways of coding quality values out >>> there. >>> 2. Bio::Seq::Quality is agnostic of any quality value range rules >>> 3. The emerging open standard is the Sanger fastq specification >>> 4. Open source programs use the Sanger fastq specs >>> >>> >>> From these it follows that: >>> >>> >>> 1. BioPerl should support Sanger fastq standard >>> >>> 1.1. it already does and there are other SeqIO modules for dealing >>> with other non-fastq formats. >>> >>> 2. BioPerl should offer simple ways of converting between quality range >>> rules >>> >>> 2.1. Have a generic method accessible from Bio::Seq::Quality with >>> preset versions of the method for converting between known variants >>> (Sanger fastq and the two Illumina versions) >>> >>> For example: >>> >>> range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value) >>> ?throw if $value < $from_lower or $value > $from_upper >>> ?return $newvalue >>> >>> range_convert_illumina2fastq(), range_convert_fastq2illumina(), >>> range_convert_fastq2phred(), ?range_convert_phred2fastq().... >>> >>> (assuming that illumina 1.3 eq phred) >>> >>> 2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina >>> qualities into Sanger fastq on the fly >>> >>> 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the incoming stream of >>> quality value range either automatically or be given a keyword >>> parameter indicating the range. >>> >>> 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it detects >>> a quality value out of range. >>> >>> 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it >>> detects a quality value out of range. >>> >>> 2.2.4. It would be useful but not absolutely necessary for >>> Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina >>> ranges >>> >>> >>> What do you think? >>> >>> ? 
-Heikki >>> >>> 2009/4/26 Torsten Seemann : >>>>>> >>>>>> This might be a good place to ask the question: having looked at the >>>>>> fastq.pm page, is the fastq format defined (only) by a "@'" followed >>>>>> by >>>>> >>>>> a >>>>>> >>>>>> sequence line and a "+" header followed by a quality line and the two >>>>>> headers have to agree? Now that Illumina is using phred scaling, are >>>>>> 'Sanger' and 'Illumina' versions the same? >>>>> >>>>> No they aren't the same, Illumina still encodes the ascii as value + 64 >>>>> and Sanger as value + 33. >>>>> >>>> >>>> Illumina have now CHANGED how they calculate the quality value however >>>> in >>>> the last month or so... Their Q range used to be -5..40 mapped to ASCII >>>> 64+, >>>> but now they produce Q >= 0 and it is unclear if they start at 69 or 64 >>>> now... >>>> >>>> I have tried to summarise this in a central place: >>>> >>>> http://en.wikipedia.org/wiki/FASTQ_format >>>> >>>> Corrections welcome! >>>> >>>> >>>> --Torsten Seemann >>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>>> University, AUSTRALIA >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >>> >>> -- >>> ? -Heikki >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>> cell: +27 (0)714328090 >>> Sent from Claremont, WC, South Africa >>> >> >> >> >> -- >> ? 
-Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa From cjfields at illinois.edu Mon Apr 27 12:11:12 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 27 Apr 2009 11:11:12 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org> Message-ID: On Apr 27, 2009, at 10:53 AM, Heikki Lehvaslaiho wrote: > 2009/4/27 Chris Fields : >> This is going within Bio::Seq::Quality, correct? > > Yes. > > Does Bio::Seq::Quality >> have a method that indicates what format the quality scores are >> actually in >> (sanger/illumina/illumina1.3/phred/foo)? The reason I worry about >> this is >> quality scores appear inseparable from their quality format (ranges >> vary in >> length, for instance). > > No method. > >> For instance, if I picked a Bio::Seq::Quality out of the blue, >> could I tell >> which quality format it originated from w/o guessing, and similarly >> could I >> accurately convert it to another qual format? To me it seems we need >> something in Bio::Seq::Quality akin to the alphabet() method used for >> sequence data. > > The text formats encode the quality values in different ways, but they > are all stored as integer arrays in the object. Converting between > them is relatively easy. > > You are right: quality_format() or even plain format() is needed. The > SeqIO methods creating the objects should be setting it. 
Warnings for
> unset format values should be added to appropriate places.
>
>   -Heikki

Agreed, and any conversion methods could default to using a set
quality_format()/format() for conversions to/from ascii (might serve
as a good verification point as well).

chris

From maj at fortinbras.us Mon Apr 27 11:51:39 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 27 Apr 2009 11:51:39 -0400
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com><2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com><90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
Message-ID:

Dan - congrats on your first contribution!
Mark

----- Original Message -----
From: "Dan Bolser"
To: "Heikki Lehvaslaiho"
Cc: "Chris Fields" ;
Sent: Monday, April 27, 2009 4:31 AM
Subject: Re: [Bioperl-l] Clear range from Bio::Seq::Quality?

Hi Heikki,

Thanks very much for the advice on how to better implement the clear
range method within the Bio::Seq::Quality object. I can understand the
logic of what you have written, and it all sounds reasonable. The only
problem is that I am very inexperienced with working on object
oriented Perl (my 'one man' projects to date have never really
required me to think beyond scripts, and it's been years since I
actually tried to code objects in Perl).

To be specific, when you say, "Lets add a method that sets the
threshold and stores it internally as $self->_threshold", ignoring any
other functionality, what would that method look like? In particular,
how would $self->_threshold be implemented?

I think once I see that detail, I can go ahead and try to code what
you suggested.

Similarly (Chris), where would I put the tests / how would they be
implemented?

Thanks again for the feedback.

All the best,
Dan.
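
As a rough answer to the "what would that method look like?" question
above, here is a minimal sketch of a combined getter/setter that also
empties the cache, as Heikki describes. It assumes an ordinary
hash-based Perl object, so "$self->_threshold" is taken to mean a value
kept under the hash key '_threshold'. The package name QualSketch and
its constructor arguments are made up for illustration; this is not the
code that went into Bio::Seq::Quality.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT the real Bio::Seq::Quality.
package QualSketch;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{_qual} = $args{-qual} || [];   # array ref of integer scores
    $self->threshold($args{-threshold}) if defined $args{-threshold};
    return $self;
}

# Getter/setter: with an argument it stores the value and drops the
# cached ranges; without one it just returns the stored value.
sub threshold {
    my ($self, $value) = @_;
    if (defined $value) {
        $self->{_threshold} = $value;
        $self->_clear_cache;
    }
    return $self->{_threshold};
}

# Forget everything that was computed for the old threshold.
sub _clear_cache {
    my $self = shift;
    delete $self->{_ranges};
    return;
}

# Compute (once) and cache the [start, end] index pairs of all runs
# whose quality is >= the threshold.
sub _find_clean_ranges {
    my $self = shift;
    return $self->{_ranges} if $self->{_ranges};
    my $min = $self->threshold;
    $min = 13 unless defined $min;         # default used in Dan's method
    my $qual = $self->{_qual};
    my (@ranges, $start);
    for my $i (0 .. $#$qual) {
        if ($qual->[$i] >= $min) {
            $start = $i unless defined $start;   # entering a run
        }
        elsif (defined $start) {
            push @ranges, [$start, $i - 1];      # leaving a run
            undef $start;
        }
    }
    push @ranges, [$start, $#$qual] if defined $start;
    return $self->{_ranges} = \@ranges;
}

sub count_clean_ranges {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    return scalar @{ $self->_find_clean_ranges };
}

package main;

my $q = QualSketch->new(-qual => [5, 20, 20, 5, 30, 30, 30, 5]);
$q->threshold(10);
print "ranges at threshold 10: ", $q->count_clean_ranges, "\n";
print "ranges at threshold 25: ", $q->count_clean_ranges(25), "\n";
```

Testing with "defined" rather than "shift || 13" means a threshold of 0
is stored as given instead of being replaced by the default. The same
_clear_cache call would also belong in whatever methods set the quality
values or the sequence.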
2009/4/27 Heikki Lehvaslaiho : > Dan, > > It looks like your method does two different things: > > 1. Returns the longest subsequence above the threshold > 2. Analyses the the sequence for the number of ranges the current > threshold creates. > > Why not separate these functions? > > Lets add a method that sets the threshold and stores it internally as > $self->_threshold. Setting it to a new values should trigger emptying > all the caches (see below.) > > Lets have two more public methods: > > 1. get_clean_range() - optional argument 'threshold' > > It returns the longest clean subseq. > > 2. count_clean_ranges() -again optional argument 'threshold' > > This returns the number of ranges detected. > > Both methods call first the public method threshold if the argument > has been given and then an internal method _find_clean_ranges(). That > method calculates all the ranges and stores them internally (as > $self->_clean_ranges-> [...]). The number of ranges is also stored > (e.g. $self->_number_of ranges).These internal values form the cache > that needs to be emptied whenever any of the critical values of the > object changes: threshold, quality or seq. Create an internal method > $self->_clear_cache, that does that. > > Now the quality new object does not get created until you call > get_clean_range() which accesses the cached values (or creates them if > they are not there). > > This design allows you to have no extra penalty for adding more > methods that act on cached values. For example, it might be sensible > thing to do at some point to look at all the ranges that are longer > than some length. 
Then you could write in your program:
>
>
> $qual->threshold(10);
> if ($qual->count_clean_ranges == 1) {
>     my $newqual = $qual->get_clean_range();
>     # do your analysis
> } elsif ($qual->count_clean_ranges == 0) {
>     # do some reporting and logging
> } else {  # more than one range
>     my @quals = $qual->get_all_clean_ranges($min_length);
>     # do some more work and possibly select the best one(s)
> }
>
>
>
> Yours,
>
>    -Heikki
>
> 2009/4/24 Chris Fields :
>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If
>> possible, tests don't hurt either!
>>
>> chris
>>
>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>
>>> It's a bit rough and ready, but it does what I need...
>>>
>>>
>>> =head2 get_clear_range
>>>
>>>  Title    : get_clear_range
>>>  Usage    : $subobj = $obj->get_clear_range();
>>>             $subobj = $obj->get_clear_range(20);
>>>  Function : Get the clear range using the given quality score as a
>>>             cutoff or a default value of 13.
>>>
>>>  Returns  : a new Bio::Seq::Quality object
>>>  Args     : a minimum quality value, optional, default = 13
>>>
>>> =cut
>>>
>>> sub get_clear_range
>>> {
>>>   my $self = shift;
>>>   my $qual = $self->qual;
>>>   my $minQual = shift || 13;
>>>
>>>   my (@ranges, $rangeFlag);
>>>
>>>   for(my $i=0; $i<@$qual; $i++){
>>>       ## Are we currently within a clear range or not?
>>>       if(defined($rangeFlag)){
>>>           ## Did we just leave the clear range?
>>>           if($qual->[$i]<$minQual){
>>>               ## Log the range
>>>               push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>               ## and reset the range flag.
>>>               $rangeFlag = undef;
>>>           }
>>>           ## else nothing changes
>>>       }
>>>       else{
>>>           ## Did we just enter a clear range?
>>>           if($qual->[$i]>=$minQual){
>>>               ## Better set the range flag!
>>>               $rangeFlag = $i;
>>>           }
>>>           ## else nothing changes
>>>       }
>>>   }
>>>   ## Did we exit the last clear range?
>>> if(defined($rangeFlag)){ >>> my $i = scalar(@$qual); >>> ## Log the range >>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>> } >>> >>> unless(@ranges){ >>> die "There is no clear range... I don't know what to do here!\n"; >>> } >>> >>> print "there are ", scalar(@ranges), " clear ranges\n"; >>> >>> my $sum; map {$sum += $_->[2]} @ranges; >>> >>> print "of ", scalar(@$qual), " bases, there are $sum with ". >>> "quality scores above the given threshold\n"; >>> >>> for (sort {$b->[2] <=> $a->[2]} @ranges){ >>> if($_->[2]/$sum < 0.5){ >>> warn "not so much a clear range as a clear chunk...\n"; >>> } >>> print $_->[2], "\t", $_->[2]/$sum, "\n"; >>> >>> return Bio::Seq::QualityDB->new( -seq => $self->subseq( $_->[0]+1, >>> $_->[1]+1), >>> -qual => $self->subqual($_->[0]+1, >>> $_->[1]+1) >>> ); >>> } >>> } >>> >>> >>> >>> >>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which >>> is a copy of Bio/Seq/Quality.pm that just has the above method added). >>> That is why the 'new Bio::Seq::Quality object' is actually a >>> Bio::Seq::QualityDB object, but other than that it should slot right >>> in (apart from all the debugging output that I spit out). >>> >>> >>> Cheers, >>> Dan. >>> >>> >>> 2009/4/24 Dan Bolser : >>>> >>>> Hi all, >>>> >>>> I couldn't find out how to get the 'clear range' from a >>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should >>>> this method be a part of the Bio::Seq::Quality class? >>>> >>>> In the latter case I'm on my way to an implementation, but I am not >>>> good at navigating the bioperl docs, so I thought I should ask before >>>> I take the time to finish that off. >>>> >>>> >>>> Cheers, >>>> Dan. 
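
On the "how would the tests be implemented?" question: a minimal sketch
of a test script using plain Test::More, which ships with Perl. It
exercises only the "pick the longest range" step of the method quoted
above, pulled out into a small function so it can be checked without
any printing. The helper name longest_range is hypothetical and is not
a BioPerl method, and the real BioPerl test files may use their own
conventions.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical helper: takes [start, end, length] triples like the ones
# pushed onto @ranges above and returns the longest one. The first
# triple wins on ties; an empty list gives undef.
sub longest_range {
    my @ranges = @_;
    my $best;
    for my $r (@ranges) {
        $best = $r if !defined $best || $r->[2] > $best->[2];
    }
    return $best;
}

my @ranges = ([1, 2, 2], [4, 6, 3], [8, 8, 1]);

is_deeply(longest_range(@ranges), [4, 6, 3], 'longest range is picked');
is(longest_range(), undef, 'no ranges gives undef');
is_deeply(longest_range([0, 1, 2], [3, 4, 2]), [0, 1, 2],
          'first range wins on ties');
```

Returning undef for "no range" leaves it to the caller to decide what
to do, where the quoted method simply dies.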
>>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > -Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From kaboroev at sfu.ca Mon Apr 27 15:04:05 2009 From: kaboroev at sfu.ca (Keith Anthony Boroevich) Date: Mon, 27 Apr 2009 12:04:05 -0700 Subject: [Bioperl-l] Bio::Graphics Sub Feature Title Message-ID: <49F601A5.8090205@sfu.ca> Hi, I was wondering if it is possible to set a different "-title" for each of the subfeatures in a track the same way one can set a different "-bgcolor" using a subroutine. I noticed that the -title subroutine is only called once per Feature and is passed a "Bio::SeqFeature::Generic" class whereas the -bgcolor subroutine is called once per Sub Feature and is passed the "Bio::SeqFeature::Generic"s which I created. Is there any way for the -title subroutine to be called each Sub Feature or is this not implemented? Keith From dan.bolser at gmail.com Tue Apr 28 01:46:05 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 28 Apr 2009 06:46:05 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID: <2c8757af0904272246q56e19a2dr542b29f2378d0a48@mail.gmail.com> 2009/4/27 Mark A. Jensen : > Dan - congrats on your first contribution! 
Mark

I don't really feel like I can take much credit! Thanks Heikki! I'll
look at what you did and see what I can add.

It's a really good feeling to contribute to BioPerl (even if I didn't
really do much!)... Now... where do I collect my cheque? ;-)

Seriously though, thanks all for helping to put this together, and
thanks for maintaining BioPerl and keeping it relevant as the field
changes.

All the best,
Dan.

> ----- Original Message ----- From: "Dan Bolser"
> To: "Heikki Lehvaslaiho"
> Cc: "Chris Fields" ;
> Sent: Monday, April 27, 2009 4:31 AM
> Subject: Re: [Bioperl-l] Clear range from Bio::Seq::Quality?
>
>
> Hi Heikki,
>
> Thanks very much for the advice on how to better implement the clear
> range method within the Bio::Seq::Quality object. I can understand the
> logic of what you have written, and it all sounds reasonable. The only
> problem is that I am very inexperienced with working on object
> oriented Perl (my 'one man' projects to date have never really
> required me to think beyond scripts, and it's been years since I
> actually tried to code objects in Perl).
>
> To be specific, when you say, "Let's add a method that sets the
> threshold and stores it internally as $self->_threshold", ignoring any
> other functionality, what would that method look like? In particular,
> how would $self->_threshold be implemented?
>
> I think once I see that detail, I can go ahead and try to code what
> you suggested.
>
>
> Similarly (Chris), where would I put the tests / how would they be
> implemented?
>
>
> Thanks again for the feedback.
>
> All the best,
> Dan.
>
>
>
> 2009/4/27 Heikki Lehvaslaiho :
>>
>> Dan,
>>
>> It looks like your method does two different things:
>>
>> 1. Returns the longest subsequence above the threshold
>> 2. Analyses the sequence for the number of ranges the current
>> threshold creates.
>>
>> Why not separate these functions?
>>
>> Let's add a method that sets the threshold and stores it internally as
>> $self->_threshold.
Setting it to a new value should trigger emptying
>> all the caches (see below).
>>
>> Let's have two more public methods:
>>
>> 1. get_clean_range() - optional argument 'threshold'
>>
>> It returns the longest clean subseq.
>>
>> 2. count_clean_ranges() - again optional argument 'threshold'
>>
>> This returns the number of ranges detected.
>>
>> Both methods first call the public method threshold if the argument
>> has been given and then an internal method _find_clean_ranges(). That
>> method calculates all the ranges and stores them internally (as
>> $self->_clean_ranges->[...]). The number of ranges is also stored
>> (e.g. $self->_number_of_ranges). These internal values form the cache
>> that needs to be emptied whenever any of the critical values of the
>> object changes: threshold, quality or seq. Create an internal method
>> $self->_clear_cache, that does that.
>>
>> Now the new quality object does not get created until you call
>> get_clean_range() which accesses the cached values (or creates them if
>> they are not there).
>>
>> This design allows you to have no extra penalty for adding more
>> methods that act on cached values. For example, it might be a sensible
>> thing to do at some point to look at all the ranges that are longer
>> than some length. Then you could write in your program:
>>
>>
>> $qual->threshold(10);
>> if ($qual->count_clean_ranges == 1) {
>>     my $newqual = $qual->get_clean_range();
>>     # do your analysis
>> } elsif ($qual->count_clean_ranges == 0) {
>>     # do some reporting and logging
>> } else { # more than one range
>>     my @quals = $qual->get_all_clean_ranges($min_length);
>>     # do some more work and possibly select the best one(s)
>> }
>>
>>
>>
>> Yours,
>>
>> -Heikki
>>
>> 2009/4/24 Chris Fields :
>>>
>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If
>>> possible, tests don't hurt either!
>>>
>>> chris
>>>
>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>>
>>>> It's a bit rough and ready, but it does what I need...
>>>>
>>>>
>>>>
>>>>
>>>> =head2 get_clear_range
>>>>
>>>> Title : get_clear_range
>>>>
>>>> Title : subqual
>>>> Usage : $subobj = $obj->get_clear_range();
>>>>         $subobj = $obj->get_clear_range(20);
>>>> Function : Get the clear range using the given quality score as a
>>>>            cutoff or a default value of 13.
>>>>
>>>> Returns : a new Bio::Seq::Quality object
>>>> Args : a minimum quality value, optional, default = 13
>>>>
>>>> =cut
>>>>
>>>> sub get_clear_range
>>>> {
>>>>   my $self = shift;
>>>>   my $qual = $self->qual;
>>>>   my $minQual = shift || 13;
>>>>
>>>>   my (@ranges, $rangeFlag);
>>>>
>>>>   for(my $i=0; $i<@$qual; $i++){
>>>>     ## Are we currently within a clear range or not?
>>>>     if(defined($rangeFlag)){
>>>>       ## Did we just leave the clear range?
>>>>       if($qual->[$i]<$minQual){
>>>>         ## Log the range
>>>>         push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>         ## and reset the range flag.
>>>>         $rangeFlag = undef;
>>>>       }
>>>>       ## else nothing changes
>>>>     }
>>>>     else{
>>>>       ## Did we just enter a clear range?
>>>>       if($qual->[$i]>=$minQual){
>>>>         ## Better set the range flag!
>>>>         $rangeFlag = $i;
>>>>       }
>>>>       ## else nothing changes
>>>>     }
>>>>   }
>>>>   ## Did we exit the last clear range?
>>>>   if(defined($rangeFlag)){
>>>>     my $i = scalar(@$qual);
>>>>     ## Log the range
>>>>     push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>   }
>>>>
>>>>   unless(@ranges){
>>>>     die "There is no clear range... I don't know what to do here!\n";
>>>>   }
>>>>
>>>>   print "there are ", scalar(@ranges), " clear ranges\n";
>>>>
>>>>   my $sum; map {$sum += $_->[2]} @ranges;
>>>>
>>>>   print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>> "quality scores above the given threshold\n"; >>>> >>>> for (sort {$b->[2] <=> $a->[2]} @ranges){ >>>> if($_->[2]/$sum < 0.5){ >>>> warn "not so much a clear range as a clear chunk...\n"; >>>> } >>>> print $_->[2], "\t", $_->[2]/$sum, "\n"; >>>> >>>> return Bio::Seq::QualityDB->new( -seq => $self->subseq( $_->[0]+1, >>>> $_->[1]+1), >>>> -qual => $self->subqual($_->[0]+1, >>>> $_->[1]+1) >>>> ); >>>> } >>>> } >>>> >>>> >>>> >>>> >>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which >>>> is a copy of Bio/Seq/Quality.pm that just has the above method added). >>>> That is why the 'new Bio::Seq::Quality object' is actually a >>>> Bio::Seq::QualityDB object, but other than that it should slot right >>>> in (apart from all the debugging output that I spit out). >>>> >>>> >>>> Cheers, >>>> Dan. >>>> >>>> >>>> 2009/4/24 Dan Bolser : >>>>> >>>>> Hi all, >>>>> >>>>> I couldn't find out how to get the 'clear range' from a >>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should >>>>> this method be a part of the Bio::Seq::Quality class? >>>>> >>>>> In the latter case I'm on my way to an implementation, but I am not >>>>> good at navigating the bioperl docs, so I thought I should ask before >>>>> I take the time to finish that off. >>>>> >>>>> >>>>> Cheers, >>>>> Dan. 
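For readers following the design discussed in this thread (a threshold setter that empties a cache, plus count_clean_ranges() and get_clean_range()), a minimal stand-alone sketch might look like the following. It is not the Bio::Seq::Quality implementation: the package name QualityRanges, the hash keys and the zero-based [start, end, length] triples are assumptions made for illustration only; the default threshold of 13 is taken from the code quoted in the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-alone class for illustration; NOT the real
# Bio::Seq::Quality code.
package QualityRanges;

sub new {
    my ($class, %args) = @_;
    my $self = bless { _qual => $args{qual} || [], _threshold => 13 }, $class;
    $self->threshold($args{threshold}) if defined $args{threshold};
    return $self;
}

# Combined getter/setter. The value is stored in the object's hash under
# the '_threshold' key; setting a new value empties the cache.
sub threshold {
    my ($self, $value) = @_;
    if (defined $value) {
        $self->{_threshold} = $value;
        $self->_clear_cache;
    }
    return $self->{_threshold};
}

# Forget any previously computed ranges.
sub _clear_cache {
    my $self = shift;
    delete $self->{_clean_ranges};
    return;
}

# Find every run of consecutive scores >= threshold, store it as a
# zero-based [start, end, length] triple, and cache the resulting list.
sub _find_clean_ranges {
    my $self = shift;
    return $self->{_clean_ranges} if $self->{_clean_ranges};
    my $qual = $self->{_qual};
    my (@ranges, $start);
    for my $i (0 .. $#$qual) {
        if ($qual->[$i] >= $self->{_threshold}) {
            $start = $i unless defined $start;    # entering a clean range
        }
        elsif (defined $start) {                  # just left a clean range
            push @ranges, [$start, $i - 1, $i - $start];
            undef $start;
        }
    }
    # A range that runs to the end of the array is still open here.
    push @ranges, [$start, $#$qual, scalar(@$qual) - $start] if defined $start;
    return $self->{_clean_ranges} = \@ranges;
}

# Number of clean ranges; an optional argument sets a new threshold first.
sub count_clean_ranges {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    return scalar @{ $self->_find_clean_ranges };
}

# Longest clean range as [start, end, length], or undef if there is none.
sub get_clean_range {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    my ($longest) = sort { $b->[2] <=> $a->[2] } @{ $self->_find_clean_ranges };
    return $longest;
}

package main;

my $q = QualityRanges->new(qual => [5, 20, 25, 30, 3, 15, 18, 2], threshold => 10);
printf "%d clean ranges\n", $q->count_clean_ranges;
my $best = $q->get_clean_range;
print "longest: $best->[0]..$best->[1]\n";
```

Keeping the ranges as plain index triples means a new sequence object only has to be built when a caller actually asks for one, which is the point of the caching design described in the thread.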
>>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>>
>> --
>> -Heikki
>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>> cell: +27 (0)714328090
>> Sent from Claremont, WC, South Africa
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>

From brianli.cas at gmail.com Tue Apr 28 23:14:23 2009
From: brianli.cas at gmail.com (brian li)
Date: Wed, 29 Apr 2009 11:14:23 +0800
Subject: [Bioperl-l] Parse problem of a big EMBL entry
Message-ID:

Hi everyone,

Greetings from Brian.

I have just begun to use bioperl 1.6.0 to collect certain data
lines from EMBL files.

There's a problem when I try to get an entry that includes over 1
million lines. A call of Bio::SeqIO::embl->next_seq would just cause
the parser script to exit. I have read Bio/SeqIO/embl.pm and I think
one possible way to solve the problem may be to give my script more
memory to store the entry data. The machine I am using has 32GB
memory, which should be enough for any entry.

So I am wondering whether there is any way to set the size of the
memory available to a perl script. Other ways to deal with the
problem are also welcome.

Appreciate your help.

Brian

From jason at bioperl.org Wed Apr 29 01:10:27 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 28 Apr 2009 22:10:27 -0700
Subject: [Bioperl-l] Parse problem of a big EMBL entry
In-Reply-To:
References:
Message-ID: <2154C145-1A66-4EEB-B99E-FBE8215539F5@bioperl.org>

Brian -

Without memory leaks it should only take up as much memory as the
current sequence you have parsed.
If you mean you have a sequence record with > 1M lines I'm not sure
how much memory that would take up; it depends on whether this is lots
of features or what. There are ways to tell BioPerl to throw away
things you don't want to parse out from the record. See
http://bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder

Perl will use as much memory as is available on your machine. Have you
monitored the memory use of the running perl process to ensure it is
reaching the 32GB limit and that is in fact what is killing the
program?

-jason

On Apr 28, 2009, at 8:14 PM, brian li wrote:

> Hi everyone,
>
> Here is greeting from Brian.
>
> I have just began to use bioperl 1.6.0 to collect certain data
> lines from EMBL files.
>
> There's a problem when I try to get an entry that includes over 1
> million lines. A call of Bio::SeqIO::embl->next_seq would just cause
> the parser script to exit. I have read Bio/SeqIO/embl.pm and I think
> one possible way to solve the problem may be to give my script more
> memory to store the entry data. The machine I am using has 32GB
> memory, and that shall be enough for any entry.
>
> So I am wondering whether there is any way to set the size of the
> memory available to a perl script. Others ways to deal with the
> problem are also welcome.
>
> Appreciate your help.
>
> Brian
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org

From paola.bisignano at gmail.com Wed Apr 29 10:08:57 2009
From: paola.bisignano at gmail.com (Paola Bisignano)
Date: Wed, 29 Apr 2009 16:08:57 +0200
Subject: [Bioperl-l] parsing /www.ebi.ac.uk/pdbsum/
Message-ID:

Hi,

thanks for accepting me on the mailing list. I'm Paola and I work at
the institute of cancer in Genoa, Italy, as a bioinformatician. I'm a
biologist, quite new to Perl (2 months), and I have never used BioPerl,
because I preferred to learn a little Perl first. But now, after
parsing, parsing, and more parsing of bioinformatics web sites, I need
BioPerl :-)

I visited www.bioperl.org and read the tutorials, and I read about a
lot of modules used to parse different web sites. I need to parse one
in particular, EMBL-EBI's http://www.ebi.ac.uk/pdbsum/, which is
different from EMBL because it also has other information on
protein-ligand interactions. I have never used BioPerl modules and have
parsed it by myself so far, but if the receptor has more than one
ligand it is more difficult to parse and to choose which ligands I
need, because there are "false" ligands such as ions or glycerol that I
don't need. I don't know the syntax of this source, in which everything
can be seen as a ligand. So I want to know if there are modules that I
can use to do my analysis. If anyone can help me, it is very welcome.

Thanks

From jason at bioperl.org Wed Apr 29 12:41:02 2009
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 29 Apr 2009 09:41:02 -0700
Subject: [Bioperl-l] Fwd: Parse problem of a big EMBL entry
References:
Message-ID:

Brian - please always CC the mailing list on replies.

Not sure what is causing the seg fault so I can't really help here - if
you file it as a bug in Bugzilla with instructions on how to reproduce
it, it will hopefully get looked at.
-jason

Begin forwarded message:

> From: brian li
> Date: April 29, 2009 1:23:32 AM PDT
> To: Jason Stajich
> Subject: Re: [Bioperl-l] Parse problem of a big EMBL entry
>
> Hi Jason,
>
>> Without memory leaks it should only take up as much memory as the
>> current sequence you have parsed. If you mean you have a sequence
>> record with > 1M lines I'm not sure how much memory that would take
>> up, depends on if this is lots of feature or what.
>
> Lots of features.
>
>> There are ways to tell BioPerl to throw away
>> things you don't want to parse out from the record. See
>> http://bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
>
> Thanks. I think this would help.
>
>> Perl will use as much memory as is available on your machine. Have
>> you monitored the memory use of the perl running to insure it is
>> reaching the 32Gb limit and that is in fact what is killing the
>> program?
>
> I monitored the memory usage in my last run. The size of free
> memory didn't change a lot, and stayed at around 20GB (buffer
> size added). I made the wrong assumption. Thanks again for your hint.
>
> BTW: The message I get when I parse a big million-line entry is
> "Segmentation fault". Not familiar with this and trying to get a clue.
>
> Brian

Jason Stajich
jason at bioperl.org

From razi.khaja at gmail.com Wed Apr 29 15:08:14 2009
From: razi.khaja at gmail.com (Razi Khaja)
Date: Wed, 29 Apr 2009 15:08:14 -0400
Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence
In-Reply-To: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com>
References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com>
Message-ID: <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>

Hello,

I am generating BLAST alignments using the BLAST URL API from NCBI.

I want to parse details from BLAST reports whenever there are
"Features in/flanking this part of subject sequence". A portion of
the BLAST report showing "Features flanking ..."
is pasted below.

I am using Bio::SearchIO to parse details. The relevant part of the
script is below.

The problem I am having is that for some reason the first occurrence
of a "Feature flanking this part of a subject sequence" is skipped.
I am only able to parse/print all occurrences of a "Feature
in/flanking this part of a subject sequence" from the second
occurrence to the last occurrence.

I believe the code responsible for parsing this information is in
Bio/SearchIO/blast.pm, starting on line 760.
I have tried fixing the code in Bio/SearchIO/blast.pm myself but was
not able to correct the problem.
Would it be possible for someone to fix the code in the
Bio/SearchIO/blast.pm module, or help me fix the code so that the
first occurrence is not skipped?

Thanks,
Razi

===== The part of the script that is relevant to parsing "Features in/flanking..." ====
my $bio_searchio_in = Bio::SearchIO->new(
    -file   => 'blast_result.txt',
    -format => 'blast'
);

my $i = 1;
while( my $result = $bio_searchio_in->next_result() ){
    while( my $hit = $result->next_hit() ){
        while( my $hsp = $hit->next_hsp() ){
            my $hsp_features = $hsp->hit_features();
            if( $hsp_features ) {
                print "HSP FEATURE $i\t$hsp_features\n";
                $i++;
            }
        }
    }
}

===== A portion of a BLAST report with "Features flanking ..." =====
...
...
 Score = 54.7 bits (29),  Expect = 0.003
 Identities = 29/29 (100%), Gaps = 0/29 (0%)
 Strand=Plus/Minus

Query  6556     CCTGGGTGACAGAGTGAGACTCCATCTCA  6584
                |||||||||||||||||||||||||||||
Sbjct  6953042  CCTGGGTGACAGAGTGAGACTCCATCTCA  6953014

>gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic contig
Length=237250

 Features flanking this part of subject sequence:
   16338 bp at 5' side: PRAME family member 8
   11926 bp at 3' side: PRAME family member 9

 Score = 7286 bits (3945),
Expect = 0.0
 Identities = 5437/6145 (88%), Gaps = 152/6145 (2%)
 Strand=Plus/Plus

Query  23225  GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG 23284
              |||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||||||
Sbjct  86128  GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG 86187

Query  23285  GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA 23344
              ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| |||||
Sbjct  86188  GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA 86247
...
...

From cjfields at illinois.edu Wed Apr 29 15:41:54 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 29 Apr 2009 14:41:54 -0500
Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence
In-Reply-To: <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>
References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com> <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>
Message-ID: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu>

I'm assuming this is from an older bioperl; this data should be
accessible via $hsp->hit_features in the latest code from svn (and I
believe in bioperl 1.6.0 in CPAN).

chris

On Apr 29, 2009, at 2:08 PM, Razi Khaja wrote:

> Hello,
>
> I am generating BLAST alignments using the BLAST URL API from NCBI.
>
> I want to parse details from BLAST reports whenever there are
> "Features in/flanking this part of subject sequence". A portion of
> the BLAST report showing "Features flanking ..." is pasted below.
>
> I am using Bio::SearchIO to parse details. The relevant part of the
> script is below.
>
> The problem I am having is that for some reason the first occurrence
> of a "Feature flanking this part of a subject sequence" is skipped.
> I am only able to parse/print all occurrences of a "Feature
> in/flanking this part of a subject sequence" from the second
> occurrence to the last occurrence.
> > I believe the code responsible for parsing this information is in > Bio/SearchIO/blast.pm, starting on line 760. > I have tried fixing the code in Bio/SearchIO/blast.pm myself but was > not able to correct the problem. > Would it be possible for someone to fix the code in the > Bio/SearchIO/blast.pm module, or help me fix the code so that the > first occurrence is not skipped? > > Thanks, > Razi > ===== The part of the script that is relevant to parsing "Features > in/flanking..." ==== > my $bio_searchio_in = Bio::SearchIO->new( > -file => 'blast_result.txt', > -format => 'blast' > ); > > my $i = 1; > while( my $result = $bio_searchio_in->next_result() ){ > while( my $hit = $result->next_hit() ){ > while( my $hsp = $hit->next_hsp() ){ > my $hsp_features = $hsp->hit_features(); > if( $hsp_features ) { > print "HSP FEATURE $i\t$hsp_features\n"; > $i++; > } > } > } > } > > ===== A portion of a BLAST report with "Features flanking ..." ===== > ... > ... > Score = 54.7 bits (29), Expect = 0.003 > Identities = 29/29 (100%), Gaps = 0/29 (0%) > Strand=Plus/Minus > > Query 6556 CCTGGGTGACAGAGTGAGACTCCATCTCA 6584 > ||||||||||||||||||||||||||||| > Sbjct 6953042 CCTGGGTGACAGAGTGAGACTCCATCTCA 6953014 > > >> gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 >> genomic contig > Length=237250 > > Features flanking this part of subject sequence: > 16338 bp at 5' side: PRAME family member 8 > 11926 bp at 3' side: PRAME family member 9 > > Score = 7286 bits (3945), Expect = 0.0 > Identities = 5437/6145 (88%), Gaps = 152/6145 (2%) > Strand=Plus/Plus > > Query 23225 > GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG > 23284 > |||||||||||||||||||||||||||||||| |||||| ||||||||||| > |||||||| > Sbjct 86128 > GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG > 86187 > > Query 23285 > GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA > 23344 > ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| > ||||| > Sbjct 86188 > 
GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA > 86247 > ... > ... > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjm at berkeleybop.org Wed Apr 29 16:58:15 2009 From: cjm at berkeleybop.org (Chris Mungall) Date: Wed, 29 Apr 2009 13:58:15 -0700 Subject: [Bioperl-l] Can I load ontologies into BioSQL? In-Reply-To: References: Message-ID: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org> The .ontology files have been deprecated by GO. Use the .obo files instead. It appears the bioperl parser for the .ontology files isn't able to deal with the new relations in GO. I suggest that the bioperl .ontology parser is deprecated too On Apr 22, 2009, at 6:38 AM, Hilmar Lapp wrote: > Hi Carlos, > > I am moving your inquiry to the BioPerl list, as the tool is a part > of Bioperl-db and uses BioPerl for parsing the ontologies. > > In your case, the goflat parser in BioPerl seems to balk at the > second one of the input files. It may be that the input file is > (was?) corrupted, that does happen every once in a while. More > likely though is that the goflat parser hasn't kept up with some > format changes. Have you tried using the obo format version instead? > > -hilmar > > On Apr 20, 2009, at 11:44 AM, Carlos A. Canchaya wrote: > >> Hi guys >> >> I'm working with biosql and I try to figure out how to load >> ontologies into biosql. 
>> >> I've tried >> >> load_ontology.pl --driver mysql --dbuser carlos --dbpass xxx -- >> host localhost --dbname biosql --namespace "Gene Ontology" --format >> goflat --fmtargs "-defs_file,GO.defs" function.ontology >> process.ontology component.ontology >> >> as in the script info but I have an error, >> >> >> ------------------- WARNING --------------------- >> MSG: DBLink exists in the dblink of _default >> --------------------------------------------------- >> >> ------------- EXCEPTION ------------- >> MSG: format error (file process.ontology) offending line: >> -negative regulation of angiogenesis ; GO:0016525 ; synonym:down >> regulation of angiogenesis ; synonym:down\-regulation of >> angiogenesis ; synonym:downregulation of angiogenesis ; >> synonym:inhibition of angiogenesis % negative regulation of >> developmental process ; GO:0051093 % regulation of angiogenesis ; >> GO:0045765 >> >> STACK Bio::OntologyIO::dagflat::_parse_flat_file /usr/local/share/ >> perl/5.10.0/Bio/OntologyIO/dagflat.pm:627 >> STACK Bio::OntologyIO::dagflat::parse /usr/local/share/perl/5.10.0/ >> Bio/OntologyIO/dagflat.pm:284 >> STACK Bio::OntologyIO::dagflat::next_ontology /usr/local/share/perl/ >> 5.10.0/Bio/OntologyIO/dagflat.pm:317 >> STACK toplevel /usr/local/share/biosql/bioperl-db/scripts/biosql/ >> load_ontology.pl:604 >> ------------------------------------- >> >> Any suggestion? 
>> >> Cheers, >> >> Carlos >> >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Wed Apr 29 19:48:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 29 Apr 2009 19:48:10 -0400 Subject: [Bioperl-l] SearchIO: Features in/flanking this part of asubject sequence In-Reply-To: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com><62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com> <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> Message-ID: <7A9746282BA343F78423D12DB1578509@NewLife> also check out http://www.bioperl.org/wiki/Parsing_BLAST_HSPs MAJ ----- Original Message ----- From: "Chris Fields" To: "Razi Khaja" Cc: Sent: Wednesday, April 29, 2009 3:41 PM Subject: Re: [Bioperl-l] SearchIO: Features in/flanking this part of asubject sequence > I'm assuming this is from an older bioperl; this data should be accessible > via $hsp->hit_features in the latest code fromo svn (and I believe in bioperl > 1.6.0 in CPAN). > > chris > > On Apr 29, 2009, at 2:08 PM, Razi Khaja wrote: > >> Hello, >> >> I am generating BLAST alignments using the BLAST URL API from NCBI. >> >> I want to parse details from BLAST reports whenever there are >> "Features in/flanking this part of subject sequence". A portion of >> the BLAST report showing "Features flanking ..." is pasted below. >> >> I am using Bio::SearchIO to parse details. The relevant part of the >> script is below. 
>> >> The problem I am having is that for some reason the first occurrence >> of a "Feature flanking this part of a subject sequence" is skipped. >> I am only able to parse/print all occurrences of a "Feature >> in/flanking this part of a subject sequence" from the second >> occurrence to the last occurrence. >> >> I believe the code responsible for parsing this information is in >> Bio/SearchIO/blast.pm, starting on line 760. >> I have tried fixing the code in Bio/SearchIO/blast.pm myself but was >> not able to correct the problem. >> Would it be possible for someone to fix the code in the >> Bio/SearchIO/blast.pm module, or help me fix the code so that the >> first occurrence is not skipped? >> >> Thanks, >> Razi > > > >> ===== The part of the script that is relevant to parsing "Features >> in/flanking..." ==== >> my $bio_searchio_in = Bio::SearchIO->new( >> -file => 'blast_result.txt', >> -format => 'blast' >> ); >> >> my $i = 1; >> while( my $result = $bio_searchio_in->next_result() ){ >> while( my $hit = $result->next_hit() ){ >> while( my $hsp = $hit->next_hsp() ){ >> my $hsp_features = $hsp->hit_features(); >> if( $hsp_features ) { >> print "HSP FEATURE $i\t$hsp_features\n"; >> $i++; >> } >> } >> } >> } >> >> ===== A portion of a BLAST report with "Features flanking ..." ===== >> ... >> ... 
>> Score = 54.7 bits (29), Expect = 0.003 >> Identities = 29/29 (100%), Gaps = 0/29 (0%) >> Strand=Plus/Minus >> >> Query 6556 CCTGGGTGACAGAGTGAGACTCCATCTCA 6584 >> ||||||||||||||||||||||||||||| >> Sbjct 6953042 CCTGGGTGACAGAGTGAGACTCCATCTCA 6953014 >> >> >>> gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic >>> contig >> Length=237250 >> >> Features flanking this part of subject sequence: >> 16338 bp at 5' side: PRAME family member 8 >> 11926 bp at 3' side: PRAME family member 9 >> >> Score = 7286 bits (3945), Expect = 0.0 >> Identities = 5437/6145 (88%), Gaps = 152/6145 (2%) >> Strand=Plus/Plus >> >> Query 23225 GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG >> 23284 >> |||||||||||||||||||||||||||||||| |||||| ||||||||||| |||||||| >> Sbjct 86128 GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG >> 86187 >> >> Query 23285 GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA >> 23344 >> ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| ||||| >> Sbjct 86188 GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA >> 86247 >> ... >> ... 
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From Russell.Smithies at agresearch.co.nz Wed Apr 29 20:31:06 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 30 Apr 2009 12:31:06 +1200
Subject: [Bioperl-l] waaaay off topic question
In-Reply-To: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org>
References: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493C84151@exchsth.agresearch.co.nz>

I have a question that's nothing to do with BioPerl or Perl, but hope
there's a chance that some of you clever people may be doing the same
thing as me :-)

I've been asked to write some VB scripts to control Applied Biosystems
"Analyst QS" and "BioAnalyst" applications for analyzing mass-spec
data. There's limited documentation (10yr out of date) with some
example code (that doesn't compile) so I'm not getting as far along as
I'd like.

Has anyone worked with this stuff before?

Any assistance greatly appreciated !!!

Thanx,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809 F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material.
Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From razi.khaja at gmail.com Wed Apr 29 23:57:17 2009
From: razi.khaja at gmail.com (Razi Khaja)
Date: Wed, 29 Apr 2009 23:57:17 -0400
Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence
In-Reply-To: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu>
References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com> <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com> <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu>
Message-ID: <62e9dabc0904292057y6b725e0yc3b0a85c661c44f8@mail.gmail.com>

Hello Chris,

I am using bioperl 1.6.0. It may be a few weeks before I can upgrade
to bioperl-live from svn, and so it may be a few weeks before I can
return to my question. When I do upgrade, I will report back to this
thread if I still encounter problems.

Razi

On Wed, Apr 29, 2009 at 3:41 PM, Chris Fields wrote:

> I'm assuming this is from an older bioperl; this data should be accessible
> via $hsp->hit_features in the latest code from svn (and I believe in
> bioperl 1.6.0 in CPAN).
>
> chris
>
>
>

From jonathanmflowers at gmail.com Thu Apr 30 12:40:42 2009
From: jonathanmflowers at gmail.com (Jon Flowers)
Date: Thu, 30 Apr 2009 09:40:42 -0700 (PDT)
Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem
Message-ID: <23319982.post@talk.nabble.com>

Dear colleagues,

I have set up a MySQL database and loaded a GFF3 and fasta file using
Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be
working normally except when I attempt to create a
Bio::DB::SeqFeature::Segment object.
The following works as expected:

my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:foo',
                                         -user    => 'myuser',
                                         -pass    => 'mypassword',
                                         -write   => '1');

my @features = $db->features(-seq_id => 'chr1',
                             -start  => 1,
                             -end    => 10000,
                             -types  => ['gene']);

However, when I try to create a segment object using either of the two following method calls I get an error:

my $segment = $db->segment('chr1', 1 => 10000);

my $segment = $db->segment(-seq_id => 'chr1', -start => '1', -end => '10000');

-------------------------------- EXCEPTION ------------------------------------

MSG: segment() called in a scalar context but multiple features match.
Either call in a list context or narrow your search using the -types or -class arguments

STACK Bio::DB::SeqFeature::Store::segment /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178
STACK toplevel trial.pl:42
-------------------------------------------------------

Calling in list context (which is not defined in the documentation) produces an array of 22 identical scalars = 'chr1:1..10000'.

Any ideas?

Thanks

Jonathan

-- 
View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jonathanmflowers at gmail.com Thu Apr 30 12:52:24 2009
From: jonathanmflowers at gmail.com (Jon Flowers)
Date: Thu, 30 Apr 2009 09:52:24 -0700 (PDT)
Subject: [Bioperl-l] use CLUSTALW on Windows?
In-Reply-To: <23264714.post@talk.nabble.com>
References: <23264714.post@talk.nabble.com>
Message-ID: <23320232.post@talk.nabble.com>

Hi,

There is no means to do this in bioperl, but it is simple to make a system call and execute an MSA program such as MUSCLE to align fasta-formatted sequences using something like...

qx(muscle -in $infilename -out $outfilename)

Jonathan

laxmanb wrote:
>
> I need to create a multiple sequence alignment of some sequences using
> CLUSTALW or any other Multiple sequence alignment program.
However, I've > learnt that this functionality used to be UNIX/Linux only. However, the > documentation is also very old, so I'd like to know if any CLUSTAL/ any > other MSA programs can be run using BioPerl on Windows. > > Thank you for your time :) > -- View this message in context: http://www.nabble.com/use-CLUSTALW-on-Windows--tp23264714p23320232.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Thu Apr 30 13:04:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 30 Apr 2009 12:04:46 -0500 Subject: [Bioperl-l] use CLUSTALW on Windows? In-Reply-To: <23320232.post@talk.nabble.com> References: <23264714.post@talk.nabble.com> <23320232.post@talk.nabble.com> Message-ID: <92920FDD-7CB2-4331-9860-87304E16C948@illinois.edu> I don't recall this being a UNIX-only issue, though admittedly it's been years since I've tried running the bioperl-run modules on WinXP. I do recall getting BLAST, EMBOSS and others to work though; I don't see why ClustalW would be much different. Have you actually tested this out and found a problem? Have you tried cygwin? chris On Apr 30, 2009, at 11:52 AM, Jon Flowers wrote: > > Hi, > > There is no means to do this in bioperl, but it is simple to make a > system > call and execute an MSA program such as MUSCLE to align fasta- > formatted > sequences using something like... > > qx(muscle -in $infilename -out $outfilename) > > Jonathan > > > laxmanb wrote: >> >> I need to create a multiple sequence alignment of some sequences >> using >> CLUSTALW or any other Multiple sequence alignment program. However, >> I've >> learnt that this functionality used to be UNIX/Linux only. However, >> the >> documentation is also very old, so I'd like to know if any CLUSTAL/ >> any >> other MSA programs can be run using BioPerl on Windows. 
>> >> Thank you for your time :) >> > > -- > View this message in context: http://www.nabble.com/use-CLUSTALW-on-Windows--tp23264714p23320232.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Apr 30 13:29:29 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 30 Apr 2009 10:29:29 -0700 Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem In-Reply-To: <23319982.post@talk.nabble.com> References: <23319982.post@talk.nabble.com> Message-ID: <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org> One would have to see some of your GFF to know better. It sounds like you have chr1 defined in multiple places. Did you use the bp_seqfeature_load script to load the data in one go - it should catch it if you have non-unique IDs. -jason On Apr 30, 2009, at 9:40 AM, Jon Flowers wrote: > > Dear colleagues, > > I have set up a mySQL database and loaded a GFF3 and fasta file using > Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be > working > normally except when I attempt to create a > Bio::DB::SeqFeature::Segment > object. > > The following works as expected: > > my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', > -dsn => 'dbi:mysql:foo', > -user => 'myuser', > -pass => 'mypassword', > -write => '1'); > > my @features = $db->features(-seq_id=>'chr1', > -start=>1, > -end=>10000, > -types=>['gene']); > > However, when I try to create a segment object using either of the two > following method calls I get an error: > > my $segment = $db->segment('chr1',1=>10000); > > my $segment = $db->segment( -seq_id => 'chr1', -start => '1', -end => > '10000'); > > -------------------------------- EXCEPTION > ------------------------------------ > > MSG: segment() called in a scalar context but multiple features match. 
> Either call in a list context or narrow your search using the -types > or > -class arguments > > STACK Bio::DB::SeqFeature::Store::segment > /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178 > STACK toplevel trial.pl:42 > ------------------------------------------------------- > > Calling in list context (which is not defined in the documentation) > produces > an array of 22 identical scalars = 'chr1:1..10000'. > > Any ideas? > > Thanks > > Jonathan > > -- > View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From jason at bioperl.org Thu Apr 30 13:31:19 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 30 Apr 2009 10:31:19 -0700 Subject: [Bioperl-l] use CLUSTALW on Windows? In-Reply-To: <23320232.post@talk.nabble.com> References: <23264714.post@talk.nabble.com> <23320232.post@talk.nabble.com> Message-ID: <734F5ADF-77F5-4AA5-A676-79B42B3C54CB@bioperl.org> the bioperl-run module of Bio::Tools::Run::Alignment::Clustalw or MUSCLE ones don't work then? They do the cmdline work for you. On Apr 30, 2009, at 9:52 AM, Jon Flowers wrote: > > Hi, > > There is no means to do this in bioperl, but it is simple to make a > system > call and execute an MSA program such as MUSCLE to align fasta- > formatted > sequences using something like... > > qx(muscle -in $infilename -out $outfilename) > > Jonathan > > > laxmanb wrote: >> >> I need to create a multiple sequence alignment of some sequences >> using >> CLUSTALW or any other Multiple sequence alignment program. However, >> I've >> learnt that this functionality used to be UNIX/Linux only. 
However,
>> the documentation is also very old, so I'd like to know if any CLUSTAL/any
>> other MSA programs can be run using BioPerl on Windows.
>>
>> Thank you for your time :)

Jason Stajich
jason at bioperl.org

From Kevin.M.Brown at asu.edu Thu Apr 30 15:27:15 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 30 Apr 2009 12:27:15 -0700
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
Message-ID: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>

So, I'm parsing Genbank sequences to pull out the various exons. I found the way to get the NCBI Exon number from each feature, but am confused about one of the methods. When I do annotation->as_text I'm expecting to get back 1 from the feature, but instead get back Value: 1 ??!? Why is the value from the NCBI file getting that text tagged onto it?

http://www.ncbi.nlm.nih.gov/nuccore/73622129
     exon            1..774
                     /gene="BOLA2"
                     /gene_synonym="BOLA2A; My016"
                     /inference="alignment:Splign"
                     /number=1

print +($f->annotation->get_Annotations('number'))[0]->as_text;
Value: 1

From SMarkel at accelrys.com Thu Apr 30 15:56:40 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 30 Apr 2009 15:56:40 -0400
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net>

Kevin,

I believe the extra text was added for readability when printing to the console. In our code we just add the following post-processing step.
  (my $text = $annotation->as_text()) =~ s/(Comment|Value): //;

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Kevin Brown
> Sent: Thursday, 30 April 2009 12:27 PM
> To: BioPerl List
> Subject: [Bioperl-l] Bio::Annotations::Collection confusion
>
> So, I'm parsing Genbank sequences to pull out the various exons. I found
> the way to get the NCBI Exon number from each feature, but am confused
> about one of the methods. When I do annotation->as_text I'm expecting to
> get back 1 from the feature, but instead get back Value: 1 ??!? Why is
> the value from the NCBI file getting that text tagged onto it?
> > http://www.ncbi.nlm.nih.gov/nuccore/73622129 > exon 1..774 > /gene="BOLA2" > /gene_synonym="BOLA2A; My016" > /inference="alignment:Splign" > /number=1 > > print ($f->annotation->get_Annotations('number'))[0]->as_text; > Value: 1 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Thu Apr 30 16:01:03 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 30 Apr 2009 13:01:03 -0700 Subject: [Bioperl-l] Bio::Annotations::Collection confusion In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> Message-ID: <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu> That's nice in some regards, but makes it hard to use the function in code without having to always process the result, which seems to be counter to what one would expect. E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence". Is there a better way to get the number directly without having to strip off the text that never existed in the first place? > -----Original Message----- > From: Scott Markel [mailto:SMarkel at accelrys.com] > Sent: Thursday, April 30, 2009 12:57 PM > To: Kevin Brown; BioPerl List > Subject: RE: Bio::Annotations::Collection confusion > > Kevin, > > I believe the extra text was added for readability when printing > to the console. In our code we just add the following post- > processing step. > > (my $text = $annotation->as_text()) =~ s/(Comment|Value): //; > > Scott > > Scott Markel, Ph.D. 
> Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Co-chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > > Sent: Thursday, 30 April 2009 12:27 PM > > To: BioPerl List > > Subject: [Bioperl-l] Bio::Annotations::Collection confusion > > > > So, I'm parsing Genbank sequences to pull out the various > exons. I found > > the way to get the NCBI Exon number from each feature, but > am confused > > about one of the methods. When I do annotation->as_text I'm > expecting to > > get back 1 from the feature, but instead get back Value: 1 > ??!? Why is > > the value from the NCBI file getting that text tagged onto it? 
> >
> > http://www.ncbi.nlm.nih.gov/nuccore/73622129
> >      exon            1..774
> >                      /gene="BOLA2"
> >                      /gene_synonym="BOLA2A; My016"
> >                      /inference="alignment:Splign"
> >                      /number=1
> >
> > print ($f->annotation->get_Annotations('number'))[0]->as_text;
> > Value: 1

From jonathanmflowers at gmail.com Thu Apr 30 16:22:23 2009
From: jonathanmflowers at gmail.com (Jon Flowers)
Date: Thu, 30 Apr 2009 13:22:23 -0700 (PDT)
Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem
In-Reply-To: <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org>
References: <23319982.post@talk.nabble.com>
    <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org>
Message-ID: <23322607.post@talk.nabble.com>

Jason,

I used the Bio::DB::SeqFeature::Store::GFF3Loader rather than the bp_seqfeature_load.pl script. You were right, however. It looks like I had populated the MySQL database with multiple fasta files. I cleared the database and ran the GFF3Loader twice (once for the fasta, once for the GFF3). Segment objects appear to be working fine now.

THANKS!

Jonathan

Jason Stajich-3 wrote:
>
> One would have to see some of your GFF to know better. It sounds like
> you have chr1 defined in multiple places.
>
> Did you use the bp_seqfeature_load script to load the data in one go -
> it should catch it if you have non-unique IDs.
>
> -jason
> On Apr 30, 2009, at 9:40 AM, Jon Flowers wrote:
>
>>
>> Dear colleagues,
>>
>> I have set up a mySQL database and loaded a GFF3 and fasta file using
>> Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be working
>> normally except when I attempt to create a Bio::DB::SeqFeature::Segment
>> object.
>> >> The following works as expected: >> >> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', >> -dsn => 'dbi:mysql:foo', >> -user => 'myuser', >> -pass => 'mypassword', >> -write => '1'); >> >> my @features = $db->features(-seq_id=>'chr1', >> -start=>1, >> -end=>10000, >> -types=>['gene']); >> >> However, when I try to create a segment object using either of the two >> following method calls I get an error: >> >> my $segment = $db->segment('chr1',1=>10000); >> >> my $segment = $db->segment( -seq_id => 'chr1', -start => '1', -end => >> '10000'); >> >> -------------------------------- EXCEPTION >> ------------------------------------ >> >> MSG: segment() called in a scalar context but multiple features match. >> Either call in a list context or narrow your search using the -types >> or >> -class arguments >> >> STACK Bio::DB::SeqFeature::Store::segment >> /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178 >> STACK toplevel trial.pl:42 >> ------------------------------------------------------- >> >> Calling in list context (which is not defined in the documentation) >> produces >> an array of 22 identical scalars = 'chr1:1..10000'. >> >> Any ideas? >> >> Thanks >> >> Jonathan >> >> -- >> View this message in context: >> http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>
> Jason Stajich
> jason at bioperl.org

-- 
View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23322607.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jason at bioperl.org Thu Apr 30 16:24:25 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 30 Apr 2009 13:24:25 -0700
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
    <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net>
    <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu>
Message-ID: <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org>

Seems like you just want $annotation->value ?

=head2 as_text

 Title   : as_text
 Usage   : my $text = $obj->as_text
 Function: return the string "Value: $v" where $v is the value
 Returns : string
 Args    : none


=cut

=head2 display_text

 Title   : display_text
 Usage   : my $str = $ann->display_text();
 Function: returns a string. Unlike as_text(), this method returns a string
           formatted as would be expected for the specific implementation.

           One can pass a callback as an argument which allows custom text
           generation; the callback is passed the current instance and any text
           returned
 Example :
 Returns : a string
 Args    : [optional] callback

=cut

=head2 value

 Title   : value
 Usage   : $obj->value($newval)
 Function: Get/Set the value for simplevalue
 Returns : value of value
 Args    : newvalue (optional)


=cut

On Apr 30, 2009, at 1:01 PM, Kevin Brown wrote:

> That's nice in some regards, but makes it hard to use the function in
> code without having to always process the result, which seems to be
> counter to what one would expect.
>
> E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence".
>
> Is there a better way to get the number directly without having to strip
> off the text that never existed in the first place?
>
>> -----Original Message-----
>> From: Scott Markel [mailto:SMarkel at accelrys.com]
>> Sent: Thursday, April 30, 2009 12:57 PM
>> To: Kevin Brown; BioPerl List
>> Subject: RE: Bio::Annotations::Collection confusion
>>
>> Kevin,
>>
>> I believe the extra text was added for readability when printing
>> to the console. In our code we just add the following
>> post-processing step.
>>
>>   (my $text = $annotation->as_text()) =~ s/(Comment|Value): //;
>>
>> Scott
>>
>> Scott Markel, Ph.D.
>> Principal Bioinformatics Architect email: smarkel at accelrys.com >> Accelrys (SciTegic R&D) mobile: +1 858 205 3653 >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 >> San Diego, CA 92121 fax: +1 858 799 5222 >> USA web: http://www.accelrys.com >> >> http://www.linkedin.com/in/smarkel >> Vice President, Board of Directors: >> International Society for Computational Biology >> Co-chair: ISCB Publications Committee >> Associate Editor: PLoS Computational Biology >> Editorial Board: Briefings in Bioinformatics >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown >>> Sent: Thursday, 30 April 2009 12:27 PM >>> To: BioPerl List >>> Subject: [Bioperl-l] Bio::Annotations::Collection confusion >>> >>> So, I'm parsing Genbank sequences to pull out the various >> exons. I found >>> the way to get the NCBI Exon number from each feature, but >> am confused >>> about one of the methods. When I do annotation->as_text I'm >> expecting to >>> get back 1 from the feature, but instead get back Value: 1 >> ??!? Why is >>> the value from the NCBI file getting that text tagged onto it? 
>>>
>>> http://www.ncbi.nlm.nih.gov/nuccore/73622129
>>>      exon            1..774
>>>                      /gene="BOLA2"
>>>                      /gene_synonym="BOLA2A; My016"
>>>                      /inference="alignment:Splign"
>>>                      /number=1
>>>
>>> print ($f->annotation->get_Annotations('number'))[0]->as_text;
>>> Value: 1

Jason Stajich
jason at bioperl.org

From Kevin.M.Brown at asu.edu Thu Apr 30 16:45:29 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 30 Apr 2009 13:45:29 -0700
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
In-Reply-To: <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org>
References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
    <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net>
    <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu>
    <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu>

OK. Can't see that method in the Deobfuscator which might explain why I didn't know about it.

http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3AAnnotation%3A%3ACollection&sort_order=by+method&search_string=Bio%3A%3AAnnotation%3A%3ACollection

> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich
> Sent: Thursday, April 30, 2009 1:24 PM
> To: Kevin Brown
> Cc: BioPerl List
> Subject: Re: [Bioperl-l] Bio::Annotations::Collection confusion
>
> Seems like you just want $annotation->value ?
> > > =head2 as_text > > Title : as_text > Usage : my $text = $obj->as_text > Function: return the string "Value: $v" where $v is the value > Returns : string > Args : none > > > =cut > > =head2 display_text > > Title : display_text > Usage : my $str = $ann->display_text(); > Function: returns a string. Unlike as_text(), this method > returns a > string > formatted as would be expected for te specific > implementation. > > One can pass a callback as an argument which > allows custom > text > generation; the callback is passed the current instance > and any text > returned > Example : > Returns : a string > Args : [optional] callback > > =cut > > =head2 value > > Title : value > Usage : $obj->value($newval) > Function: Get/Set the value for simplevalue > Returns : value of value > Args : newvalue (optional) > > > =cut > > On Apr 30, 2009, at 1:01 PM, Kevin Brown wrote: > > > That's nice in some regards, but makes it hard to use the > function in > > code without having to always process the result, which seems to be > > counter to what one would expect. > > > > E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence". > > > > Is there a better way to get the number directly without having to > > strip > > off the text that never existed in the first place? > > > >> -----Original Message----- > >> From: Scott Markel [mailto:SMarkel at accelrys.com] > >> Sent: Thursday, April 30, 2009 12:57 PM > >> To: Kevin Brown; BioPerl List > >> Subject: RE: Bio::Annotations::Collection confusion > >> > >> Kevin, > >> > >> I believe the extra text was added for readability when printing > >> to the console. In our code we just add the following post- > >> processing step. > >> > >> (my $text = $annotation->as_text()) =~ > s/(Comment|Value): //; > >> > >> Scott > >> > >> Scott Markel, Ph.D. 
> >> Principal Bioinformatics Architect email: smarkel at accelrys.com > >> Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > >> San Diego, CA 92121 fax: +1 858 799 5222 > >> USA web: http://www.accelrys.com > >> > >> http://www.linkedin.com/in/smarkel > >> Vice President, Board of Directors: > >> International Society for Computational Biology > >> Co-chair: ISCB Publications Committee > >> Associate Editor: PLoS Computational Biology > >> Editorial Board: Briefings in Bioinformatics > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown > >>> Sent: Thursday, 30 April 2009 12:27 PM > >>> To: BioPerl List > >>> Subject: [Bioperl-l] Bio::Annotations::Collection confusion > >>> > >>> So, I'm parsing Genbank sequences to pull out the various > >> exons. I found > >>> the way to get the NCBI Exon number from each feature, but > >> am confused > >>> about one of the methods. When I do annotation->as_text I'm > >> expecting to > >>> get back 1 from the feature, but instead get back Value: 1 > >> ??!? Why is > >>> the value from the NCBI file getting that text tagged onto it? 
> >>> > >>> http://www.ncbi.nlm.nih.gov/nuccore/73622129 > >>> exon 1..774 > >>> /gene="BOLA2" > >>> /gene_synonym="BOLA2A; My016" > >>> /inference="alignment:Splign" > >>> /number=1 > >>> > >>> print ($f->annotation->get_Annotations('number'))[0]->as_text; > >>> Value: 1 > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > From Russell.Smithies at agresearch.co.nz Thu Apr 30 17:28:39 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 1 May 2009 09:28:39 +1200 Subject: [Bioperl-l] Bio::Annotations::Collection confusion In-Reply-To: <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu> <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org> <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493C843A2@exchsth.agresearch.co.nz> It's buried in Bio::Annotation::SimpleValue I think http://bioperl.org/cgi-bin/deob_interface.cgi?Search=&module=&sort_order=by+method&search_string=Bio%3A%3AAnnotation%3A%3ASimpleValue&Filter=Submit+Query > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > Sent: Friday, 1 May 2009 8:45 a.m. > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::Annotations::Collection confusion > > OK. Can't see that method in the Deobfuscator which might explain why I > didn't know about it. 
> > http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3 > A%3AAnnotation%3A%3ACollection&sort_order=by+method&search_string=Bio%3A > %3AAnnotation%3A%3ACollection > > > -----Original Message----- > > From: Jason Stajich [mailto:jason.stajich at gmail.com] On > > Behalf Of Jason Stajich > > Sent: Thursday, April 30, 2009 1:24 PM > > To: Kevin Brown > > Cc: BioPerl List > > Subject: Re: [Bioperl-l] Bio::Annotations::Collection confusion > > > > Seems like you just want $annotation->value ? > > > > > > =head2 as_text > > > > Title : as_text > > Usage : my $text = $obj->as_text > > Function: return the string "Value: $v" where $v is the value > > Returns : string > > Args : none > > > > > > =cut > > > > =head2 display_text > > > > Title : display_text > > Usage : my $str = $ann->display_text(); > > Function: returns a string. Unlike as_text(), this method > > returns a > > string > > formatted as would be expected for te specific > > implementation. > > > > One can pass a callback as an argument which > > allows custom > > text > > generation; the callback is passed the current instance > > and any text > > returned > > Example : > > Returns : a string > > Args : [optional] callback > > > > =cut > > > > =head2 value > > > > Title : value > > Usage : $obj->value($newval) > > Function: Get/Set the value for simplevalue > > Returns : value of value > > Args : newvalue (optional) > > > > > > =cut > > > > On Apr 30, 2009, at 1:01 PM, Kevin Brown wrote: > > > > > That's nice in some regards, but makes it hard to use the > > function in > > > code without having to always process the result, which seems to be > > > counter to what one would expect. > > > > > > E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence". > > > > > > Is there a better way to get the number directly without having to > > > strip > > > off the text that never existed in the first place? 
> > > > > >> -----Original Message----- > > >> From: Scott Markel [mailto:SMarkel at accelrys.com] > > >> Sent: Thursday, April 30, 2009 12:57 PM > > >> To: Kevin Brown; BioPerl List > > >> Subject: RE: Bio::Annotations::Collection confusion > > >> > > >> Kevin, > > >> > > >> I believe the extra text was added for readability when printing > > >> to the console. In our code we just add the following post- > > >> processing step. > > >> > > >> (my $text = $annotation->as_text()) =~ > > s/(Comment|Value): //; > > >> > > >> Scott > > >> > > >> Scott Markel, Ph.D. > > >> Principal Bioinformatics Architect email: smarkel at accelrys.com > > >> Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > > >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > > >> San Diego, CA 92121 fax: +1 858 799 5222 > > >> USA web: http://www.accelrys.com > > >> > > >> http://www.linkedin.com/in/smarkel > > >> Vice President, Board of Directors: > > >> International Society for Computational Biology > > >> Co-chair: ISCB Publications Committee > > >> Associate Editor: PLoS Computational Biology > > >> Editorial Board: Briefings in Bioinformatics > > >> > > >> > > >>> -----Original Message----- > > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown > > >>> Sent: Thursday, 30 April 2009 12:27 PM > > >>> To: BioPerl List > > >>> Subject: [Bioperl-l] Bio::Annotations::Collection confusion > > >>> > > >>> So, I'm parsing Genbank sequences to pull out the various > > >> exons. I found > > >>> the way to get the NCBI Exon number from each feature, but > > >> am confused > > >>> about one of the methods. When I do annotation->as_text I'm > > >> expecting to > > >>> get back 1 from the feature, but instead get back Value: 1 > > >> ??!? Why is > > >>> the value from the NCBI file getting that text tagged onto it? 
> > >>>
> > >>> http://www.ncbi.nlm.nih.gov/nuccore/73622129
> > >>>      exon            1..774
> > >>>                      /gene="BOLA2"
> > >>>                      /gene_synonym="BOLA2A; My016"
> > >>>                      /inference="alignment:Splign"
> > >>>                      /number=1
> > >>>
> > >>> print ($f->annotation->get_Annotations('number'))[0]->as_text;
> > >>> Value: 1
> >
> > Jason Stajich
> > jason at bioperl.org

From Kevin.M.Brown at asu.edu Thu Apr 30 17:56:16 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 30 Apr 2009 14:56:16 -0700
Subject: [Bioperl-l] Other object oddities
Message-ID: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu>

So, I'm using quite a bit of bioperl code in my own stuff and have been seeing some oddities with the naming of methods.
A good example would be in the Bio::Seq and Bio::SeqFeature::Generic. Both have a method called "seq" but in the latter case it returns an object (and expects an object when doing a Set) and in the former it returns a string and expects a string when doing a Set. This makes for a bit of brain freeze on my part when the return from another object might be a Bio::Seq or Bio::SeqFeature::Generic and now calling the ->seq returns different things. Guess I'm just curious if anyone has done an audit of the methods of the various objects and their return types to see how consistent they are across even a subsection of the codebase? From maj at fortinbras.us Wed Apr 1 05:28:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 1 Apr 2009 01:28:24 -0400 Subject: [Bioperl-l] #bioperl bot talk Message-ID: <2589D1BF1EA24C119C06982EB70F490C@NewLife> Hi All, Some cool stuff going on on the IRC node (freenode.net/#bioperl). Andrew Stewart has been prototyping an irc bot with Bioperl functionality built-in. The possibilities for improving support and logging our increasing irc traffic are terrifying. I've set up a wiki page (http://www.bioperl.org/wiki/Bots) under the new IRC category for discussions. Please feel free to contribute use cases, ideas, praise and blame. cheers, Mark From johann.pellet at inserm.fr Wed Apr 1 10:14:25 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Wed, 1 Apr 2009 12:14:25 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> Message-ID: Hi all, With the latest version of BioPerl and BioSQL, I have tried to insert entry from a GenBank file, which I have downloaded from the NCBI website (648 937 records) After successfully loading ncbi_taxonomy i am getting following error message while loading sequences into database. 
perl load_seqdatabase.pl gb_03-2009 -format genbank -driver Pg -dbname biosql --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Human papillomavirus type 2c' (I was supplied 'Human papillomavirus - 2 | Alphapapillomavirus | Pa pillomaviridae') the script is not stopped until this entry: S67864 --------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed, values were ("1","19)","1","3") FKs (41914,) ERROR: invalid input syntax for integer: "19)" --------------------------------------------------- Could not store S67864: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: error while executing statement in Bio::DB::BioSQL::LocationAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ig nored until end of transaction block STACK: Error::throw STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:357 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / Library/Perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:216 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::SeqFeatureAdaptor::store_children /Library/ Perl/5.8.8/Bio/DB/BioSQL/SeqFeatureAdaptor.pm:291 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store 
/Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::SeqAdaptor::store_children /Library/Perl/5.8.8/ Bio/DB/BioSQL/SeqAdaptor.pm:257 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: load_seqdatabase.pl:630 ----------------------------------------------------------- at load_seqdatabase.pl line 643 Any Idea? Thanks in advance Johann From florent.angly at gmail.com Wed Apr 1 17:03:28 2009 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 01 Apr 2009 10:03:28 -0700 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> Message-ID: <49D39E60.1020103@gmail.com> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you won't be able to put its information in a hash (unless you have a lot of memory). Florent Smithies, Russell wrote: > The taxonomy information isn't in the blast output unless you created custom fasta headers for your blast database. > The easiest way to get the tax_id for your accessions would be to download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. > If you load that file into a hash, parse the accessions out of the blast hits then lookup the tax_id from that hash, I think it should be fairly fast. 
> > Checking which are prokaryotes and which are eukaryotes based on tax_id is a separate problem :-) > If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file contained within lists what division each tax_id belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out from that. > > It's not a very BioPerly solution but sometimes just looking up the answer from a file/table/hash is the simplest way. > > Hope this helps, > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >> Sent: Wednesday, 1 April 2009 7:43 a.m. >> To: bioperl-l >> Subject: [Bioperl-l] taxonomy ID >> >> Hi All, >> I am writing a script, for one of its part i have to parse a blast >> report (refseq blast) and check how may organisms are eukaryotes and how >> namy of them are prokaryotes. >> I am using BIO::DB::taxinomy module: >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >> >> But for this i need a taxonomyid (like '33090') given in the example. >> So is it possible to get a taxonomyid from refseq balst report? >> If not then how i can deal with this problem? >> >> i would really appreciate if anyone can help me out. 
>> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From miguel.pignatelli at uv.es Wed Apr 1 17:15:48 2009 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Wed, 1 Apr 2009 19:15:48 +0200 Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles In-Reply-To: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> Message-ID: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> Hi all, I have a list of PUBMED IDs and I am trying to retrieve automatically the *full article* in any format (not just the abstract). Is there any method in bioperl that allows this? any other solution? Currently I am trying to solve this using WWW::Mechanize, but do you know of any other method to do this? 
Any help would be appreciated,

Thanks in advance,

M;

From kanzure at gmail.com Wed Apr 1 18:18:22 2009
From: kanzure at gmail.com (Bryan Bishop)
Date: Wed, 1 Apr 2009 13:18:22 -0500
Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
In-Reply-To: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
Message-ID: <55ad6af70904011118q7cbdb05u9c89958de3ccc87e@mail.gmail.com>

On Wed, Apr 1, 2009 at 12:15 PM, Miguel Pignatelli wrote:
> I have a list of PUBMED IDs and I am trying to retrieve automatically the
> *full article* in any format (not just the abstract). Is there any method in
> bioperl that allows this? any other solution?
> Currently I am trying to solve this using WWW::Mechanize, but do you know of
> any other method to do this?

You can try pubget.com; it's a web gateway for downloading PubMed
Central articles. Unfortunately, this means it does not cover articles
that are listed only in PubMed. What I have found with PubMed is that
it's mainly a listing of abstracts. The papers themselves may or may
not be online at their respective journals somewhere else on the web,
and there are rarely any links to the publisher website.

So how are you using WWW::Mechanize in this context? Is there some
secret to obtaining papers that are listed via PubMed? There are no
magical links to the publisher websites, so what's going on?
- Bryan
http://heybryan.org/
1 512 203 0507

From Russell.Smithies at agresearch.co.nz Wed Apr 1 19:33:35 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 2 Apr 2009 08:33:35 +1300
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <49D39E60.1020103@gmail.com>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5615@exchsth.agresearch.co.nz>

There's always more than one way to do it.
I have no trouble loading it into a hash but you could just grep the file:

# Split the matching line on whitespace; the first column is discarded.
my (undef, $tax_id) = split(/\s+/, `grep -w -P "^$accession" gi_taxid_prot.dmp`);

--Russell

> -----Original Message-----
> From: Florent Angly [mailto:florent.angly at gmail.com]
> Sent: Thursday, 2 April 2009 6:03 a.m.
> To: Smithies, Russell
> Cc: 'shalabh sharma'; 'bioperl-l'
> Subject: Re: [Bioperl-l] taxonomy ID
>
> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you
> won't be able to put its information in a hash (unless you have a lot of
> memory).
> Florent
>
> Smithies, Russell wrote:
> > The taxonomy information isn't in the blast output unless you created custom
> > fasta headers for your blast database.
> > The easiest way to get the tax_id for your accessions would be to download
> > the gi->tax_id list from
> > ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz.
> > If you load that file into a hash, parse the accessions out of the blast
> > hits then lookup the tax_id from that hash, I think it should be fairly fast.
> >
> > Checking which are prokaryotes and which are eukaryotes based on tax_id is a
> > separate problem :-)
> > If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file
> > contained within lists what division each tax_id belongs to (Bacteria,
> > Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out
> > from that.
> > > > It's not a very BioPerly solution but sometimes just looking up the answer > from a file/table/hash is the simplest way. > > > > Hope this helps, > > > > Russell Smithies > > > > Bioinformatics Applications Developer > > T +64 3 489 9085 > > E russell.smithies at agresearch.co.nz > > > > Invermay Research Centre > > Puddle Alley, > > Mosgiel, > > New Zealand > > T +64 3 489 3809 > > F +64 3 489 9174 > > www.agresearch.co.nz > > > > > > > > > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of shalabh sharma > >> Sent: Wednesday, 1 April 2009 7:43 a.m. > >> To: bioperl-l > >> Subject: [Bioperl-l] taxonomy ID > >> > >> Hi All, > >> I am writing a script, for one of its part i have to parse a > blast > >> report (refseq blast) and check how may organisms are eukaryotes and how > >> namy of them are prokaryotes. > >> I am using BIO::DB::taxinomy module: > >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy > >> > >> But for this i need a taxonomyid (like '33090') given in the example. > >> So is it possible to get a taxonomyid from refseq balst report? > >> If not then how i can deal with this problem? > >> > >> i would really appreciate if anyone can help me out. > >> > >> Thanks > >> Shalabh > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. 
Any review, retransmission, dissemination or other use of, or
> > taking of any action in reliance upon, this information by persons or
> > entities other than the intended recipients is prohibited by AgResearch
> > Limited. If you have received this message in error, please notify the
> > sender immediately.
> > =======================================================================
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >

From Russell.Smithies at agresearch.co.nz Wed Apr 1 19:48:02 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 2 Apr 2009 08:48:02 +1300
Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
In-Reply-To: <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz>

Not all articles have full-text at Pubmed but if you know the article ID, you can usually get the whole article (if available) like this:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez

or as pdf
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf

I'd just build a URL and use wget.

If you're searching Pubmed directly, use a query like this to ensure you only get articles with links to full text:

cancer AND (free full text[sb])
eg http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free+full+text[sb])

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Miguel Pignatelli > Sent: Thursday, 2 April 2009 6:16 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles > > Hi all, > > I have a list of PUBMED IDs and I am trying to retrieve automatically > the *full article* in any format (not just the abstract). Is there any > method in bioperl that allows this? any other solution? > Currently I am trying to solve this using WWW::Mechanize, but do you > know of any other method to do this? > > Any help would be appreciated, > > Thanks in advance, > > M; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From miguel.pignatelli at uv.es Wed Apr 1 22:14:13 2009
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Thu, 2 Apr 2009 00:14:13 +0200
Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz>
References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz>
Message-ID:

Thanks for the response,

I have PMIDs extracted from GenBank flat files. Is there a way to
convert PMIDs to PMCIDs?
I found this page:

http://www.ncbi.nlm.nih.gov/sites/pmctopmid

Is it possible to download the underlying conversion table for local
use?

Thank you very much in advance,

M;


On 01/04/2009, at 21:48, Smithies, Russell wrote:

> Not all articles have full-text at Pubmed but if you know the
> article ID, you can usually get the whole article (if available)
> like this:
> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez
>
> or as pdf
> http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf
>
> I'd just build a URL and use wget.
>
> If you're searching Pubmed directly, use a query like this to ensure
> you only get articles with links to full text:
>
> cancer AND (free full text[sb])
> eg http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free+full+text[sb])
>
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
>
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz
>
>
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Miguel Pignatelli
>> Sent: Thursday, 2 April 2009 6:16 a.m.
>> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles >> >> Hi all, >> >> I have a list of PUBMED IDs and I am trying to retrieve automatically >> the *full article* in any format (not just the abstract). Is there >> any >> method in bioperl that allows this? any other solution? >> Currently I am trying to solve this using WWW::Mechanize, but do you >> know of any other method to do this? >> >> Any help would be appreciated, >> >> Thanks in advance, >> >> M; >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> = > ====================================================================== > From Russell.Smithies at agresearch.co.nz Wed Apr 1 22:47:30 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 2 Apr 2009 11:47:30 +1300 Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles In-Reply-To: References: <223334F4-C6E8-4A25-8EB0-77855C10DC5A@jays.net> <5A11046D-EA9D-467A-A1E8-208E77C94288@uv.es> <18DF7D20DFEC044098A1062202F5FFF324939F5623@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F5761@exchsth.agresearch.co.nz> Try this: http://www.pubmedcentral.nih.gov/about/ftp.html#Obtaining_DOIs Use ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz to associate PMC articles with a PMC ID, a PubMed ID, and the corresponding DOI. PMC-ids.csv.gz is a comma separated file with the following fields: * Journal Title * ISSN * Electronic ISSN * Publication Year * Volume * Issue * Page * DOI (if available) * PMC ID * PubMed ID (if available) * Manuscript ID (if available) * Release Date (Mmm DD YYYY or live) --Russell > -----Original Message----- > From: Miguel Pignatelli [mailto:miguel.pignatelli at uv.es] > Sent: Thursday, 2 April 2009 11:14 a.m. > To: Smithies, Russell > Cc: 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] Is it possible to retrieve full pubmed articles > > Thanks for the response, > > I have PMIDs extracted from Genbank flat files, is there a way to > convert PMIDs to PMCIDs? > I found this page: > > http://www.ncbi.nlm.nih.gov/sites/pmctopmid > > Is it possible to download the underlying conversion table for local > use? 
>
> Thank you very much in advance,
>
> M;
>
>
> On 01/04/2009, at 21:48, Smithies, Russell wrote:
>
> > Not all articles have full-text at Pubmed but if you know the
> > article ID, you can usually get the whole article (if available)
> > like this:
> > http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez
> >
> > or as pdf
> > http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf
> >
> > I'd just build a URL and use wget.
> >
> > If you're searching Pubmed directly, use a query like this to ensure
> > you only get articles with links to full text:
> >
> > cancer AND (free full text[sb])
> > eg http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free+full+text[sb])
> >
> >
> > Russell Smithies
> >
> > Bioinformatics Applications Developer
> > T +64 3 489 9085
> > E russell.smithies at agresearch.co.nz
> >
> > Invermay Research Centre
> > Puddle Alley,
> > Mosgiel,
> > New Zealand
> > T +64 3 489 3809
> > F +64 3 489 9174
> > www.agresearch.co.nz
> >
> >
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Miguel Pignatelli
> >> Sent: Thursday, 2 April 2009 6:16 a.m.
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: [Bioperl-l] Is it possible to retrieve full pubmed articles
> >>
> >> Hi all,
> >>
> >> I have a list of PUBMED IDs and I am trying to retrieve automatically
> >> the *full article* in any format (not just the abstract). Is there
> >> any method in bioperl that allows this? any other solution?
> >> Currently I am trying to solve this using WWW::Mechanize, but do you
> >> know of any other method to do this?
> >>
> >> Any help would be appreciated,
> >>
> >> Thanks in advance,
> >>
> >> M;
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > =======================================================================
> > Attention: The information contained in this message and/or attachments
> > from AgResearch Limited is intended only for the persons or entities
> > to which it is addressed and may contain confidential and/or privileged
> > material. Any review, retransmission, dissemination or other use of, or
> > taking of any action in reliance upon, this information by persons or
> > entities other than the intended recipients is prohibited by AgResearch
> > Limited. If you have received this message in error, please notify the
> > sender immediately.
> > =======================================================================
> >

From tristan.lefebure at gmail.com Thu Apr 2 03:11:51 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 1 Apr 2009 23:11:51 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
Message-ID: <200904012311.51764.tristan.lefebure@gmail.com>

Hi there,

I'm trying to use the uniq_seq function from the Bio::SimpleAlign module.
Here is the description:

 Title     : uniq_seq
 Usage     : $aln->uniq_seq():  Remove identical sequences in
             the alignment.  Ambiguous base ("N", "n") and
             leading and ending gaps ("-") are NOT counted as
             differences.
 Function  : Make a new alignment of unique sequence types (STs)
 Returns   : 1. a new Bio::SimpleAlign object (all sequences renamed as "ST")
             2. ST of each sequence in STDERR
 Argument  : None

What I'm trying to obtain is the ST composition (i.e. what is supposed to go
to STDERR), but I see nothing...

An example:

--------test.fasta:
>seq1
AAATTTC
>seq2
CAATTTC
>seq3
AAATTTC
-------


----------test.pl:
#! /usr/bin/perl

use strict;
use warnings;
use Bio::AlignIO;
use Bio::SimpleAlign;
use Getopt::Long;

# Read alignments from the FASTA file shown above
my $in = Bio::AlignIO->new(-file => 'test.fasta' ,
                           -format => 'fasta');

# Write the reduced alignments, also as FASTA
my $out = Bio::AlignIO->new(-file => ">test.out" ,
                            -format => 'fasta');

while ( my $aln = $in->next_aln() ) {
    # Collapse identical sequences into unique sequence types (STs)
    my $red_aln = $aln->uniq_seq;
    $out->write_aln($red_aln);
}
-------------

If you run:

./test.pl &> log

you will get nothing written into the log file... (but the test.out is OK)

Am I missing something?
By the way, wouldn't it be more convenient to have the ST composition returned
in an array?

Thanks,

--Tristan
(BioPerl 1.6)

From maj at fortinbras.us Thu Apr 2 03:28:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 1 Apr 2009 23:28:23 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
In-Reply-To: <200904012311.51764.tristan.lefebure@gmail.com>
References: <200904012311.51764.tristan.lefebure@gmail.com>
Message-ID: <29E09DCE622643848EAFA8F1C6210711@NewLife>

Tristan--
Strange: it looks like the prints to stderr have been commented out in the
source (back in revision 10242; 1.6 is rev 15582). The
two statements are easy to find in the SimpleAlign.pm uniq_seq() source; you can
uncomment them to work around this.
You are right, this is rather an unconventional way to specify an output
option-- can Chris comment?
Mark

----- Original Message -----
From: "Tristan Lefebure"
To: "BioPerl List"
Sent: Wednesday, April 01, 2009 11:11 PM
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq

> Hi there,
>
> I'm trying to use the uniq_seq function from the Bio::SimpleAlign module.
> Here is the description:
>
>  Title     : uniq_seq
>  Usage     : $aln->uniq_seq():  Remove identical sequences in
>              the alignment.  Ambiguous base ("N", "n") and
>              leading and ending gaps ("-") are NOT counted as
>              differences.
>  Function  : Make a new alignment of unique sequence types (STs)
>  Returns   : 1. a new Bio::SimpleAlign object (all sequences renamed as "ST")
>              2.
ST of each sequence in STDERR > Argument : None > > What I'm trying to obtain is the ST composition (i.e. what is supposed to go > to STDERR), but I see nothing... > > An example: > > --------test.fasta: >>seq1 > AAATTTC >>seq2 > CAATTTC >>seq3 > AAATTTC > ------- > > > ----------test.pl: > #! /usr/bin/perl > > use strict; > use warnings; > use Bio::AlignIO; > use Bio::SimpleAlign; > use Getopt::Long; > > my $in = Bio::AlignIO->new(-file => 'test.fasta' , > -format => 'fasta'); > > my $out = Bio::AlignIO->new(-file => ">test.out" , > -format => 'fasta'); > > while ( my $aln = $in->next_aln() ) { > my $red_aln = $aln->uniq_seq; > $out->write_aln($red_aln); > } > ------------- > > If you run: > > ./test.pl &> log > > you will get nothing written into the log file... (but the test.out is OK) > > Am I missing something? > By the way, wouldn't it be more convenient to have the ST composition returned > in an array? > > Thanks, > > --Tristan > (BioPerl 1.6) > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From weigangq at gmail.com Thu Apr 2 03:57:16 2009 From: weigangq at gmail.com (Weigang Qiu) Date: Wed, 1 Apr 2009 22:57:16 -0500 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: <29E09DCE622643848EAFA8F1C6210711@NewLife> References: <200904012311.51764.tristan.lefebure@gmail.com> <29E09DCE622643848EAFA8F1C6210711@NewLife> Message-ID: <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> Mark and Tristan, I am the original instigator of the uniq_seq method. The STDERR implementation was used so that STDOUT could be piped. But it did not conform to bioperl convention of using the $self->debug() method. I think that's why these lines were commented out and re-implemented using the $self->debug method. So, turning on the debug option should give the intended ST mapping for each sequence in stderr. 
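
[Editorial sketch, not part of the original message.] For readers who want the ST composition as a data structure rather than as debug output on stderr, the grouping itself can be sketched in plain Perl without BioPerl. This is only an illustration under stated assumptions: the helper name st_composition is hypothetical, and it groups on exact (case-insensitive) sequence identity only, whereas the uniq_seq documentation quoted in this thread says that "N" and leading/ending gaps are not counted as differences.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): group sequence IDs by
# identical sequence string. Takes a list of [id, sequence] pairs and
# returns a list of array refs, one per sequence type (ST), in order of
# first appearance. Simplification: exact match only, ignoring case.
sub st_composition {
    my @records = @_;
    my (%st_index_of, @groups);
    for my $rec (@records) {
        my ($id, $seq) = @$rec;
        my $key = uc $seq;
        if (!exists $st_index_of{$key}) {
            push @groups, [];                 # open a new ST
            $st_index_of{$key} = $#groups;
        }
        push @{ $groups[ $st_index_of{$key} ] }, $id;
    }
    return @groups;
}

# The three sequences from test.fasta in this thread.
my @groups = st_composition(
    [ seq1 => 'AAATTTC' ],
    [ seq2 => 'CAATTTC' ],
    [ seq3 => 'AAATTTC' ],
);

for my $i (0 .. $#groups) {
    printf "ST%d: %s\n", $i + 1, join(', ', @{ $groups[$i] });
}
```

With the example input this prints "ST1: seq1, seq3" followed by "ST2: seq2". Returning a list of ID groups keeps the mapping available to the caller, which is the convenience asked about earlier in the thread.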
weigang On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen wrote: > Tristan-- > Strange: it looks like the prints to stderr have been commented out in the > source (back in revision 10242; 1.6 is rev 15582). The > two statements are easy to find in the SimpleAlign.pm uniq_seq() source; > you can > uncomment them to work around this. > You are right, this is rather an unconventional way to specify an output > option-- can Chris comment? > Mark > ----- Original Message ----- From: "Tristan Lefebure" < > tristan.lefebure at gmail.com> > To: "BioPerl List" > Sent: Wednesday, April 01, 2009 11:11 PM > Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq > > > > Hi there, >> >> I'm trying to use the uniq_seq function from the Bio::SimpleAlign module. >> Here is the description: >> >> Title : uniq_seq >> Usage : $aln->uniq_seq(): Remove identical sequences in >> in the alignment. Ambiguous base ("N", "n") and >> leading and ending gaps ("-") are NOT counted as >> differences. >> Function : Make a new alignment of unique sequence types (STs) >> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >> "ST") >> 2. ST of each sequence in STDERR >> Argument : None >> >> What I'm trying to obtain is the ST composition (i.e. what is supposed to >> go >> to STDERR), but I see nothing... >> >> An example: >> >> --------test.fasta: >> >>> seq1 >>> >> AAATTTC >> >>> seq2 >>> >> CAATTTC >> >>> seq3 >>> >> AAATTTC >> ------- >> >> >> ----------test.pl: >> #! /usr/bin/perl >> >> use strict; >> use warnings; >> use Bio::AlignIO; >> use Bio::SimpleAlign; >> use Getopt::Long; >> >> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >> -format => 'fasta'); >> >> my $out = Bio::AlignIO->new(-file => ">test.out" , >> -format => 'fasta'); >> >> while ( my $aln = $in->next_aln() ) { >> my $red_aln = $aln->uniq_seq; >> $out->write_aln($red_aln); >> } >> ------------- >> >> If you run: >> >> ./test.pl &> log >> >> you will get nothing written into the log file... 
(but the test.out is OK) >> >> Am I missing something? >> By the way, wouldn't it be more convenient to have the ST composition >> returned >> in an array? >> >> Thanks, >> >> --Tristan >> (BioPerl 1.6) >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Weigang Qiu Department of Biological Sciences Hunter College, City University of New York 695 Park Avenue New York, NY 10065 From maj at fortinbras.us Thu Apr 2 04:15:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 2 Apr 2009 00:15:06 -0400 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> References: <200904012311.51764.tristan.lefebure@gmail.com><29E09DCE622643848EAFA8F1C6210711@NewLife> <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> Message-ID: Thanks Weigang-- I didn't look carefully enough-- I'll make a change to the POD. so Tristan, in your code below, add $aln->verbose(1); before you invoke uniq_seq(). The ST's should then be sent to stderr (as "warns"). MAJ ----- Original Message ----- From: "Weigang Qiu" To: "Mark A. Jensen" Cc: "BioPerl List" ; Sent: Wednesday, April 01, 2009 11:57 PM Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq > Mark and Tristan, > > I am the original instigator of the uniq_seq method. The STDERR > implementation was used so that STDOUT could be piped. But it did not > conform to bioperl convention of using the $self->debug() method. I think > that's why these lines were commented out and re-implemented using the > $self->debug method. So, turning on the debug option should give the > intended ST mapping for each sequence in stderr. 
> > weigang > > On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen wrote: > >> Tristan-- >> Strange: it looks like the prints to stderr have been commented out in the >> source (back in revision 10242; 1.6 is rev 15582). The >> two statements are easy to find in the SimpleAlign.pm uniq_seq() source; >> you can >> uncomment them to work around this. >> You are right, this is rather an unconventional way to specify an output >> option-- can Chris comment? >> Mark >> ----- Original Message ----- From: "Tristan Lefebure" < >> tristan.lefebure at gmail.com> >> To: "BioPerl List" >> Sent: Wednesday, April 01, 2009 11:11 PM >> Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq >> >> >> >> Hi there, >>> >>> I'm trying to use the uniq_seq function from the Bio::SimpleAlign module. >>> Here is the description: >>> >>> Title : uniq_seq >>> Usage : $aln->uniq_seq(): Remove identical sequences in >>> in the alignment. Ambiguous base ("N", "n") and >>> leading and ending gaps ("-") are NOT counted as >>> differences. >>> Function : Make a new alignment of unique sequence types (STs) >>> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >>> "ST") >>> 2. ST of each sequence in STDERR >>> Argument : None >>> >>> What I'm trying to obtain is the ST composition (i.e. what is supposed to >>> go >>> to STDERR), but I see nothing... >>> >>> An example: >>> >>> --------test.fasta: >>> >>>> seq1 >>>> >>> AAATTTC >>> >>>> seq2 >>>> >>> CAATTTC >>> >>>> seq3 >>>> >>> AAATTTC >>> ------- >>> >>> >>> ----------test.pl: >>> #! 
/usr/bin/perl >>> >>> use strict; >>> use warnings; >>> use Bio::AlignIO; >>> use Bio::SimpleAlign; >>> use Getopt::Long; >>> >>> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >>> -format => 'fasta'); >>> >>> my $out = Bio::AlignIO->new(-file => ">test.out" , >>> -format => 'fasta'); >>> >>> while ( my $aln = $in->next_aln() ) { >>> my $red_aln = $aln->uniq_seq; >>> $out->write_aln($red_aln); >>> } >>> ------------- >>> >>> If you run: >>> >>> ./test.pl &> log >>> >>> you will get nothing written into the log file... (but the test.out is OK) >>> >>> Am I missing something? >>> By the way, wouldn't it be more convenient to have the ST composition >>> returned >>> in an array? >>> >>> Thanks, >>> >>> --Tristan >>> (BioPerl 1.6) >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Weigang Qiu > Department of Biological Sciences > Hunter College, City University of New York > 695 Park Avenue > New York, NY 10065 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From miguel.pignatelli at uv.es Thu Apr 2 08:17:02 2009 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Thu, 02 Apr 2009 10:17:02 +0200 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <49D39E60.1020103@gmail.com> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com> Message-ID: <49D4747E.4060001@uv.es> You may find the attached Perl module useful. It solves the difficult parts of getting the taxonomy given a GI identifier or a taxID. 
It is designed to be able to process a high number of GIs very fast and with low memory usage. An example of usage would be:

use taxbuild;

# Build the taxonomyDB
my $taxDB = taxbuild->new(
    nodes    => $nodes_file_from_taxonomyDB,
    names    => $names_file_from_taxonomyDB,
    dict     => $dictFile,
    save_mem => 1
);

# Get the taxonomy given a GI identifier
my @tax = $taxDB->get_taxonomy_from_gi("35961124");

# Get the taxonomy term of a GI identifier at a given level
my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");

# Get the taxid of a GI identifier
my $taxid = $taxDB->get_taxid("35961124");

# Get the taxonomy given a taxid
@tax = $taxDB->get_taxonomy($taxid);

# Get the taxonomy at a given level given a taxid
my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");

# Get the level of a given taxonomical name
my $level = $taxDB->get_level_from_name("Proteobacteria");

The "dict file" is a processed version of the gi_taxid file from taxonomyDB. You can get this file by running the tax2bin2.pl script also attached:

$ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin

or, if you are working with nucleotide sequences instead of proteins:

$ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin

You may consult the documentation of the module for a full description.

A possible solution to the original post using this module would be something like:

# Initialize the taxonomyDB once.
my $taxDB = taxbuild->new(
    nodes    => $nodes_file_from_taxonomyDB,
    names    => $names_file_from_taxonomyDB,
    dict     => $dictFile,
    save_mem => 1
);

# For each GI in your blast result:
my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
if ($superkingdom eq "Bacteria") {
    # Do whatever you want
} elsif ($superkingdom eq "Eukaryota") {
    # Do whatever you want
}

The module has been tested mainly on Linux systems, but should run without problems on Windows and Mac too. If you encounter any problem while using it, don't hesitate to contact me.
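For anyone who wants to see what such a lookup involves without the attached module, here is a minimal sketch in plain Perl. It is not the taxbuild implementation. It only assumes the nodes.dmp layout from the NCBI taxdump (tax_id, parent tax_id and rank as the first three fields, separated by tab-pipe-tab) and walks the parent pointers until it reaches the requested rank. The inline records, the %names table and the term_at_level() helper are made up for illustration; a real script would read nodes.dmp and names.dmp from disk, and the rank names should be checked against your own copy of the taxdump.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical inline excerpt in the nodes.dmp layout:
# tax_id <tab>|<tab> parent tax_id <tab>|<tab> rank <tab>| ...
# The lineage of 9606 is shortened here; the real file has many more levels.
my @nodes_lines = (
    "1\t|\t1\t|\tno rank\t|",
    "131567\t|\t1\t|\tno rank\t|",
    "2\t|\t131567\t|\tsuperkingdom\t|",
    "2759\t|\t131567\t|\tsuperkingdom\t|",
    "1224\t|\t2\t|\tphylum\t|",
    "9606\t|\t2759\t|\tspecies\t|",
);

# Hypothetical stand-in for names.dmp (scientific names only).
my %names = (
    2    => 'Bacteria',
    2759 => 'Eukaryota',
);

# Build child -> parent and taxid -> rank tables.
my (%parent, %rank);
for my $line (@nodes_lines) {
    my ($id, $parent_id, $node_rank) = split /\t\|\t?/, $line;
    $parent{$id} = $parent_id;
    $rank{$id}   = $node_rank;
}

# Walk up the tree from $taxid until a node of rank $level is found.
# Returns the name of that node, or nothing if the root is reached first.
sub term_at_level {
    my ($taxid, $level) = @_;
    while (defined $taxid && exists $parent{$taxid}) {
        return $names{$taxid} if $rank{$taxid} eq $level;
        last if $parent{$taxid} == $taxid;    # the root is its own parent
        $taxid = $parent{$taxid};
    }
    return;
}

for my $taxid (9606, 1224) {
    my $term = term_at_level($taxid, 'superkingdom');
    print "$taxid => ", (defined $term ? $term : 'unknown'), "\n";
}
```

With the full nodes.dmp loaded, the same walk would answer the prokaryote/eukaryote question for any taxid obtained from the gi_taxid file, at the cost of holding the two tables in memory.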
Hope this helps, M; Florent Angly wrote: > FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you > won't be able to put its information in a hash (unless you have a lot of > memory). > Florent > > Smithies, Russell wrote: >> The taxonomy information isn't in the blast output unless you created >> custom fasta headers for your blast database. >> The easiest way to get the tax_id for your accessions would be to >> download the gi->tax_id list from >> ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. >> If you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should be >> fairly fast. >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) >> If you grab the taxdump.tar.gz file from the same site, the nodes.dmp >> file contained within lists what division each tax_id belongs to >> (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can >> probably work it out from that. >> >> It's not a very BioPerly solution but sometimes just looking up the >> answer from a file/table/hash is the simplest way. >> Hope this helps, >> >> Russell Smithies >> Bioinformatics Applications Developer T +64 3 489 9085 E >> russell.smithies at agresearch.co.nz >> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 >> 489 3809 F +64 3 489 9174 www.agresearch.co.nz >> >> >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] taxonomy ID >>> >>> Hi All, >>> I am writing a script, for one of its part i have to parse >>> a blast >>> report (refseq blast) and check how may organisms are eukaryotes and how >>> namy of them are prokaryotes. 
>>> I am using BIO::DB::taxinomy module: >>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>> >>> But for this i need a taxonomyid (like '33090') given in the example. >>> So is it possible to get a taxonomyid from refseq balst report? >>> If not then how i can deal with this problem? >>> >>> i would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Thu Apr 2 12:29:47 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Thu, 2 Apr 2009 08:29:47 -0400
Subject: [Bioperl-l] FYI: note on wiki template behavior
Message-ID: <62B28D02BEA44E13BBDB5531FF6D67CF@NewLife>

Wiki-interested folks-

I fixed a "feature" in the HOWTO template-- When the template was used twice in the same line of text, the text following the first instance was rendered as a "code box". This had to do with how the template itself was formatted. If you're interested, please have a look at http://www.bioperl.org/wiki/Template_talk:HOWTO

cheers,
Mark

From tristan.lefebure at gmail.com Thu Apr 2 13:30:51 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 2 Apr 2009 09:30:51 -0400
Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq
In-Reply-To:
References: <200904012311.51764.tristan.lefebure@gmail.com> <29E09DCE622643848EAFA8F1C6210711@NewLife> <7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com>
Message-ID:

Thank you both,

To internally store the ST composition, so that I can reuse it in the same script, I made the following modifications to SimpleAlign.pm:

diff /usr/local/share/perl/5.10.0/Bio/SimpleAlign.pm /usr/local/share/perl/5.10.0/Bio/SimpleAlignMod.pm
590a591,592
> #modified to also return an array of the ST composition
> my %st;
651a654
> push @{$st{$order{$str}}}, $_->id();
655c658
< return $aln;
---
> return ($aln, %st);

This is probably not really BioPerl compliant. Being an OBO ignorant, I wonder if we could add this information somewhere, either once in the $aln object, or in little pieces in each Bio::LocatableSeq object?

Thks,

--Tristan

On Thu, Apr 2, 2009 at 12:15 AM, Mark A. Jensen wrote:

> Thanks Weigang-- I didn't look carefully enough--
> I'll make a change to the POD.
> so Tristan, in your code below, add
>
> $aln->verbose(1);
>
> before you invoke uniq_seq(). The ST's should
> then be sent to stderr (as "warns").
>
> MAJ
> ----- Original Message ----- From: "Weigang Qiu"
> To: "Mark A.
Jensen" > Cc: "BioPerl List" ; < > tristan.lefebure at gmail.com> > Sent: Wednesday, April 01, 2009 11:57 PM > Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq > > > > Mark and Tristan, >> >> I am the original instigator of the uniq_seq method. The STDERR >> implementation was used so that STDOUT could be piped. But it did not >> conform to bioperl convention of using the $self->debug() method. I think >> that's why these lines were commented out and re-implemented using the >> $self->debug method. So, turning on the debug option should give the >> intended ST mapping for each sequence in stderr. >> >> weigang >> >> On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen >> wrote: >> >> Tristan-- >>> Strange: it looks like the prints to stderr have been commented out in >>> the >>> source (back in revision 10242; 1.6 is rev 15582). The >>> two statements are easy to find in the SimpleAlign.pm uniq_seq() source; >>> you can >>> uncomment them to work around this. >>> You are right, this is rather an unconventional way to specify an output >>> option-- can Chris comment? >>> Mark >>> ----- Original Message ----- From: "Tristan Lefebure" < >>> tristan.lefebure at gmail.com> >>> To: "BioPerl List" >>> Sent: Wednesday, April 01, 2009 11:11 PM >>> Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq >>> >>> >>> >>> Hi there, >>> >>>> >>>> I'm trying to use the uniq_seq function from the Bio::SimpleAlign >>>> module. >>>> Here is the description: >>>> >>>> Title : uniq_seq >>>> Usage : $aln->uniq_seq(): Remove identical sequences in >>>> in the alignment. Ambiguous base ("N", "n") and >>>> leading and ending gaps ("-") are NOT counted as >>>> differences. >>>> Function : Make a new alignment of unique sequence types (STs) >>>> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >>>> "ST") >>>> 2. ST of each sequence in STDERR >>>> Argument : None >>>> >>>> What I'm trying to obtain is the ST composition (i.e. 
what is supposed >>>> to >>>> go >>>> to STDERR), but I see nothing... >>>> >>>> An example: >>>> >>>> --------test.fasta: >>>> >>>> seq1 >>>>> >>>>> AAATTTC >>>> >>>> seq2 >>>>> >>>>> CAATTTC >>>> >>>> seq3 >>>>> >>>>> AAATTTC >>>> ------- >>>> >>>> >>>> ----------test.pl: >>>> #! /usr/bin/perl >>>> >>>> use strict; >>>> use warnings; >>>> use Bio::AlignIO; >>>> use Bio::SimpleAlign; >>>> use Getopt::Long; >>>> >>>> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >>>> -format => 'fasta'); >>>> >>>> my $out = Bio::AlignIO->new(-file => ">test.out" , >>>> -format => 'fasta'); >>>> >>>> while ( my $aln = $in->next_aln() ) { >>>> my $red_aln = $aln->uniq_seq; >>>> $out->write_aln($red_aln); >>>> } >>>> ------------- >>>> >>>> If you run: >>>> >>>> ./test.pl &> log >>>> >>>> you will get nothing written into the log file... (but the test.out is >>>> OK) >>>> >>>> Am I missing something? >>>> By the way, wouldn't it be more convenient to have the ST composition >>>> returned >>>> in an array? 
>>>> >>>> Thanks, >>>> >>>> --Tristan >>>> (BioPerl 1.6) >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> -- >> Weigang Qiu >> Department of Biological Sciences >> Hunter College, City University of New York >> 695 Park Avenue >> New York, NY 10065 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> From dereje1227 at yahoo.com Thu Apr 2 13:45:08 2009 From: dereje1227 at yahoo.com (demis001) Date: Thu, 2 Apr 2009 06:45:08 -0700 (PDT) Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 Message-ID: <22816585.post@talk.nabble.com> Hi , I am new to BioPerl and this forum and even do not know how to post the new post. I have one question for you guys. Is there any BioPerl module that allows me to download sequence based on chromosome name, seqStart and SeqEnd given the formatted human genome database downloaded on my Linux desktop? I used to do this using Perl $URI object and it is really slow as the process depend on the network. To be more specific, I took chrName, seqStart and seqEnd and go to Ensembl database to get the sequence one by one using Perl $URI object. I thought it might be easier if I process locally using indexed database using BioPerl module if there is any designed for this purpose. Input, millions rows of tab delimited (CSV) file contain information about chrName, seqStart, seqEnd. Locally formatted/indexed human genome. 
Output should be the fasta sequence contain the sequence and with the header contain chr name and location persed Sorry if I posted in the wrong section of the forum and happy to get any recommendation. Thanks Govind Chandra wrote: > > Hi, > > The code below > > > ====== code begins ======= > #use strict; > use Bio::SeqIO; > > $infile='NC_000913.gbk'; > my $seqio=Bio::SeqIO->new(-file => $infile); > my $seqobj=$seqio->next_seq(); > my @features=$seqobj->all_SeqFeatures(); > my $count=0; > foreach my $feature (@features) { > unless($feature->primary_tag() eq 'CDS') {next;} > print($feature->start()," ", $feature->end(), " > ",$feature->strand(),"\n"); > $ac=$feature->annotation(); > $temp1=$ac->get_Annotations("locus_tag"); > @temp2=$ac->get_Annotations(); > print("$temp1 $temp2[0] @temp2\n"); > if($count++ > 5) {last;} > } > > print(ref($ac),"\n"); > exit; > > ======= code ends ======== > > produces the output > > ========== output begins ======== > > 190 255 1 > 0 > 337 2799 1 > 0 > 2801 3733 1 > 0 > 3734 5020 1 > 0 > 5234 5530 1 > 0 > 5683 6459 -1 > 0 > 6529 7959 -1 > 0 > Bio::Annotation::Collection > > =========== output ends ========== > > $ac is-a Bio::Annotation::Collection but does not actually contain any > annotation from the feature. Is this how it should be? I cannot figure > out what is wrong with the script. Earlier I used to use has_tag(), > get_tag_values() etc. but the documentation says these are deprecated. > > Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of uname > -a is > > Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 > x86_64 x86_64 x86_64 GNU/Linux > > Thanks in advance for any help. 
> > Govind > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Re%3A-Bioperl-l-Digest%2C-Vol-71%2C-Issue-15-tp22744119p22816585.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Thu Apr 2 13:46:36 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 2 Apr 2009 09:46:36 -0400 Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq In-Reply-To: References: <200904012311.51764.tristan.lefebure@gmail.com><29E09DCE622643848EAFA8F1C6210711@NewLife><7ae9c2740904012057w7e323ddem1a7be78750d38cba@mail.gmail.com> Message-ID: Hi Tristan-- I think this is a good thought, Can you register this as an enhancement at http://bugzilla.bioperl.org ? Please go ahead and attach the diff as a patch to the 'bug' report-- thanks for *your* input- cheers, Mark ----- Original Message ----- From: "Tristan Lefebure" To: "Mark A. Jensen" Cc: "BioPerl List" ; "Weigang Qiu" Sent: Thursday, April 02, 2009 9:30 AM Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq > Thanks you both, > > To internally store the ST composition, so that I can reuse it in the same > script, I made the following modifications to SimpleAlign.pm: > > diff /usr/local/share/perl/5.10.0/Bio/SimpleAlign.pm > /usr/local/share/perl/5.10.0/Bio/SimpleAlignMod.pm > 590a591,592 >> #modified to also returned an array of the ST composition >> my %st; > 651a654 >> push @{$st{$order{$str}}}, $_->id(); > 655c658 > < return $aln; > --- >> return ($aln, %st); > > This is probably not really BioPerl compliant. Being an OBO ignorant, I > wonder if we could add this information somewhere either once in the $aln > object, or by little pieces in each Bio::LocatableSeq objects? > > Thks, > > --Tristan > > On Thu, Apr 2, 2009 at 12:15 AM, Mark A. 
Jensen wrote: > >> Thanks Weigang-- I didn't look carefully enough-- >> I'll make a change to the POD. >> so Tristan, in your code below, add >> >> $aln->verbose(1); >> >> before you invoke uniq_seq(). The ST's should >> then be sent to stderr (as "warns"). >> >> MAJ >> ----- Original Message ----- From: "Weigang Qiu" >> To: "Mark A. Jensen" >> Cc: "BioPerl List" ; < >> tristan.lefebure at gmail.com> >> Sent: Wednesday, April 01, 2009 11:57 PM >> Subject: Re: [Bioperl-l] Bio::SimpleAlign, uniq_seq >> >> >> >> Mark and Tristan, >>> >>> I am the original instigator of the uniq_seq method. The STDERR >>> implementation was used so that STDOUT could be piped. But it did not >>> conform to bioperl convention of using the $self->debug() method. I think >>> that's why these lines were commented out and re-implemented using the >>> $self->debug method. So, turning on the debug option should give the >>> intended ST mapping for each sequence in stderr. >>> >>> weigang >>> >>> On Wed, Apr 1, 2009 at 10:28 PM, Mark A. Jensen >>> wrote: >>> >>> Tristan-- >>>> Strange: it looks like the prints to stderr have been commented out in >>>> the >>>> source (back in revision 10242; 1.6 is rev 15582). The >>>> two statements are easy to find in the SimpleAlign.pm uniq_seq() source; >>>> you can >>>> uncomment them to work around this. >>>> You are right, this is rather an unconventional way to specify an output >>>> option-- can Chris comment? >>>> Mark >>>> ----- Original Message ----- From: "Tristan Lefebure" < >>>> tristan.lefebure at gmail.com> >>>> To: "BioPerl List" >>>> Sent: Wednesday, April 01, 2009 11:11 PM >>>> Subject: [Bioperl-l] Bio::SimpleAlign, uniq_seq >>>> >>>> >>>> >>>> Hi there, >>>> >>>>> >>>>> I'm trying to use the uniq_seq function from the Bio::SimpleAlign >>>>> module. >>>>> Here is the description: >>>>> >>>>> Title : uniq_seq >>>>> Usage : $aln->uniq_seq(): Remove identical sequences in >>>>> in the alignment. 
Ambiguous base ("N", "n") and >>>>> leading and ending gaps ("-") are NOT counted as >>>>> differences. >>>>> Function : Make a new alignment of unique sequence types (STs) >>>>> Returns : 1. a new Bio::SimpleAlign object (all sequences renamed as >>>>> "ST") >>>>> 2. ST of each sequence in STDERR >>>>> Argument : None >>>>> >>>>> What I'm trying to obtain is the ST composition (i.e. what is supposed >>>>> to >>>>> go >>>>> to STDERR), but I see nothing... >>>>> >>>>> An example: >>>>> >>>>> --------test.fasta: >>>>> >>>>> seq1 >>>>>> >>>>>> AAATTTC >>>>> >>>>> seq2 >>>>>> >>>>>> CAATTTC >>>>> >>>>> seq3 >>>>>> >>>>>> AAATTTC >>>>> ------- >>>>> >>>>> >>>>> ----------test.pl: >>>>> #! /usr/bin/perl >>>>> >>>>> use strict; >>>>> use warnings; >>>>> use Bio::AlignIO; >>>>> use Bio::SimpleAlign; >>>>> use Getopt::Long; >>>>> >>>>> my $in = Bio::AlignIO->new(-file => 'test.fasta' , >>>>> -format => 'fasta'); >>>>> >>>>> my $out = Bio::AlignIO->new(-file => ">test.out" , >>>>> -format => 'fasta'); >>>>> >>>>> while ( my $aln = $in->next_aln() ) { >>>>> my $red_aln = $aln->uniq_seq; >>>>> $out->write_aln($red_aln); >>>>> } >>>>> ------------- >>>>> >>>>> If you run: >>>>> >>>>> ./test.pl &> log >>>>> >>>>> you will get nothing written into the log file... (but the test.out is >>>>> OK) >>>>> >>>>> Am I missing something? >>>>> By the way, wouldn't it be more convenient to have the ST composition >>>>> returned >>>>> in an array? 
>>>>> >>>>> Thanks, >>>>> >>>>> --Tristan >>>>> (BioPerl 1.6) >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> >>> -- >>> Weigang Qiu >>> Department of Biological Sciences >>> Hunter College, City University of New York >>> 695 Park Avenue >>> New York, NY 10065 >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bix at sendu.me.uk Wed Apr 1 12:00:59 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 01 Apr 2009 13:00:59 +0100 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> Message-ID: <49D3577B.1090409@sendu.me.uk> Smithies, Russell wrote: > The taxonomy information isn't in the blast output unless you created > custom fasta headers for your blast database. The easiest way to get > the tax_id for your accessions would be to download the gi->tax_id > list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. > If you load that file into a hash, parse the accessions out of the > blast hits then lookup the tax_id from that hash, I think it should > be fairly fast. 
> > Checking which are prokaryotes and which are eukaryotes based on > tax_id is a separate problem :-) If you grab the taxdump.tar.gz file > from the same site, the nodes.dmp file contained within lists what > division each tax_id belongs to (Bacteria, Invertebrates, Mammals, > Phages, Plants, etc) so you can probably work it out from that. Check out the synopsis for Bio::Taxon http://doc.bioperl.org/bioperl-live/Bio/Taxon.html If the division() function doesn't tell you what you need, you could use get_lineage_nodes() and check the oldest ancestors to see if its a pro or euk. From shalabh.sharma7 at gmail.com Thu Apr 2 19:50:58 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 2 Apr 2009 15:50:58 -0400 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <49D3577B.1090409@sendu.me.uk> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> Message-ID: <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> thanks a lot everyone, the information is really useful and it solved my purpose. Thanks Shalabh On Wed, Apr 1, 2009 at 8:00 AM, Sendu Bala wrote: > Smithies, Russell wrote: > >> The taxonomy information isn't in the blast output unless you created >> custom fasta headers for your blast database. The easiest way to get >> the tax_id for your accessions would be to download the gi->tax_id >> list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. If >> you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should >> be fairly fast. 
>> >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) If you grab the taxdump.tar.gz file >> from the same site, the nodes.dmp file contained within lists what >> division each tax_id belongs to (Bacteria, Invertebrates, Mammals, >> Phages, Plants, etc) so you can probably work it out from that. >> > > Check out the synopsis for Bio::Taxon > http://doc.bioperl.org/bioperl-live/Bio/Taxon.html > > If the division() function doesn't tell you what you need, you could use > get_lineage_nodes() and check the oldest ancestors to see if its a pro > or euk. > From Russell.Smithies at agresearch.co.nz Thu Apr 2 19:55:06 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 3 Apr 2009 08:55:06 +1300 Subject: [Bioperl-l] taxonomy ID In-Reply-To: <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz> We're here to help - unless it's to do your homework ;-) --Russell From: shalabh sharma [mailto:shalabh.sharma7 at gmail.com] Sent: Friday, 3 April 2009 8:51 a.m. To: Sendu Bala Cc: Smithies, Russell; bioperl-l Subject: Re: [Bioperl-l] taxonomy ID thanks a lot everyone, the information is really useful and it solved my purpose. Thanks Shalabh On Wed, Apr 1, 2009 at 8:00 AM, Sendu Bala > wrote: Smithies, Russell wrote: The taxonomy information isn't in the blast output unless you created custom fasta headers for your blast database. The easiest way to get the tax_id for your accessions would be to download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. 
If you load that file into a hash, parse the accessions out of the blast hits then lookup the tax_id from that hash, I think it should be fairly fast. Checking which are prokaryotes and which are eukaryotes based on tax_id is a separate problem :-) If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file contained within lists what division each tax_id belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out from that. Check out the synopsis for Bio::Taxon http://doc.bioperl.org/bioperl-live/Bio/Taxon.html If the division() function doesn't tell you what you need, you could use get_lineage_nodes() and check the oldest ancestors to see if its a pro or euk. ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From Russell.Smithies at agresearch.co.nz Fri Apr 3 00:46:39 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 3 Apr 2009 13:46:39 +1300 Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ? 
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D3577B.1090409@sendu.me.uk> <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz>

I'm re-formatting some blast output into nice html webpages but am finding $self->end_report() and $self->footer() don't seem to be working.
The other methods ($self->start_report, $self->introduction, $self->title) all work fine.
Am I doing something wrong or is there a trick to it?

Here's some test code:
==================================

#!perl -w

use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;
use CGI qw(:standard);

my $in = Bio::SearchIO->new(-format => "blast", -file => shift @ARGV, );

my $index = Bio::SearchIO::Writer::HTMLResultWriter->new();

$index->start_report( \&my_start_report );
$index->title( \&my_title );
$index->footer(\&my_footer);
$index->end_report(\&my_end_report);

my $out = Bio::SearchIO->new(-writer => $index, -file => ">blast.htm");

$out->write_result($in->next_result);

sub my_start_report{
    return h1('this is my header');
}

sub my_title{
    return h1('this is my title');
}

sub my_footer{
    my ($self) = @_;
    return h2('this is a footer');
}

sub my_end_report {
    return h2('this is the end');
}

=================================

Thanx,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From jason at bioperl.org Fri Apr 3 01:09:20 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 2 Apr 2009 18:09:20 -0700
Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com>
    <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz>
    <49D3577B.1090409@sendu.me.uk>
    <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com>
    <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
    <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz>
Message-ID: <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org>

looking at the code - it doesn't seem to accept resetting the default value.

sub end_report {
    return "\n\n";
}

sub footer {
    my ($self) = @_;
    return "
Produced by Bioperl module ".ref($self)." on $DATE
Revision: $Revision
\n"
}

So just adjusting it to mirror what is happening for title and the
rest would be necessary.

-jason

Jason Stajich
jason at bioperl.org

From Russell.Smithies at agresearch.co.nz Fri Apr 3 02:16:34 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 3 Apr 2009 15:16:34 +1300
Subject: [Bioperl-l] bug in Bio::SearchIO::Writer::HTMLResultWriter ?
In-Reply-To: <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com>
    <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz>
    <49D3577B.1090409@sendu.me.uk>
    <9fcc48c70904021250h6fd4a00bu18b7af936813114@mail.gmail.com>
    <18DF7D20DFEC044098A1062202F5FFF32493ABEBA4@exchsth.agresearch.co.nz>
    <18DF7D20DFEC044098A1062202F5FFF32493ABED7F@exchsth.agresearch.co.nz>
    <4CB4E9C4-8CF7-4088-8B9C-B615EE192E84@bioperl.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABEE2E@exchsth.agresearch.co.nz>

Not wanting to be picky...
But $result->database_name (for blast results) returns the description of
the database rather than just the name.
E.g. "hs.fna (Human mRNA Refseqs)" instead of "hs.fna"
I've had a hunt but can't see where the code for getting the database_name is.
Any ideas?

Thanx,
--Russell

From bernd.web at gmail.com Fri Apr 3 13:47:23 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 3 Apr 2009 15:47:23 +0200
Subject: [Bioperl-l] AlignIO clustal
Message-ID: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com>

Hi,

Using Bioperl 1.5.2 and AlignIO, I now run into an issue with a
clustalw alignment.
At the moment, I cannot update to a newer version, so am not sure this
problem still exists.

The problem is that the $aln object does not exist when the last
sequence in a block contains gaps only.
Has anybody seen this or know of a fix? Code and example input follow below.
Regards,
Bernd


use Bio::AlignIO;
my $in = Bio::AlignIO->new(-file   => 'test.aln',
                           -format => 'clustalw');

my $out = Bio::AlignIO->new(-file   => '>testerr.ALN',
                            -format => 'clustalw');

my $aln = $in->next_aln();
print $aln->length, "\n";

test.aln contains:

CLUSTAL W(1.81) multiple sequence alignment


QUERY/7-143      PETLE-ARINRATNPLNKEL--DWASI
7082547/1-128    ---------ERATNDMLIGP--DWAVN
1_3265048/1-0    ---------------------------
3265047/2-138    QTSLE-ALLLKATNSQNQNI--DTAAV
1_3265047/1-0    ---------------------------

From bernd.web at gmail.com Fri Apr 3 14:11:44 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 3 Apr 2009 16:11:44 +0200
Subject: [Bioperl-l] AlignIO clustal
In-Reply-To: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com>
References: <716af09c0904030647t33fc569er90727990f57c874f@mail.gmail.com>
Message-ID: <716af09c0904030711l8252943hff489ccb9f720920@mail.gmail.com>

Hi,

I noticed this issue is not specific to Clustal; it also occurs for Fasta.
The "problem" arises in a last check, which is only done on the last
sequence; it is still present in the current code (webcvs) in the
next_aln code.

In fasta.pm:
    # If $end <= 0, we have either reached the end of
    # file in <> or we have encountered some other error
    if ( $end <= 0 ) {
        undef $aln;
        return $aln;
    }

In clustalw.pm:
    # not sure if this should be a default option - or we can pass in
    # an option to do this in the future? --jason stajich
    # $aln->map_chars('\.','-');
    undef $aln if ( !defined $end || $end <= 0 );
    return $aln;

And the last sequence actually got a zero end. This was given in an
$aln->slice where gap-only sequences are retained. It will also get a
"0" in next_aln itself if no coordinates are present.
1_3265047/1-0    ---------------------------

For now, commenting out "undef $aln if ( !defined $end || $end <= 0 );" works.

Regards,
Bernd

From hlapp at gmx.net Mon Apr 6 15:39:50 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 6 Apr 2009 11:39:50 -0400
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To:
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>

(Removing biosql-l from the cc list as this seems to be a problem with
BioPerl.)

Hi Johann,

I don't know whether anyone has responded to you yet - if not I'm
sorry, I've been inundated for the past couple of weeks.

On Apr 1, 2009, at 6:14 AM, Johann PELLET wrote:

> With the latest version of BioPerl and BioSQL, I have tried to
> insert entry from a GenBank file, which I have downloaded from the
> NCBI website (648 937 records)

Could you be more specific? When you say the latest version of
BioPerl, do you mean 1.6.1 or the current svn snapshot of the main
trunk? And which Genbank file is it? Is it one with only viruses,
i.e., are you specifically interested in the virus sequences that the
parser is giving you trouble with?
> After successfully loading ncbi_taxonomy i am getting following
> error message while loading sequences into database.
>
> perl load_seqdatabase.pl gb_03-2009 -format genbank -driver Pg -dbname biosql
>
>
> --------------------- WARNING ---------------------
> MSG: The supplied lineage does not start near 'Human papillomavirus
> type 2c' (I was supplied 'Human papillomavirus - 2 |
> Alphapapillomavirus | Papillomaviridae')

This is a problem in the BioPerl genbank parser, or more specifically,
in the species parser. I thought though this was fixed in 1.6.1; are
you sure you don't have an older version of BioPerl lying around that
could accidentally have been used?

That said, it only seems to be a warning; did you check how the record
ended up in the database and find it to be incomplete or messed up?

> the script is not stopped until this entry: S67864

This is a later entry, not the same entry that causes the problem
above, right?

> --------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed,
> values were ("1","19)","1","3") FKs (41914,)
> ERROR: invalid input syntax for integer: "19)"

Oops - that's a problem that must originate from the BioPerl feature
location parser. The full record is here:

http://www.ncbi.nlm.nih.gov/nuccore/544772

Does anyone see why the location parser should have a problem with the
first gene feature? It's nested, and has remote location components,
but at first sight nothing jumps out at me as extraordinary. Has
someone recently changed the location parsing code? If no-one has an
immediate idea what could be at work here, this needs investigating.
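
For readers following the "19)" value in the error above: the first gene
feature of this record has the location
order(S67862.1:72..75,join(S67863.1:1..788,1..19)), as quoted later in this
thread. The sketch below is only an illustration in plain Perl. It is not
BioPerl's actual location parser and not a diagnosis of the bug. It shows how
splitting a nested location on every comma leaves a fragment ending in "19)",
while splitting only on commas outside parentheses does not. The helper names
split_top_level and parse_location are hypothetical, and only join, order,
complement and simple ranges are handled.

```perl
use strict;
use warnings;

# Hypothetical helper: split an operator's argument list on commas that
# are not nested inside parentheses.
sub split_top_level {
    my ($str) = @_;
    my @parts;
    my ( $depth, $current ) = ( 0, '' );
    for my $ch ( split //, $str ) {
        if ( $ch eq ',' && $depth == 0 ) {
            push @parts, $current;
            $current = '';
            next;
        }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        $current .= $ch;
    }
    push @parts, $current if length $current;
    return @parts;
}

# Hypothetical helper: recursively turn a location string into a nested
# structure. Operators become { op => ..., parts => [...] }, simple
# ranges become { seq_id => ..., start => ..., end => ... }.
sub parse_location {
    my ($loc) = @_;
    if ( $loc =~ /^(join|order|complement)\((.*)\)$/s ) {
        my ( $op, $inner ) = ( $1, $2 );
        return {
            op    => $op,
            parts => [ map { parse_location($_) } split_top_level($inner) ],
        };
    }
    if ( $loc =~ /^(?:([\w.]+):)?[<>]?(\d+)\.\.[<>]?(\d+)$/ ) {
        return { seq_id => $1, start => $2, end => $3 };
    }
    die "Unsupported location syntax in this sketch: '$loc'\n";
}

my $location = 'order(S67862.1:72..75,join(S67863.1:1..788,1..19))';

# Naive approach: strip the outer operator, then split on every comma.
# The last fragment is '1..19)', so its end coordinate is not an integer.
( my $inner = $location ) =~ s/^\w+\((.*)\)$/$1/;
my @naive = split /,/, $inner;
print "naive fragment: $_\n" for @naive;

# Depth-aware approach: the nested join() stays intact.
my $tree = parse_location($location);
my $last = $tree->{parts}[1]{parts}[1];
print "nested end coordinate: $last->{end}\n";
```

The depth counter is the whole point of the sketch: a comma only separates
arguments of the operator currently being parsed when no inner parenthesis is
open.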
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From torsten.seemann at infotech.monash.edu.au Tue Apr 7 01:05:25 2009
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 7 Apr 2009 11:05:25 +1000
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To: <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
    <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
Message-ID:

> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772

gene order(S67862.1:72..75,join(S67863.1:1..788,1..19))

> Does anyone see why the location parser should have a problem with the first
> gene feature? It's nested, and has remote location components, but at first
> sight nothing jumps out at me as extraordinary. Has someone recently changed
> the location parsing code? If no-one has an immediate idea what could be at
> work here, this needs investigating.

I'm not sure if Bioperl handles the order() operator?

For those unfamiliar with the order() operator:

http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2

order(location,location, ... location)
The elements can be found in the specified order (5' to 3' direction),
but nothing is implied about the reasonableness about joining them.

--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA

From cjfields at illinois.edu Tue Apr 7 03:59:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 6 Apr 2009 22:59:14 -0500
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To:
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
    <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
Message-ID: <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu>

On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote:

>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772
>
> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19))
>
>> Does anyone see why the location parser should have a problem with
>> the first
>> gene feature? It's nested, and has remote location components, but
>> at first
>> sight nothing jumps out at me as extraordinary. Has someone
>> recently changed
>> the location parsing code? If no-one has an immediate idea what
>> could be at
>> work here, this needs investigating.

The location parsing code was refactored about 3-4 years ago w/o
problems. This'll be the first one to crop up. I'll try taking a
look at it.

> I'm not sure if Bioperl handles the order() operator?
>
> For those unfamiliar with the order() operator:
>
> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2
>
> order(location,location, ... location)
> The elements can be found in the specified order (5' to 3' direction),
> but nothing is implied about the reasonableness about joining them.
>
>
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
> University, AUSTRALIA

It's interesting that the version from eutils differs significantly in
the feature table when retrieving 'gb' or 'gbwithparts'; the latter
resolves the location (see below). Regardless, we'll need to make sure
this is parseable.

....
FEATURES             Location/Qualifiers
     source          1..77
                     /organism="Ovine respiratory syncytial virus"
                     /mol_type="genomic RNA"
                     /db_xref="taxon:28869"
     gene            order(S67862.1:72..75,join(S67863.1:1..788,1..19))
                     /gene="G"
     gene            55..>77
                     /gene="fusion glycoprotein F"



chris

From cjfields at illinois.edu Tue Apr 7 05:32:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 7 Apr 2009 00:32:52 -0500
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To: <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
    <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
    <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu>
Message-ID: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu>

Fixed in svn now and have added this as a test case (passes all tests
in bioperl-live). For some reason this wasn't catching some more
complex combinations of operators, mainly those with mixes of
order/join.

chris

From johann.pellet at inserm.fr Tue Apr 7 08:48:56 2009
From: johann.pellet at inserm.fr (Johann PELLET)
Date: Tue, 7 Apr 2009 10:48:56 +0200
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
    <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
    <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu>
    <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu>
Message-ID: <73508372-0C43-4693-8135-45C128A25959@inserm.fr>

Thanks all,

I will update bioperl-live using svn right now, and then restart
loading the sequences into my biosql database.

Hilmar,
My GenBank file contains only virus sequences. I downloaded it using
eutils, (db=nuccore, tool=ebot, rettype=gb ...).

Thank you again
--
--
Johann Pellet

From hlapp at gmx.net Tue Apr 7 17:56:27 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 7 Apr 2009 13:56:27 -0400
Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank
In-Reply-To: <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
    <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net>
    <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu>
    <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu>
Message-ID:

Awesome, thanks Chris! $beer_owed++;

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From juheymann at yahoo.com Tue Apr 7 18:20:04 2009
From: juheymann at yahoo.com (Jurgen Heymann)
Date: Tue, 7 Apr 2009 11:20:04 -0700 (PDT)
Subject: [Bioperl-l] restriction site map
Message-ID: <237420.97841.qm@web54203.mail.re2.yahoo.com>

Hi All:

I would like to convert a table (restriction enzyme / position where it
cuts in gene of interest) into a graphical representation. What
avenues exist for that?
Would appreciate your comments.
Thank you,
Jurgen

From wenzhiwang1983 at yahoo.com.cn Wed Apr 8 01:39:59 2009
From: wenzhiwang1983 at yahoo.com.cn (Wen-Zhi WANG)
Date: Wed, 8 Apr 2009 09:39:59 +0800 (CST)
Subject: [Bioperl-l] Pasing Affymatrix Microarray output
Message-ID: <992233.10677.qm@web15208.mail.cnb.yahoo.com>

Dear all,

Recently, I have been working on population genomics data output by the
Affymetrix microarray system. However, the software designed by
Affymetrix only runs on the Windows 386 platform. Is there any
application that can be used on Linux?
Bio::Affymatrix was not strong enough to get the detailed information.

Thank you a lot.

Yours,
WWZ
___________________________________________________________________

Wen-Zhi WANG

State Key Laboratory of Genetic Resources and Evolution
Kunming Institute of Zoology, Chinese Academy of Sciences
Kunming, Yunnan 650223 P. R. China
Tel: (86) 871-5198993
Fax: (86) 871-5195430
Mobile: 13759114244
E-mail: wenzhiwang1983 at yahoo.com.cn

From Russell.Smithies at agresearch.co.nz Wed Apr 8 01:58:54 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 8 Apr 2009 13:58:54 +1200
Subject: [Bioperl-l] Pasing Affymatrix Microarray output
In-Reply-To: <992233.10677.qm@web15208.mail.cnb.yahoo.com>
References: <992233.10677.qm@web15208.mail.cnb.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493ABF94C@exchsth.agresearch.co.nz>

Have you had a look at Microarray-GeneXplorer
http://search.cpan.org/~sherlock/Microarray-GeneXplorer-0.11/
I haven't used it but I'd expect it to be pretty good being from Gavin Sherlock :-)

--Russell

From sdavis2 at mail.nih.gov Wed Apr 8 02:10:17 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 7 Apr 2009 22:10:17 -0400
Subject: [Bioperl-l] Pasing Affymatrix Microarray output
In-Reply-To: <992233.10677.qm@web15208.mail.cnb.yahoo.com>
References: <992233.10677.qm@web15208.mail.cnb.yahoo.com>
Message-ID: <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com>

On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG wrote:

> Dear all,
>
> Recently, I have been working on population genomics data output by the
> Affymetrix microarray system. However, the software designed by
> Affymetrix only runs on the Windows 386 platform. Is there any
> application that can be used on Linux?
> Bio::Affymatrix was not strong enough to get the detailed information.
>

You may want to look at a non-bioperl solution such as Bioconductor
(http://bioconductor.org).

Sean

From sac at bioperl.org Wed Apr 8 05:59:49 2009
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 7 Apr 2009 22:59:49 -0700
Subject: [Bioperl-l] Pasing Affymatrix Microarray output
In-Reply-To: <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com>
References: <992233.10677.qm@web15208.mail.cnb.yahoo.com>
    <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com>
Message-ID: <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com>

Check out our Affymetrix Power Tools (APT) package:
http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx

We distribute binaries for Linux and Mac OSX, as well as source code so
you can compile it yourself if you want.

Note however that this is written in C++, not Perl. We don't provide
SWIG or XS interfaces for direct access via Perl, though this would
definitely be doable, if anyone is interested. Probably the easiest
approach from Perl would be to simply call the appropriate APT
executable through the shell as in:

system("/path/to/apt --args ...");

The Perl code can parse the output files and take it from there.

Steve

From markus.liebscher at gmx.de Wed Apr 8 14:07:17 2009
From: markus.liebscher at gmx.de (manni122)
Date: Wed, 8 Apr 2009 07:07:17 -0700 (PDT)
Subject: [Bioperl-l] Access Uniprot detailed information
Message-ID: <22951210.post@talk.nabble.com>

Hi there,

maybe I have not read carefully enough through the Howto section. But is
there a function in BioPerl that, given a Uniprot accession code or ID,
retrieves some general annotations from the Uniprot database, such as
enzymatic activity or literature references?
I appreciate any help!
--
View this message in context: http://www.nabble.com/Access-Uniprot-detailed-information-tp22951210p22951210.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
From johann.pellet at inserm.fr Wed Apr 8 15:29:29 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Wed, 8 Apr 2009 17:29:29 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: Hie all, I confirm that now it's ok for the LOCUS S67862S3 since Chris update. Thanks again. However I still have Warning message with other entries like: ######################################################################################################################### --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Hantaanvirus CGRn93MP8' (I was supplied 'Hantaan virus | Hantavirus | Bunyaviridae') --------------------------------------------------- --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Hantaanvirus CGRn93P8' (I was supplied 'Hantaan virus | Hantavirus | Bunyaviridae') --------------------------------------------------- ######################################################################################################################### but entries are inserted in the biosql database: ######################################################################################################################### biosql=# select * from bioentry where description like 'Hantaanvirus CGRn93P8%'; bioentry_id | biodatabase_id | taxon_id | name | accession | identifier | division | description | version -------------+----------------+----------+----------+----------- +------------+---------- + ----------------------------------------------------------------------- +--------- 156282 | 84 | 395824 | EF990932 | EF990932 | 156144486 | VRL | Hantaanvirus CGRn93P8 RNA-dependent RNA polymerase gene, partial cds. 
| 1 156288 | 84 | 395824 | EF990918 | EF990918 | 154623008 | VRL | Hantaanvirus CGRn93P8 segment M, complete sequence. | 1 156294 | 84 | 395824 | EF990904 | EF990904 | 154622980 | VRL | Hantaanvirus CGRn93P8 segment S, complete sequence. | 1 (3 rows) ######################################################################################################################### and finally EU608407 and EU608559 made a crash: ######################################################################################################################### --------------------- WARNING --------------------- MSG: The supplied lineage does not start near 'Fowl adenovirus 8' (I was supplied 'Fowl adenovirus E | Aviadenovirus | Adenoviridae') --------------------------------------------------- --------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- #######...14 times ...############ --------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,LOCUS EU608407 1212 bp DNA linear VRL 20-APR-2008","","","CRC- D35248959C54B9F2","1","1212","") FKs () ERROR: null value in column "location" violates not-null constraint --------------------------------------------------- Could not store EU608559: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key STACK: Error::throw STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 
5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children / Library/Perl/5.8.8/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:230 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: Bio::DB::BioSQL::SeqAdaptor::store_children /Library/Perl/5.8.8/ Bio/DB/BioSQL/SeqAdaptor.pm:237 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.8/Bio/DB/Persistent/PersistentObject.pm:284 STACK: load_seqdatabase.pl:630 ----------------------------------------------------------- at load_seqdatabase.pl line 643 ######################################################################################################################### If I check in the biosql database if some part of this records are inserted: ######################################################################################################################### select * from reference where title='Evidence for positive epistasis in HIV-1'; reference_id | dbxref_id | location | title | authors | crc --------------+-----------+-------------------------------------- +------------------------------------------ + ----------------------------------------------------------------------------+ ---------------------- 16443 | 4179 | Science 306 (5701), 1547-1550 (2004) | Evidence for positive epistasis in HIV-1 | Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,J.M. and Petropoulos,C.J. 
| CRC-19E7AA4FB7A5D4AF (1 row) select * from dbxref where dbxref_id=4179; dbxref_id | dbname | accession | version -----------+--------+-----------+--------- 4179 | PUBMED | 15567861 | 0 select * from bioentry where accession=15567861; bioentry_id | biodatabase_id | taxon_id | name | accession | identifier | division | description | version -------------+----------------+----------+------+----------- +------------+----------+-------------+--------- (0 rows) ######################################################################################################################### I don't have records with name='EU608407' or 'EU608559' in the bioentry table. Thanks for your help Johann -- -- Johann Pellet Le 7 avr. 09 ? 19:56, Hilmar Lapp a ?crit : > Awesome, thanks Chris! $beer_owed++; > > -hilmar > > On Apr 7, 2009, at 1:32 AM, Chris Fields wrote: > >> Fixed in svn now and have added this as a test case (passes all >> tests in bioperl-live). For some reason this wasn't catching some >> more complex combinations of operators, mainly those with mixes of >> order/join. >> >> chris >> >> On Apr 6, 2009, at 10:59 PM, Chris Fields wrote: >> >>> On Apr 6, 2009, at 8:05 PM, Torsten Seemann wrote: >>> >>>>> The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/ >>>>> 544772 >>>> >>>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>>> >>>>> Does anyone see why the location parser should have a problem >>>>> with the first >>>>> gene feature? It's nested, and has remote location components, >>>>> but at first >>>>> sight nothing jumps out at me as extraordinary. Has someone >>>>> recently changed >>>>> the location parsing code? If no-one has an immediate idea what >>>>> could be at >>>>> work here, this needs investigating. >>> >>> The location parsing code was refactored above 3-4 years ago w/o >>> problems. This'll be the first one to crop up. I'll try taking a >>> look at it. >>> >>>> I'm not sure if Bioperl handles the order() operator? 
>>>> >>>> For those unfamilair with the order() operator: >>>> >>>> http://www.ncbi.nlm.nih.gov/collab/FT/#3.5.2 >>>> >>>> order(location,location, ... location) >>>> The elements can be found in the specified order (5' to 3' >>>> direction), >>>> but nothing is implied about the reasonableness about joining them. >>>> >>>> >>>> --Torsten Seemann >>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>>> University, AUSTRALIA >>> >>> It's interesting that the version from eutils differs >>> significantly in the feature table when retrieving 'gb' or >>> 'gbwithparts', the latter resolves the location (see below). >>> Regardless we'll need to make sure this is parseable. >>> >>> .... >>> >>> FEATURES Location/Qualifiers >>> source 1..77 >>> /organism="Ovine respiratory syncytial virus" >>> /mol_type="genomic RNA" >>> /db_xref="taxon:28869" >>> gene order(S67862.1:72..75,join(S67863.1:1..788,1..19)) >>> /gene="G" >>> gene 55..>77 >>> /gene="fusion glycoprotein F" >>> >>> >>> >>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cgoddard at flmnh.ufl.edu Wed Apr 8 15:25:37 2009 From: cgoddard at flmnh.ufl.edu (Chris Goddard) Date: Wed, 08 Apr 2009 11:25:37 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db Message-ID: <49DCC1F1.6080601@flmnh.ufl.edu> I am running into problems when trying to insert a sequence object retrieved from GenBank into a BioSQL schema running in a Postgres database. Whenever I use the 'create()' method on the sequence that has been made into a persistent object, the sequence isn't saved into the database properly. 
No error messages are given, and the corresponding Postgres primary key
sequences are incremented as if the data had been saved properly: the
appropriate tables themselves remain empty though.

I am completely new to using the biosql-db modules, and so am probably
missing something pretty simple. Below you will see the basic code that
causes the problem.

my $genbank_id = 'AYXXXXXX';

my $genDB = Bio::DB::GenBank->new;
my $sequence = $genDB->get_Seq_by_id($genbank_id);

my $db = Bio::DB::BioDB->new(-database => 'biosql',
                             -user     => 'username',
                             -dbname   => 'dbname',
                             -host     => 'localhost',
                             -driver   => 'Pg');

my $pobj = $db->create_persistent($sequence);
$pobj->create();

I am running the latest svn trunk versions of bioperl and bioperl-db (as
of yesterday) and Postgres 8.3.7. I also downloaded the NCBI taxonomy
info using the script included in the BioSQL package, and that data
seemed to install without error. Any help or advice would be greatly
appreciated.

Thanks,
Chris Goddard

From hlapp at gmx.net Wed Apr 8 16:21:11 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 8 Apr 2009 12:21:11 -0400
Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db
In-Reply-To: <49DCC1F1.6080601@flmnh.ufl.edu>
References: <49DCC1F1.6080601@flmnh.ufl.edu>
Message-ID: <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net>

This all sounds like you aren't issuing a commit. Are you sure your code
contains $pobj->commit() and what you are looking at is *after* that is
executed?

Bioperl-db is transactional, so you decide when to commit (or rollback).

-hilmar

On Apr 8, 2009, at 11:25 AM, Chris Goddard wrote:

> I am running into problems when trying to insert a sequence object
> retrieved from GenBank into a BioSQL schema running in a Postgres
> database. Whenever I use the 'create()' method on the sequence that
> has been made into a persistent object, the sequence isn't saved
> into the database properly.
No error messages are given, and the > corresponding Postgres primary key sequences are incremented as if > the data had been saved properly: the appropriate tables themselves > remain empty though. > > I am completely new to using the biosql-db modules, and so am > probably missing something pretty simple. Below you will see the > basic code that causes the problem. > > my $genbank_id = 'AYXXXXXX' > > my $genDB = new Bio::DB::GenBank; > $sequence = $genDB->get_Seq_by_id($genbank_id); > > my $db = Bio::DB::BioDB->new(-database => 'biosql', > -user => 'username', > -dbname => 'dbname', > -host => 'localhost', > -driver => 'Pg'); > > my $pobj = $db->create_persistent($sequence); > $pobj->create(); > > I am running the latest svn trunk versions of bioperl and bioperl-db > (as of yesterday) and Postgres 8.3.7. I also downloaded the NCBI > taxonomy info using the script included in the BioSQL package, and > that data seemed to install without error. Any help or advice would > be greatly appreciated. > > Thanks, > Chris Goddard > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Apr 8 16:40:53 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 12:40:53 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db In-Reply-To: <49DCD120.8020302@flmnh.ufl.edu> References: <49DCC1F1.6080601@flmnh.ufl.edu> <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> <49DCD120.8020302@flmnh.ufl.edu> Message-ID: <4A6EA2F3-BA88-474E-A9D9-C1A7444CA755@gmx.net> On Apr 8, 2009, at 12:30 PM, Chris Goddard wrote: > That was it. I guess I just incorrectly assumed that create() did > an auto-commit. That was simple to fix. 
Thank you! > No problem, I'm glad I could be helpful! -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cgoddard at flmnh.ufl.edu Wed Apr 8 16:30:24 2009 From: cgoddard at flmnh.ufl.edu (Chris Goddard) Date: Wed, 08 Apr 2009 12:30:24 -0400 Subject: [Bioperl-l] bioperl-db - Problems when trying to insert GenBank sequence into Pg BioSQL db In-Reply-To: <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> References: <49DCC1F1.6080601@flmnh.ufl.edu> <2E751C39-9475-4746-B3A3-5D5F552E9197@gmx.net> Message-ID: <49DCD120.8020302@flmnh.ufl.edu> That was it. I guess I just incorrectly assumed that create() did an auto-commit. That was simple to fix. Thank you! Chris Hilmar Lapp wrote: > This all sounds like you aren't issuing commit. Are you sure your code > contains $popj->commit() and what you are looking at is *after* that > is executed? > > Bioperl-db is transactional, so you decide when to commit (or rollback). > > -hilmar > > On Apr 8, 2009, at 11:25 AM, Chris Goddard wrote: > >> I am running into problems when trying to insert a sequence object >> retrieved from GenBank into a BioSQL schema running in a Postgres >> database. Whenever I use the 'create()' method on the sequence that >> has been made into a persistent object, the sequence isn't saved into >> the database properly. No error messages are given, and the >> corresponding Postgres primary key sequences are incremented as if >> the data had been saved properly: the appropriate tables themselves >> remain empty though. >> >> I am completely new to using the biosql-db modules, and so am >> probably missing something pretty simple. Below you will see the >> basic code that causes the problem. 
>> >> my $genbank_id = 'AYXXXXXX' >> >> my $genDB = new Bio::DB::GenBank; >> $sequence = $genDB->get_Seq_by_id($genbank_id); >> >> my $db = Bio::DB::BioDB->new(-database => 'biosql', >> -user => 'username', >> -dbname => 'dbname', >> -host => 'localhost', >> -driver => 'Pg'); >> >> my $pobj = $db->create_persistent($sequence); >> $pobj->create(); >> >> I am running the latest svn trunk versions of bioperl and bioperl-db >> (as of yesterday) and Postgres 8.3.7. I also downloaded the NCBI >> taxonomy info using the script included in the BioSQL package, and >> that data seemed to install without error. Any help or advice would >> be greatly appreciated. >> >> Thanks, >> Chris Goddard >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From sanjay.harke at gmail.com Thu Apr 9 03:24:45 2009 From: sanjay.harke at gmail.com (Sanjay Harke) Date: Thu, 9 Apr 2009 08:54:45 +0530 Subject: [Bioperl-l] Help in basics of Bioperl Message-ID: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Dear friend, I need help in following problem.I am beginer in bioperl i have sequence data. i install perl-bioperl on my computer. Now i want analyse sequences with blast, tree and multiple sequence analysis. so kindly guide me from basic. sanjay From abhishek.vit at gmail.com Thu Apr 9 03:31:26 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 8 Apr 2009 23:31:26 -0400 Subject: [Bioperl-l] Help in basics of Bioperl In-Reply-To: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> References: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com> Message-ID: Dear Sanjay As much as people on this love to help out. I would definitely put in some efforts to atleast go through the basic bio perl tutorial before asking this question. Atleast that would have helped you frame the question legitimately. 
I think we should put diligent effort before trying to take other people's help. Here is the link to bio perl tutorial please try to go through the relevant sections. I am sure you will get your answer there. http://www.bioperl.org/Core/Latest/bptutorial.html Thanks, -Abhi On Wed, Apr 8, 2009 at 11:24 PM, Sanjay Harke wrote: > Dear friend, > > I need help in following problem.I am beginer in bioperl > > i have sequence data. > i install perl-bioperl on my computer. > Now i want analyse sequences with blast, tree and multiple sequence > analysis. > so kindly guide me from basic. > > sanjay > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Thu Apr 9 03:35:12 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 23:35:12 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote: > [...] > and finally EU608407 and EU608559 made a crash: > > [...] > --------------------- WARNING --------------------- > MSG: Unexpected error in feature table for Skipping feature, > attempting to recover > --------------------------------------------------- > #######...14 times ...############ I would assume that you figured out that this was triggered by or affected EU608407? Would you mind sharing how? 
> --------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, > values were ("Bonhoeffer,S., Chappey,C., Parkin,N.T., > Whitcomb,LOCUS EU608407 > 1212 bp DNA linear VRL 20-APR-2008","","","CRC- > D35248959C54B9F2","1","1212","") FKs () > ERROR: null value in column "location" violates not-null constraint Is this really the verbatim copy of the error message you saw on the screen? What's really puzzling about this is how the genbank SeqIO parser could mess up parsing the reference entry to badly. Here's the reference from the version online at NCBI: REFERENCE 1 (bases 1 to 1212) AUTHORS Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,J.M. and Petropoulos,C.J. TITLE Evidence for positive epistasis in HIV-1 JOURNAL Science 306 (5701), 1547-1550 (2004) PUBMED 15567861 How the first author line would be chopped off at the end and the LOCUS line would have gotten inserted there is a mystery to me. The location is "Science 306 (5701), 1547-1550 (2004)", and according to the error message the parser failed to extract that and the TITLE. Could you confirm that the file you are parsing is not corrupted in any way, specifically for this record? > --------------------------------------------------- > Could not store EU608559: > ------------- EXCEPTION: Bio::Root::Exception ------------- > [...] > > If I check in the biosql database if some part of this records are > inserted: So are there other sequences associated with that PubMed ID? Can you do a grep on the PubMed ID and see whether it occurs already before the one that trips up the load? > [...] > select * from dbxref where dbxref_id=4179; > dbxref_id | dbname | accession | version > -----------+--------+-----------+--------- > 4179 | PUBMED | 15567861 | 0 > > select * from bioentry where accession=15567861; Note that 15567861 is the accession (PubMed ID) for the referenced article, not the sequence. 
Which bioentries are associated with a reference would be in the bioentry_reference table. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Apr 9 03:51:52 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 8 Apr 2009 23:51:52 -0400 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> Message-ID: <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net> On Apr 8, 2009, at 11:35 PM, Hilmar Lapp wrote: > > On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote: > >> [...] >> and finally EU608407 and EU608559 made a crash: >> >> [...] >> --------------------- WARNING --------------------- >> MSG: Unexpected error in feature table for Skipping feature, >> attempting to recover >> --------------------------------------------------- >> #######...14 times ...############ > > I would assume that you figured out that this was triggered by or > affected EU608407? Would you mind sharing how? Looking at EU608407, it most likely wasn't the culprit or stumbling stone. It must have been triggered before that. > [...] > So are there other sequences associated with that PubMed ID? To answer my own question, it's indeed EU608407 that's from the same PubMed ID, and so am I correct in assuming that you didn't get the exception for that record, which would mean that the reference was properly inserted when that sequence was loaded. The second occurrence of the same PubMed ID should have actually triggered a successful lookup of the previously inserted record, which would then have skipped the insert. 
The fact that that didn't happen suggests that the PubMed ID also wasn't
properly extracted from the GenBank record. So my first suspicion remains
that your file is corrupted.

Otherwise, if you download this record:
http://www.ncbi.nlm.nih.gov/nuccore/183191257

in GenBank format and try to load it alone, it should yield the same
error. Can you indeed reproduce the problem in that way?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Thu Apr 9 03:55:12 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 8 Apr 2009 23:55:12 -0400
Subject: [Bioperl-l] Help in basics of Bioperl
In-Reply-To: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com>
References: <31bb4380904082024v2b9f1004xb46eb91cce996582@mail.gmail.com>
Message-ID: <4FAA64AA47534B98874AB16622D184BA@NewLife>

Hi Sanjay,

Judging from your posts to the list this month, I see you have an
appreciation of the power of Bioperl to help you get all kinds of
analysis jobs done, and that you have a real desire to learn a lot about
it. I want to encourage that attitude.

I also want to remind you that the absolutely best way to really
understand anything is to dive into your project and try to understand
the basics *on your own*. Your posts are honestly much too general for
this list. People here are really generous with their time, but they
don't have enough of it to walk you through every step.

When I have an issue with my Bioperl programming (and believe me, I have
had and do have many), I do at least three things before I consider
posting on this list:

* I read the documentation for the module I'm working with.
* I go to the wiki (www.bioperl.org) and look for HOWTOs or tutorials.
  There is a search facility there, and many many MANY introductory links.
* I go to the source code directly, and try to figure out what it is
  really doing.
So, it turns out I rarely post questions to the list, because I've figured out my dumb mistake, or how to do that new thing. PLUS, I've become that much closer to true Bioperl independence. Please go to the page http://www.bioperl.org/wiki/Getting_Started and *read it*. Please follow the links. You may even find that your work has already been done for you. One hint that works here on the list and elsewhere is: the more work you can show you have done by yourself, the more willing an expert is to help you over the hard parts. Conversely, the less work you do, the greater the chance that your questions will go unheard. cheers, Mark ----- Original Message ----- From: "Sanjay Harke" To: Sent: Wednesday, April 08, 2009 11:24 PM Subject: [Bioperl-l] Help in basics of Bioperl > Dear friend, > > I need help in following problem.I am beginer in bioperl > > i have sequence data. > i install perl-bioperl on my computer. > Now i want analyse sequences with blast, tree and multiple sequence > analysis. > so kindly guide me from basic. 
> > sanjay > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From johann.pellet at inserm.fr Thu Apr 9 09:48:43 2009 From: johann.pellet at inserm.fr (Johann PELLET) Date: Thu, 9 Apr 2009 11:48:43 +0200 Subject: [Bioperl-l] load_seqdatabase error with a specific locus from genbank In-Reply-To: <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net> References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk> <97AF7BE3-547E-4BBB-8337-B5CAD9D93F4D@gmx.net> <652BD097-3E2E-4AB4-9EDE-CF1CB0888FDB@illinois.edu> <271BCF0C-4228-4B6A-9575-156E65F75669@illinois.edu> <5DDA1587-595F-4D32-A3C2-3F40C7231ACA@gmx.net> Message-ID: <2FDD67FF-5DBA-4987-A04D-231AF8B1E93B@inserm.fr> Hie Hilmar, I am very sorry, I checked my GenBank file, and you are right It's corrupted :-( grep EU608407 genbankFile AUTHORS Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,LOCUS EU608407 1212 bp DNA linear VRL 20-APR-2008 ACCESSION EU608407 VERSION EU608407.1 GI:183190953 So I have downloaded EU608407 and I have loaded it alone with load_sequence.pl without problems. Same for EU608559. Thanks again Johann Le 9 avr. 09 ? 05:51, Hilmar Lapp a ?crit : > > On Apr 8, 2009, at 11:35 PM, Hilmar Lapp wrote: > >> >> On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote: >> >>> [...] >>> and finally EU608407 and EU608559 made a crash: >>> >>> [...] >>> --------------------- WARNING --------------------- >>> MSG: Unexpected error in feature table for Skipping feature, >>> attempting to recover >>> --------------------------------------------------- >>> #######...14 times ...############ >> >> I would assume that you figured out that this was triggered by or >> affected EU608407? Would you mind sharing how? > > Looking at EU608407, it most likely wasn't the culprit or stumbling > stone. It must have been triggered before that. > >> [...] >> So are there other sequences associated with that PubMed ID? 
> > To answer my own question, it's indeed EU608407 that's from the same > PubMed ID, and so am I correct in assuming that you didn't get the > exception for that record, which would mean that the reference was > properly inserted when that sequence was loaded. > > The second occurrence of the same PubMed ID should have actually > triggered a successful lookup of the previously inserted record, > which would then have skipped the insert. The fact that that didn't > happen suggests that the PubMed ID also wasn't properly extracted > from the Genbank record. So my first suspicion remains that your > file is corrupted. > > Otherwise, if you download this record: > http://www.ncbi.nlm.nih.gov/nuccore/183191257 > > in GenBank format and try to load it alone, it should yield the same > error. Can you indeed reproduce the problem in that way? > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From montalen at moulon.inra.fr Thu Apr 9 10:49:22 2009 From: montalen at moulon.inra.fr (montalent) Date: Thu, 9 Apr 2009 12:49:22 +0200 Subject: [Bioperl-l] Bioperl add_object_condition Message-ID: <6D76CE64E5E744C7B571F3BA31670F9D@bioinfo2> Dear colleague, I try to use add_object_condition() function, to get a subset of sequences. I try this : # 1. 
STORE SELECTED BAC IN A HASH TABLE : key = bac_name, value = sequence

# 1.1 STORE SELECTED BAC NAME IN AN ARRAY
my @selected_bac_list=();
open (SELECTION, $bac_selection_file) or die "can not open $bac_selection_file :$!\n";
while (my $line=<SELECTION>){
    my ($bac_name)=($line =~ /^(.+?);.+/);
    # print $bac_name."\n";
    push @selected_bac_list, $bac_name;
}

# 1.2 READ FASTA FILE WITH BIOPERL TO STORE IN A HASH TABLE
my $bac_fasta= Bio::SeqIO->new(-file=>$maize_sequence_bac_file, '-format'=>"Fasta");
my $builder = $bac_fasta->sequence_builder();
if ($builder->add_object_condition(sub {
        print " check \n";
        my $seq_ref=shift;
        if ($seq_ref->{'-length'} > 5000){ return 0;}
        else {return 1;}
    })){
    print "add_object_condition returns true\n";}
else{
    print "add_object_condition returns false\n";}

# for each sequence in fasta file, check if it is a selected bac
while(my $seq=$bac_fasta->next_seq()){
    print $seq->id."\n"; # PB : IT PRINTS ALL THE SEQUENCES, NOT THE SUBSET....
}

I can't get the subset of sequences, only all of the sequences. So I put
a print() in the closure passed to add_object_condition(), but nothing is
printed. It seems that the sub passed to add_object_condition() is never
executed, even though add_object_condition() returns a true value.

I am trying to use add_object_condition(), which seems to be a powerful
method, but without success. May I ask you for some advice on how to use
add_object_condition()? Did I forget something?

Best regards
Pierre Montalent
INRA - Ferme du moulon
France

From jarodpardon at yahoo.com.cn Fri Apr 10 00:27:29 2009
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Fri, 10 Apr 2009 08:27:29 +0800 (CST)
Subject: [Bioperl-l] bioperl translate() function for seq obj
Message-ID: <221543.32779.qm@web15003.mail.cnb.yahoo.com>

Hi, all,
I want to know whether Bio::PrimarySeqI::translate() uses the same method
and codon table as NCBI BLAST/blastx does. Thanks.

Jarod

From csembry at ualr.edu Fri Apr 10 00:54:21 2009
From: csembry at ualr.edu (Charles Embry)
Date: Thu, 09 Apr 2009 19:54:21 -0500
Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1
Message-ID:

Hello,

I am a graduate student at UALR and I am trying to install the ext
package (1.5.1) on bioperl 1.5.1. I get this error when I run the make
file.

"[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL
Writing Makefile for Bio::Ext::Align
ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/Bio/SeqIO/staden/Makefile.PL: Invalid version '' for Bio::SeqIO::staden::read.
Must be of the form '#.##'. (For instance '1.23')
 at ./Makefile.PL line 4"

These are the first 11 lines of the Makefile.PL for the ext package:

use Inline::MakeMaker;
use Config;

WriteInlineMakefile(
            'NAME'         => 'Bio::SeqIO::staden::read',
            'VERSION_FROM' => './read.pm', # finds $VERSION,
            'PREREQ_PM'    => { 'Inline::C' => 0.0,
                                'Bio::SeqIO::abi' => 0.0,
                              }, # e.g., Module::Name => 1.1,
            test           => { TESTS => 'test.pl' },
           );

What does the error mean? Which version does it refer to, and of what
(Staden?)? What version of Staden should this be if I am using
io_lib-1.8.11, following the INSTALL instructions with bioperl-ext?

Thank you,
C. Stephen Embry

From maj at fortinbras.us Fri Apr 10 01:16:18 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 9 Apr 2009 21:16:18 -0400
Subject: [Bioperl-l] bioperl translate() function for seq obj
In-Reply-To: <221543.32779.qm@web15003.mail.cnb.yahoo.com>
References: <221543.32779.qm@web15003.mail.cnb.yahoo.com>
Message-ID:

Hi Jarod-

translate() uses the NCBI "Standard" table by default. Check out the POD
for PrimarySeqI.pm (where translate is defined). You can specify others
by setting -CODONTABLE_ID => $n as an argument to translate().
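
As a minimal sketch of the option described above (assuming BioPerl is installed; the six-base sequence is made up for illustration and is not from the thread), the same DNA can be translated once with the default table and once with another table id:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;

# Made-up sequence: ATG followed by TGA.
my $dna = Bio::Seq->new(-seq => 'ATGTGA', -alphabet => 'dna');

# Default: NCBI "Standard" table (id 1), in which TGA is a stop codon.
my $standard = $dna->translate;

# Table 2 (Vertebrate Mitochondrial), in which TGA codes for tryptophan.
my $vert_mito = $dna->translate(-codontable_id => 2);

print "table 1: ", $standard->seq,  "\n";
print "table 2: ", $vert_mito->seq, "\n";
```

Comparing the two printed proteins is a quick way to check which table a given call actually used.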
The codon tables are in Bio::Tools::CodonTable, where the following are defined:

@NAMES =                        #id
    (
     'Standard',                #1
     'Vertebrate Mitochondrial',#2
     'Yeast Mitochondrial',# 3
     'Mold, Protozoan, and CoelenterateMitochondrial and Mycoplasma/Spiroplasma',#4
     'Invertebrate Mitochondrial',#5
     'Ciliate, Dasycladacean and Hexamita Nuclear',# 6
     '', '',
     'Echinoderm Mitochondrial',#9
     'Euplotid Nuclear',#10
     '"Bacterial"',# 11
     'Alternative Yeast Nuclear',# 12
     'Ascidian Mitochondrial',# 13
     'Flatworm Mitochondrial',# 14
     'Blepharisma Nuclear',# 15
     'Chlorophycean Mitochondrial',# 16
     '', '', '', '',
     'Trematode Mitochondrial',# 21
     'Scenedesmus obliquus Mitochondrial', #22
     'Thraustochytrium Mitochondrial' #23
    );

Can others (Scott M?) chime in on blast?

Mark

----- Original Message -----
From: "? ?"
To: "'bioperl-l'"
Sent: Thursday, April 09, 2009 8:27 PM
Subject: [Bioperl-l] bioperl translate() function for seq obj

> Hi, all,
> I want to know whether Bio::PrimarySeqI::translate() uses identical method and
> codon table with NCBI Blast/blastx does. Thanks.
>
> Jarod

From rrfreimuth2 at yahoo.com Fri Apr 10 02:10:21 2009
From: rrfreimuth2 at yahoo.com (Robert Freimuth)
Date: Thu, 9 Apr 2009 19:10:21 -0700 (PDT)
Subject: [Bioperl-l] Mentors needed for bioperl projects for Google's Summer of Code
Message-ID: <38796.60680.qm@web65611.mail.ac4.yahoo.com>

The Perl Foundation is looking for mentors for several projects for Google's Summer of Code. Two of the projects are directly applicable to bioperl.
In particular they're looking for mentors for these projects:

- Bio::Restriction::* - Improve reading and writing of RE collection in different formats; add support for multicut/multisite enzymes.
- A bioperl parser module for repeats/transposons.
- "CPAN OS Installer", integrate CPAN packages into Unix package managers like rpm and apt/dpkg
- Cross-platform Perl Bindings for wxWebKit

If you're interested please see the full announcement, posted on PerlMonks: http://www.perlmonks.org/?node_id=755872.

Thanks,
Bob

From j_martin at lbl.gov Fri Apr 10 03:18:28 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Thu, 9 Apr 2009 20:18:28 -0700
Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1
In-Reply-To:
References:
Message-ID: <20090410031827.GE6535@eniac.jgi-psf.org>

Hello,

I found 1.5.1 a pain to install; I recommend the code from

http://www.bioperl.org/wiki/Ext_package#The_latest_code

Anyway, the "read" in the error is read.pm, and I think the message comes from Inline::C. There's an old bug report about it; if you can't use the newer code, maybe it will help.

http://bugzilla.open-bio.org/show_bug.cgi?id=2074

joel

On Thu, Apr 09, 2009 at 07:54:21PM -0500, Charles Embry wrote:
> Hello I am a graduate student at UALR and I am trying to install the ext package(1.5.1) on bioperl 1.5.1.
> I get this error when i run the make file.
>
> "[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL
> Writing Makefile for Bio::Ext::Align
> ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/Bio/SeqIO/staden/Makefile.PL: Invalid version '' for Bio::SeqIO::staden::read.
> Must be of the form '#.##'. (For instance '1.23')
>  at ./Makefile.PL line 4"
>
> This is the first 11 lines of the Makefile.PL for ext package
>
> use Inline::MakeMaker;
> use Config;
>
> WriteInlineMakefile(
>             'NAME'         => 'Bio::SeqIO::staden::read',
>             'VERSION_FROM' => './read.pm', # finds $VERSION,
>             'PREREQ_PM'
=> { 'Inline::C' => 0.0, > ???????????????????????? 'Bio::SeqIO::abi' => 0.0, > ?????????????????????? }, # e.g., Module::Name => 1.1, > ??????????? test??????????????? => { TESTS => 'test.pl' }, > ?????????? ); > > What does the error mean? > > And what version does it refer to? Of what? (staden?) > What version of Staden should this be if i am using the io_lib-1.8.11 , following the INSTALL instructions with bioperl-ext? > > > Thanks you > C. Stephen Embry > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hsa_rim at yahoo.co.in Fri Apr 10 03:43:53 2009 From: hsa_rim at yahoo.co.in (shafeeq rim) Date: Fri, 10 Apr 2009 09:13:53 +0530 (IST) Subject: [Bioperl-l] Creating Cytoband Ideogram images Message-ID: <824645.66937.qm@web94611.mail.in2.yahoo.com> Hi, I want to create CytoBand ideogram images from CytoBand data in NCBI data. Is there any module in BioPerl or any other way to do it ? I want to create chromosome cytoband ideograms for each chromosome. Thanks in advance Shafeeq Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ From hlapp at gmx.net Fri Apr 10 04:00:54 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 10 Apr 2009 00:00:54 -0400 Subject: [Bioperl-l] Mentors needed for bioperl projects for Google's Summer of Code In-Reply-To: <38796.60680.qm@web65611.mail.ac4.yahoo.com> References: <38796.60680.qm@web65611.mail.ac4.yahoo.com> Message-ID: <0C80FD8F-78F6-493E-94C3-AE5D845577C5@gmx.net> Hi Robert - thanks for putting us into the loop! On Apr 9, 2009, at 10:10 PM, Robert Freimuth wrote: > The Perl Foundation is looking for mentors for several projects for > Google's Summer of Code. Two of the projects are directly applicable > to bioperl. 
> > In particular they're looking for mentors for these projects: > > Bio::Restriction::* - Improve reading and writing of RE collection in > different formats; add support for multicut/multisite enzymes.A > bioperl parser module for repeats/transposons. I don't want to dampen any enthusiasm and the project may indeed be worthwhile, but it's also worth noting that we haven't ever seen the student applicant here (assuming it's the same who contacted Heikki a while ago). Having said that, the fact that there hasn't been any community interaction from the student yet obviously doesn't have to mean that there can't be any in the future. But in the Google Summer of Code spirit of recruiting new contributors into FLOSS communities, it's a less than ideal start. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Fri Apr 10 04:15:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 9 Apr 2009 23:15:45 -0500 Subject: [Bioperl-l] Problems with installing Bioperl-ext-1.5.1 on Bioperl-1.5.1 In-Reply-To: <20090410031827.GE6535@eniac.jgi-psf.org> References: <20090410031827.GE6535@eniac.jgi-psf.org> Message-ID: <327D2C1C-A61A-473A-B85D-7A249856CC85@illinois.edu> Just to note, we're not actively supporting much of the bioperl-ext code, in favor of the BioLib initiative: http://biolib.open-bio.org/wiki/Main_Page If you do use bioperl-ext I suggest only using the latest code from svn (and that in combination with bioperl-live). chris On Apr 9, 2009, at 10:18 PM, Joel Martin wrote: > Hello, > I found that 1.5.1 a pain to install, I recommend the code from > > http://www.bioperl.org/wiki/Ext_package#The_latest_code > > anywho, the read is read.pm, the message is something from > inline::c I think, there's an old bug report about it, if > you can't use the newer code maybe it will help. 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > > joel > > > On Thu, Apr 09, 2009 at 07:54:21PM -0500, Charles Embry wrote: >> Hello I am a graduate student at UALR and I am trying to install >> the ext package(1.5.1) on bioperl 1.5.1. >> I get this error when i run the make file. >> >> "[root at bioinformatics bioperl-ext-1.5.1]# perl Makefile.PL >> Writing Makefile for Bio::Ext::Align >> ERROR from evaluation of /home/stephen/capstone/bioperl-ext-1.5.1/ >> Bio/SeqIO/staden/Makefile.PL: Invalid version '' for >> Bio::SeqIO::staden::read. >> Must be of the form '#.##'. (For instance '1.23') >> at ./Makefile.PL line 4" >> >> This is the first 11 lines of the Makefile.PL for ext package >> >> use Inline::MakeMaker; >> use Config; >> >> WriteInlineMakefile( >> 'NAME' => 'Bio::SeqIO::staden::read', >> 'VERSION_FROM' => './read.pm', # finds $VERSION, >> 'PREREQ_PM' => { 'Inline::C' => 0.0, >> 'Bio::SeqIO::abi' => 0.0, >> }, # e.g., Module::Name => 1.1, >> test => { TESTS => 'test.pl' }, >> ); >> >> What does the error mean? >> >> And what version does it refer to? Of what? (staden?) >> What version of Staden should this be if i am using the >> io_lib-1.8.11 , following the INSTALL instructions with bioperl-ext? >> >> >> Thanks you >> C. 
Stephen Embry >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Apr 10 04:32:59 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 9 Apr 2009 23:32:59 -0500 Subject: [Bioperl-l] Pasing Affymatrix Microarray output In-Reply-To: <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> References: <992233.10677.qm@web15208.mail.cnb.yahoo.com> <264855a00904071910n486ed5f1j7b130c47c6a57dce@mail.gmail.com> <8f200b4c0904072259l22311b9cxdbad2fcdd792dfab@mail.gmail.com> Message-ID: <0340305E-EAB3-4A08-9B41-5E706F4A5A16@illinois.edu> Would definitely be worth testing out interactivity with these. chris On Apr 8, 2009, at 12:59 AM, Steve Chervitz wrote: > Check out our Affymetrix Power Tools (APT) package: > > http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx > > We distribute binaries for Linux and Mac OSX, as well as source code > so you can compile it yourself if you want. Note however that this is > written in C++, not Perl. We don't provide SWIG or XS interfaces for > direct access via Perl, though this would definitely be doable, if > anyone is interested. > > Probably the easiest approach from Perl would be to simply call the > appropriate APT executable through the shell as in: > > system("/path/to/apt --args ..."); > > The Perl code can parse the output files and take it from there. > > Steve > > > On Tue, Apr 7, 2009 at 7:10 PM, Sean Davis > wrote: >> On Tue, Apr 7, 2009 at 9:39 PM, Wen-Zhi WANG > >wrote: >> >>> Dear all, >>> >>> Recently, I focus on population genomics data outputed by affymatrix >>> microarray system. However, softwares which designed by affy. 
inc
>>> only run
>>> in Windows 386 platform. Is there any application can used in Linux?
>>> Bio::Affymatrix was not strong enough to get the detailed
>>> informaton.
>>>
>>
>> You may want to look at a non-bioperl solution such as Bioconductor (
>> http://bioconductor.org).
>>
>> Sean

From miguel.pignatelli at uv.es Wed Apr 1 21:56:36 2009
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Wed, 1 Apr 2009 23:56:36 +0200
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <49D39E60.1020103@gmail.com>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID:

You may find the attached Perl module useful. It solves the difficult parts of getting the taxonomy given a GI identifier or a taxID. It is designed to process large numbers of GIs quickly and with low memory usage.

An example of usage would be:

use taxbuild;
# Build the taxonomyDB
my $taxDB = taxbuild->new(
    nodes    => $nodes_file_from_taxonomyDB,
    names    => $names_file_from_taxonomyDB,
    dict     => $dictFile,
    save_mem => 1
);

# Get the taxonomy given a GI identifier
my @tax = $taxDB->get_taxonomy_from_gi("35961124");

# Get the taxonomy term of a GI identifier at a given level
my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");

# Get the taxid of a GI identifier
my $taxid = $taxDB->get_taxid("35961124");

# Get the taxonomy given a taxid
my @tax = $taxDB->get_taxonomy($taxid);

# Get the taxonomy at a given level given a taxid
my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");

# Get the level of a given taxonomical name
my $level = $taxDB->get_level_from_name("Proteobacteria");

The "dict file" is a processed version of the gi_taxid file from taxonomyDB. You can get this file by running the tax2bin2.pl script, also attached:

$ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin

or, if you are working with genes instead of proteins:

$ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin

A possible solution to the original post using this module would be something like:

# Initialize the taxonomyDB once.
my $taxDB = taxbuild->new(
    nodes    => $nodes_file_from_taxonomyDB,
    names    => $names_file_from_taxonomyDB,
    dict     => $dictFile,
    save_mem => 1
);

# For each blast result
# Extract the GI
my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
if ($superkingdom eq "Bacteria") {
    # Do whatever you want
} elsif ($superkingdom eq "Eukaryota") {
    # Do whatever you want
}

The module has been tested mainly on Linux systems, but should run without problems on Windows and Mac too. If you encounter any problem with it, don't hesitate to contact me.

Hope this helps,

M;

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tax2bin2.pl
Type: text/x-perl-script
Size: 400 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: taxbuild.pm
Type: text/x-perl-script
Size: 10599 bytes
Desc: not available
URL:
-------------- next part --------------

On 01/04/2009, at 19:03, Florent Angly wrote:

> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
> you won't be able to put its information in a hash (unless you have
> a lot of memory).
> Florent > > Smithies, Russell wrote: >> The taxonomy information isn't in the blast output unless you >> created custom fasta headers for your blast database. >> The easiest way to get the tax_id for your accessions would be to >> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >> . >> If you load that file into a hash, parse the accessions out of the >> blast hits then lookup the tax_id from that hash, I think it should >> be fairly fast. >> Checking which are prokaryotes and which are eukaryotes based on >> tax_id is a separate problem :-) >> If you grab the taxdump.tar.gz file from the same site, the >> nodes.dmp file contained within lists what division each tax_id >> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) >> so you can probably work it out from that. >> >> It's not a very BioPerly solution but sometimes just looking up the >> answer from a file/table/hash is the simplest way. >> Hope this helps, >> >> Russell Smithies >> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 >> 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >> >> >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] taxonomy ID >>> >>> Hi All, >>> I am writing a script, for one of its part i have to >>> parse a blast >>> report (refseq blast) and check how may organisms are eukaryotes >>> and how >>> namy of them are prokaryotes. >>> I am using BIO::DB::taxinomy module: >>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>> >>> But for this i need a taxonomyid (like '33090') given in the >>> example. >>> So is it possible to get a taxonomyid from refseq balst report? 
>>> If not then how i can deal with this problem? >>> >>> i would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use >> of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. >> = >> = >> ===================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields1 at gmail.com Fri Apr 10 04:34:03 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Thu, 9 Apr 2009 23:34:03 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneNCBIBlast - blastpgp In-Reply-To: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> References: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> Message-ID: Estelle, Always direct your questions to the bioperl mail list (I'm cc'ing them now). I'm not sure about using that option, maybe someone else can answer? 
chris On Apr 7, 2009, at 5:08 AM, Estelle Proux wrote: > Dear Mr Fields, > > I would like to use the module Bio::Tools::Run::StandAloneNCBIBlast > to run > blastpgp. > However, the -C option (save a checkpoint in ASN.x) seems not > available in > this module (options are -j, -h, -c, -B, and -Q). Is there another > way to > save the checkpoint? > > I thank you by advance (and apologize for my English). > > Estelle From jaleto at gmail.com Fri Apr 10 07:50:46 2009 From: jaleto at gmail.com (Jonathan Leto) Date: Fri, 10 Apr 2009 00:50:46 -0700 Subject: [Bioperl-l] Google Summer of Code 2009 BioPerl Student Applications Message-ID: <9aaadf9c0904100050g7f82f925s2e9bae9646da6cd5@mail.gmail.com> Howdy, There are two student applications for The Perl Foundation this year which are BioPerl-related, and I would very much like for them to succeed, but most of the current mentors do not have the background to judge whether they are possible in the time given, or what most of words mean for that matter. We really need some feedback from BioPerl people as to the viability of this applications, as well as comments and suggestions for implementation issues. Please sign up at the GSoC web app [1], then apply to be a mentor for The Perl Foundation. It requires me to manually accept you and then you will be able to view the 19 applications we received this year. Please also join the private mentor list [2] and the students+mentors list [3] if you would like to keep up to date and get involved. Welcome! 
Cheers, [1] http://socghop.appspot.com/ [2] http://groups.google.com/group/tpf-gsoc [3] http://groups.google.com/group/tpf-gsoc-students -- [---------------------] Jonathan Leto jaleto at gmail.com From scott at scottcain.net Fri Apr 10 13:08:53 2009 From: scott at scottcain.net (Scott Cain) Date: Fri, 10 Apr 2009 09:08:53 -0400 Subject: [Bioperl-l] Creating Cytoband Ideogram images In-Reply-To: <824645.66937.qm@web94611.mail.in2.yahoo.com> References: <824645.66937.qm@web94611.mail.in2.yahoo.com> Message-ID: <536f21b00904100608w23484c5bi3765da39b6b4d946@mail.gmail.com> Hello Shafeeq, You need Bio::Graphics::Glyph::ideogram, which is part of Bio::Graphics. You can install it from cpan and it will install BioPerl 1.6 as a prereq. The perldoc for ideogram.pm has example code and data, since the format of the data is important. Scott On Thu, Apr 9, 2009 at 11:43 PM, shafeeq rim wrote: > Hi, > > I want to create CytoBand ideogram images from CytoBand data in NCBI data. Is there any module in BioPerl or any other way to do it ? I want to create chromosome cytoband ideograms for each chromosome. > > Thanks in advance > Shafeeq > > > > ? ? ?Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Fri Apr 10 13:32:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 10 Apr 2009 08:32:00 -0500
Subject: [Bioperl-l] taxonomy ID
In-Reply-To:
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID:

I don't know if this has been pointed out, but Bio::DB::Taxonomy is also capable of indexing and using the NCBI tax flat files.

use Bio::DB::Taxonomy;

my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => $nodesfile,
                                -namesfile => $namefile);

# use other Bio::DB::Taxonomy methods

chris

On Apr 1, 2009, at 4:56 PM, Miguel Pignatelli wrote:

> You may find the attached Perl module useful. It solves the
> difficult parts of getting the taxonomy given a GI identifier or a
> taxID. It is designed to be able to process a high number of GIs
> very fast and with low memory usage.
>
> An example of usage would be:
>
> use taxbuild;
> #Build the taxonomyDB
> my $taxDB = taxbuild->new(
> nodes =>
> $nodes_file_from_taxonomyDB,
> names =>
> $names_file_from_taxonomyDB,
> dict => $dictFile,
> save_mem => 1
> );
>
> # Get the taxonomy given a GI identifier
> my @tax = $taxDB->get_taxonomy_from_gi("35961124");
>
> # Get the taxonomy term of a GI identifier at a given level
> my $term_at_level = $taxDB? 
> >get_term_at_level_from_gi("35961124","family"); > > # Get the taxid of a GI identifier > my $taxid = $taxDB?>get_taxid("35961124"); > > # Get the taxonomy given a taxid > my @tax = $taxDB?>get_taxonomy($taxid); > > # Get the taxonomy at a given level given a taxid > my $taxid_at_level = $taxDB?>get_term_at_level($taxid,"genus"); > > # Get the level of a given taxonomical name > my $level = $taxDB?>get_level_from_name("Proteobacteria"); > > The "dict file" is a processed version of the gi_taxid file from > taxonomyDB. You can get this file by running the tax2bin2.pl script > also attached: > > $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin > or, if you are working with genes instead of proteins: > $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin > > A possible solution to the original post using this module would be > something like: > > # Initialize the taxonomyDB once. > my $taxDB = taxbuild?>new( > nodes => > $nodes_file_from_taxonomyDB, > names => > $names_file_from_taxonomyDB, > dict => $dictFile, > save_mem => 1 > ); > > #For each blast result > #Extract the GI > my $superkingdom = $taxDB- > >get_term_at_level_from_gi($gi,"superkingdom"); > if ($superkingdom eq "Bacteria") { > # Do whatever you want > } elsif ($superkingdom eq "Eukaryota") > # Do whatever you want > } > > > The module has been tested mainly in Linux systems, but should run > without problems in Windows and Mac too. If you encounter any > problem with it don't hesitate to contact me. > > Hope this helps, > > M; > > > > > > El 01/04/2009, a las 19:03, Florent Angly escribi?: > >> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that >> you won't be able to put its information in a hash (unless you have >> a lot of memory). >> Florent >> >> Smithies, Russell wrote: >>> The taxonomy information isn't in the blast output unless you >>> created custom fasta headers for your blast database. 
>>> The easiest way to get the tax_id for your accessions would be to >>> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >>> . >>> If you load that file into a hash, parse the accessions out of the >>> blast hits then lookup the tax_id from that hash, I think it >>> should be fairly fast. >>> Checking which are prokaryotes and which are eukaryotes based on >>> tax_id is a separate problem :-) >>> If you grab the taxdump.tar.gz file from the same site, the >>> nodes.dmp file contained within lists what division each tax_id >>> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) >>> so you can probably work it out from that. >>> >>> It's not a very BioPerly solution but sometimes just looking up >>> the answer from a file/table/hash is the simplest way. >>> Hope this helps, >>> >>> Russell Smithies >>> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >>> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T >>> +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >>> >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>>> To: bioperl-l >>>> Subject: [Bioperl-l] taxonomy ID >>>> >>>> Hi All, >>>> I am writing a script, for one of its part i have to >>>> parse a blast >>>> report (refseq blast) and check how may organisms are eukaryotes >>>> and how >>>> namy of them are prokaryotes. >>>> I am using BIO::DB::taxinomy module: >>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>>> >>>> But for this i need a taxonomyid (like '33090') given in the >>>> example. >>>> So is it possible to get a taxonomyid from refseq balst report? >>>> If not then how i can deal with this problem? >>>> >>>> i would really appreciate if anyone can help me out. 
>>>> >>>> Thanks >>>> Shalabh >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> = >>> = >>> = >>> ==================================================================== >>> Attention: The information contained in this message and/or >>> attachments >>> from AgResearch Limited is intended only for the persons or entities >>> to which it is addressed and may contain confidential and/or >>> privileged >>> material. Any review, retransmission, dissemination or other use >>> of, or >>> taking of any action in reliance upon, this information by persons >>> or >>> entities other than the intended recipients is prohibited by >>> AgResearch >>> Limited. If you have received this message in error, please notify >>> the >>> sender immediately. >>> = >>> = >>> = >>> ==================================================================== >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Fri Apr 10 13:42:15 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 10 Apr 2009 09:42:15 -0400 Subject: [Bioperl-l] Query about Bioperl and Mysql In-Reply-To: <31bb4380903280541r232ebbe4kbb0ccd84f996da1f@mail.gmail.com> References: <31bb4380903280541r232ebbe4kbb0ccd84f996da1f@mail.gmail.com> Message-ID: <264855a00904100642l482deebend6be66b140933c2c@mail.gmail.com> On Sat, Mar 28, 2009 at 8:41 AM, Sanjay Harke wrote: > Dear friends, > > anybody nows about my 
following problem. > > !) I want to use my own database mysql with Bioperl > > kindly guide for it. > You'll want to look at the perl DBI and DBD::mysql modules. Sean From bosborne11 at verizon.net Fri Apr 10 13:55:00 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 10 Apr 2009 09:55:00 -0400 Subject: [Bioperl-l] Access Uniprot detailed information In-Reply-To: <22951210.post@talk.nabble.com> References: <22951210.post@talk.nabble.com> Message-ID: <4C3C5234-31F7-4EEF-BBA0-9B912D21F210@verizon.net> Markus, There is some discussion of the structure of "swiss" format files in the Feature-Annotation HOW TO. Have you taken a look at this? http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Other_Sequence_File_Formats This section does not explain all the fields in each different format, but it shows you code that you can run that will print out all the annotations and features. You're really asking 2 questions, I think. Have you figured out how to retrieve a sequence? See if this helps you: http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database Brian O. On Apr 8, 2009, at 10:07 AM, manni122 wrote: > > Hi there, > maybe I am not able to read careful enough through the Howto section. > But is there a function in BioPerl that retrieves for a given > Uniprot Access > Code or ID from the Uniprot Database some general annotations like > enzymatic > activity or literature references? > I appreciate any help! > -- > View this message in context: http://www.nabble.com/Access-Uniprot-detailed-information-tp22951210p22951210.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Fri Apr 10 14:05:06 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 10 Apr 2009 10:05:06 -0400 Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 In-Reply-To: <22816585.post@talk.nabble.com> References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <22816585.post@talk.nabble.com> Message-ID: Dereje, There's a HOW TO that discusses an approach similar to this (Using local Genbank and Entrez Gene files): http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences But the provided script uses Gene ids, not chromosome names. The more general suggestion would be to look at the module Bio::DB::Fasta. Brian O. On Mar 31, 2009, at 6:59 PM, demis001 wrote: > > Hi , > > I am new to BioPerl and this forum and even do not know how to post > the new > post. I have one question for you guys. > > Is there any BioPerl module that allows me to download sequence > based on > chromosome name, seqStart and SeqEnd given the formatted human genome > database downloaded on my Linux desktop? > > I used to do this using Perl $URI object and it is really slow as the > process depend on the network. To be more specific, I took chrName, > seqStart > and seqEnd and go to Ensembl database to get the sequence one by one > using > Perl $URI object. > > I thought it might be easier if I process locally using indexed > database > using BioPerl module if there is any designed for this purpose. > > Input, millions rows of tab delimited (CSV) file contain > information about > chrName, seqStart, seqEnd. Locally formatted/indexed human genome. > Output > should be the fasta sequence contain the sequence and with the header > contain chr name and location persed > > Sorry if I posted in the wrong section of the forum and happy to > get any > recommendation. 
> Thanks > > Govind Chandra wrote: >> >> Hi, >> >> The code below >> >> >> ====== code begins ======= >> #use strict; >> use Bio::SeqIO; >> >> $infile='NC_000913.gbk'; >> my $seqio=Bio::SeqIO->new(-file => $infile); >> my $seqobj=$seqio->next_seq(); >> my @features=$seqobj->all_SeqFeatures(); >> my $count=0; >> foreach my $feature (@features) { >> unless($feature->primary_tag() eq 'CDS') {next;} >> print($feature->start()," ", $feature->end(), " >> ",$feature->strand(),"\n"); >> $ac=$feature->annotation(); >> $temp1=$ac->get_Annotations("locus_tag"); >> @temp2=$ac->get_Annotations(); >> print("$temp1 $temp2[0] @temp2\n"); >> if($count++ > 5) {last;} >> } >> >> print(ref($ac),"\n"); >> exit; >> >> ======= code ends ======== >> >> produces the output >> >> ========== output begins ======== >> >> 190 255 1 >> 0 >> 337 2799 1 >> 0 >> 2801 3733 1 >> 0 >> 3734 5020 1 >> 0 >> 5234 5530 1 >> 0 >> 5683 6459 -1 >> 0 >> 6529 7959 -1 >> 0 >> Bio::Annotation::Collection >> >> =========== output ends ========== >> >> $ac is-a Bio::Annotation::Collection but does not actually contain >> any >> annotation from the feature. Is this how it should be? I cannot >> figure >> out what is wrong with the script. Earlier I used to use has_tag(), >> get_tag_values() etc. but the documentation says these are >> deprecated. >> >> Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of >> uname >> -a is >> >> Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 >> x86_64 x86_64 x86_64 GNU/Linux >> >> Thanks in advance for any help. >> >> Govind >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/Re%3A-Bioperl-l-Digest%2C-Vol-71%2C-Issue-15-tp22744119p22816585.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org Fri Apr 10 15:51:45 2009
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 10 Apr 2009 08:51:45 -0700
Subject: [Bioperl-l] taxonomy ID
In-Reply-To:
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz> <49D39E60.1020103@gmail.com>
Message-ID: <6B951DED-0632-451C-86A4-2A215B1CAE6C@bioperl.org>

The only difference to the DB::Taxonomy module I can see is that we don't
specifically have the dictionary part (for gi -> taxid), but I just do a
local DBHash index of that when I need it.

-jason

On Apr 10, 2009, at 6:32 AM, Chris Fields wrote:

> I don't know if this has been pointed out, but Bio::DB::Taxonomy is
> also capable of indexing and using the NCBI tax flat files.
>
>     use Bio::DB::Taxonomy;
>
>     my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
>                                     -nodesfile => $nodesfile,
>                                     -namesfile => $namefile);
>
>     # use other Bio::DB::Taxonomy methods
>
> chris
>
> On Apr 1, 2009, at 4:56 PM, Miguel Pignatelli wrote:
>
>> You may find the attached Perl module useful. It solves the
>> difficult parts of getting the taxonomy given a GI identifier or a
>> taxID. It is designed to be able to process a high number of GIs
>> very fast and with low memory usage.
>>
>> An example of usage would be:
>>
>>     use taxbuild;
>>     # Build the taxonomyDB
>>     my $taxDB = taxbuild->new(
>>         nodes    => $nodes_file_from_taxonomyDB,
>>         names    => $names_file_from_taxonomyDB,
>>         dict     => $dictFile,
>>         save_mem => 1
>>     );
>>
>>     # Get the taxonomy given a GI identifier
>>     my @tax = $taxDB->get_taxonomy_from_gi("35961124");
>>
>>     # Get the taxonomy term of a GI identifier at a given level
>>     my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");
>>
>>     # Get the taxid of a GI identifier
>>     my $taxid = $taxDB->get_taxid("35961124");
>>
>>     # Get the taxonomy given a taxid
>>     my @tax = $taxDB->get_taxonomy($taxid);
>>
>>     # Get the taxonomy at a given level given a taxid
>>     my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");
>>
>>     # Get the level of a given taxonomical name
>>     my $level = $taxDB->get_level_from_name("Proteobacteria");
>>
>> The "dict file" is a processed version of the gi_taxid file from
>> taxonomyDB. You can get this file by running the tax2bin2.pl script
>> also attached:
>>
>>     $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin
>>
>> or, if you are working with genes instead of proteins:
>>
>>     $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin
>>
>> A possible solution to the original post using this module would be
>> something like:
>>
>>     # Initialize the taxonomyDB once.
>>     my $taxDB = taxbuild->new(
>>         nodes    => $nodes_file_from_taxonomyDB,
>>         names    => $names_file_from_taxonomyDB,
>>         dict     => $dictFile,
>>         save_mem => 1
>>     );
>>
>>     # For each blast result
>>     # Extract the GI
>>     my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
>>     if ($superkingdom eq "Bacteria") {
>>         # Do whatever you want
>>     } elsif ($superkingdom eq "Eukaryota") {
>>         # Do whatever you want
>>     }
>>
>> The module has been tested mainly in Linux systems, but should run
>> without problems in Windows and Mac too. If you encounter any
>> problem with it don't hesitate to contact me.
>>
>> Hope this helps,
>>
>> M;
>>
>> On 01/04/2009, at 19:03, Florent Angly wrote:
>>
>>> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
>>> you won't be able to put its information in a hash (unless you
>>> have a lot of memory).
>>> Florent
>>>
>>> Smithies, Russell wrote:
>>>> The taxonomy information isn't in the blast output unless you
>>>> created custom fasta headers for your blast database.
>>>> The easiest way to get the tax_id for your accessions would be to >>>> download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz >>>> . >>>> If you load that file into a hash, parse the accessions out of >>>> the blast hits then lookup the tax_id from that hash, I think it >>>> should be fairly fast. >>>> Checking which are prokaryotes and which are eukaryotes based on >>>> tax_id is a separate problem :-) >>>> If you grab the taxdump.tar.gz file from the same site, the >>>> nodes.dmp file contained within lists what division each tax_id >>>> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, >>>> etc) so you can probably work it out from that. >>>> >>>> It's not a very BioPerly solution but sometimes just looking up >>>> the answer from a file/table/hash is the simplest way. >>>> Hope this helps, >>>> >>>> Russell Smithies >>>> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz >>>> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T >>>> +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz >>>> >>>> >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>>>> Sent: Wednesday, 1 April 2009 7:43 a.m. >>>>> To: bioperl-l >>>>> Subject: [Bioperl-l] taxonomy ID >>>>> >>>>> Hi All, >>>>> I am writing a script, for one of its part i have to >>>>> parse a blast >>>>> report (refseq blast) and check how may organisms are eukaryotes >>>>> and how >>>>> namy of them are prokaryotes. >>>>> I am using BIO::DB::taxinomy module: >>>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>>>> >>>>> But for this i need a taxonomyid (like '33090') given in the >>>>> example. >>>>> So is it possible to get a taxonomyid from refseq balst report? >>>>> If not then how i can deal with this problem? 
>>>>> >>>>> i would really appreciate if anyone can help me out. >>>>> >>>>> Thanks >>>>> Shalabh >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> = >>>> = >>>> = >>>> = >>>> =================================================================== >>>> Attention: The information contained in this message and/or >>>> attachments >>>> from AgResearch Limited is intended only for the persons or >>>> entities >>>> to which it is addressed and may contain confidential and/or >>>> privileged >>>> material. Any review, retransmission, dissemination or other use >>>> of, or >>>> taking of any action in reliance upon, this information by >>>> persons or >>>> entities other than the intended recipients is prohibited by >>>> AgResearch >>>> Limited. If you have received this message in error, please >>>> notify the >>>> sender immediately. >>>> = >>>> = >>>> = >>>> = >>>> =================================================================== >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From SMarkel at accelrys.com Fri Apr 10 16:01:25 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 10 Apr 2009 12:01:25 -0400 Subject: [Bioperl-l] 
Bio::Tools::Run::StandAloneNCBIBlast - blastpgp In-Reply-To: References: <7c35ac200904070308y514ee46bkce6a46633c0bbd13@mail.gmail.com> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74729E04A77@exch1-hi.accelrys.net> Estelle, Are you using the most recent version of Bio::Tools::Run::StandAloneNCBIBlast? The available blastpgp parameters are our @BLASTPGP_PARAMS = qw(A B C E F G H I J K L M N O P Q R S T U W X Y Z a b c e f h j k l m q s t u v y z); See line 94. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, 09 April 2009 9:34 PM > To: Estelle Proux > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneNCBIBlast - blastpgp > > Estelle, > > Always direct your questions to the bioperl mail list (I'm cc'ing them > now). I'm not sure about using that option, maybe someone else can > answer? > > chris > > On Apr 7, 2009, at 5:08 AM, Estelle Proux wrote: > > > Dear Mr Fields, > > > > I would like to use the module Bio::Tools::Run::StandAloneNCBIBlast > > to run > > blastpgp. > > However, the -C option (save a checkpoint in ASN.x) seems not > > available in > > this module (options are -j, -h, -c, -B, and -Q). Is there another > > way to > > save the checkpoint? > > > > I thank you by advance (and apologize for my English). 
> >
> > Estelle
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jarodpardon at yahoo.com.cn Sat Apr 11 13:50:20 2009
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Sat, 11 Apr 2009 21:50:20 +0800 (CST)
Subject: [Bioperl-l] how to suppress Bioperl exceptions
Message-ID: <936515.8386.qm@web15007.mail.cnb.yahoo.com>

Hi, all,
I use Bio::SeqIO driver to parse data files. The input data is somewhat
buggy, and some of the entries are not strict in format. The parser will
throw exceptions and halt when meeting these bad entries. However, I want
to just skip these entries, not stop there. So how to suppress exceptions?
Thanks.

Jarod

From maj at fortinbras.us Sat Apr 11 15:32:16 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 11 Apr 2009 11:32:16 -0400
Subject: [Bioperl-l] how to suppress Bioperl exceptions
Message-ID:

missed the list.
----- Original Message -----
From: "Mark A. Jensen"
To: "? ?"
Sent: Saturday, April 11, 2009 10:52 AM
Subject: Re: [Bioperl-l] how to suppress Bioperl exceptions

> Hey Jarod-
> You can try setting the verbosity of the object negative, as
>
>     $seqio->verbose(-1);
>
> I've found, though, that the warning messages still come through
> sometimes. I've gotten control of these using the Error package:
>
>     use Error qw(:try);
>
>     try {
>         $seqio = Bio::SeqIO->new(-file => 'my.fas');
>     }
>     catch Error with {
>         my $e = shift;
>         # $e->text will contain the message
>     };
>
> Note the lack of ; after the try block, and the
> presence thereof after the catch block.
>
> cheers -Mark
> ----- Original Message -----
> From: "? ?"
> To:
> Sent: Saturday, April 11, 2009 9:50 AM
> Subject: [Bioperl-l] how to suppress Bioperl exceptions
>
>> Hi, all,
>> I use Bio::SeqIO driver to parse data files.
>> The input data is somewhat buggy, and some of entries are not strict in
>> format. The parser will throw exceptions and halt when meeting these bad
>> entries. However, I want to just skip these entries, not stop there.
>> So how to suppress exceptions?
>> Thanks.
>>
>> Jarod
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From hlapp at gmx.net Sat Apr 11 15:56:35 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 11 Apr 2009 11:56:35 -0400
Subject: [Bioperl-l] how to suppress Bioperl exceptions
In-Reply-To: <936515.8386.qm@web15007.mail.cnb.yahoo.com>
References: <936515.8386.qm@web15007.mail.cnb.yahoo.com>
Message-ID:

Hi Jarod,

in addition to Mark's response, what you say in your message would mean
that corruption is in specific entries of a file and you want to skip
those, rather than entire files. If this is true, then you'd have to put
the $seq=$seqio->next_seq() call into the try {} block as that'd be the
one that would raise the exception.

The SeqIO parsers don't generally guarantee though that they will
gracefully recover from a parsing error and advance to the next record;
I think the genbank parser will do that, but you will definitely want to
check that.

-hilmar

On Apr 11, 2009, at 9:50 AM, ? ? wrote:

>
> Hi, all,
> I use Bio::SeqIO driver to parse data files. The input data is
> somewhat buggy, and some of entries are not strict in format. The
> parser will throw exceptions and halt when meeting these bad
> entries. However, I want to just skip these entries, not stop there.
> So how to suppress exceptions?
> Thanks.
>
> Jarod
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From oleksii.nikolaienko at gmail.com Sun Apr 12 11:10:47 2009
From: oleksii.nikolaienko at gmail.com (Oleksii Nikolaienko)
Date: Sun, 12 Apr 2009 14:10:47 +0300
Subject: [Bioperl-l] GSoC proposal
Message-ID: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com>

Hi all!
My name is Oleksii, I'm a PhD student and I'd like to receive your
comments on my proposal for Google Summer of Code. It's called
"bioperl-live::Bio::Restriction::* - implementing missing features" and
I'm going to:

1) add support for reading and writing different file formats for module
   Bio::Restriction::IO
2) add support for multicut/multisite enzymes
3) add information on recommended buffers, restriction efficiency,
   sensitivity to methylation, etc. and corresponding new methods
4) update documentation for Bio::Restriction::* modules

Thanks in advance for your suggestions.

notch

From roy.chaudhuri at gmail.com Tue Apr 14 14:54:21 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 14 Apr 2009 15:54:21 +0100
Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error
Message-ID: <49E4A39D.2020909@gmail.com>

Hi Mike.

I did get that problem solved in the end, thanks to lots of help from
Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped
cc-ing the mailing list at some point.
The last archived message in the thread
(http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had the
correct solution - the code change was incorporated into the bioperl-ext
CVS, and is in the latest version that you can get from SVN (see
http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the
problem you must be experiencing a different issue.

You should also bear in mind the message Chris Fields sent to the list a
few days ago, and have a look at using BioLib instead:

> Just to note, we're not actively supporting much of the bioperl-ext
> code, in favor of the BioLib initiative:
>
> http://biolib.open-bio.org/wiki/Main_Page
>
> If you do use bioperl-ext I suggest only using the latest code from
> svn (and that in combination with bioperl-live).
>
> chris

Hope this helps.
Roy.

Michael Stubbington wrote:
> Dear Dr. Chaudhuri,
>
> I am currently trying to write a bioperl script that parses .abi
> sequence files. I am having exactly the same problem as you did when
> you posted this enquiry to the bioperl mailing list
> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
> wondering if you ever solved the problem and, if so, can you remember
> what you did? I'd be very grateful for any help you can provide. I
> can't find this problem mentioned anywhere else online.
>
> Thank you for your time.
>
> Mike

--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.

From cjfields at illinois.edu Tue Apr 14 15:20:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 14 Apr 2009 10:20:00 -0500
Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error
In-Reply-To: <49E4A39D.2020909@gmail.com>
References: <49E4A39D.2020909@gmail.com>
Message-ID:

For ABI files you'll need an older version of io_lib that supports ABI or
the io_lib that comes with the full staden package. Recent versions of
io_lib don't have ABI support built-in anymore.
chris

On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote:

> Hi Mike.
>
> I did get that problem solved in the end, thanks to lots of help
> from Aaron Mackey. Looking at the bioperl-l archives it seems like
> we stopped cc-ing the mailing list at some point. The last archived
> message in the thread
> (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had the
> correct solution - the code change was incorporated into the
> bioperl-ext CVS, and is in the latest version that you can get
> from SVN (see http://www.bioperl.org/wiki/Ext_package). If that
> doesn't solve the problem you must be experiencing a different issue.
>
> You should also bear in mind the message Chris Fields sent to the
> list a few days ago, and have a look at using BioLib instead:
>
>> Just to note, we're not actively supporting much of the bioperl-ext
>> code, in favor of the BioLib initiative:
>> http://biolib.open-bio.org/wiki/Main_Page
>> If you do use bioperl-ext I suggest only using the latest code
>> from svn (and that in combination with bioperl-live).
>>
>> chris
>
> Hope this helps.
> Roy.
>
> Michael Stubbington wrote:
>> Dear Dr. Chaudhuri,
>> I am currently trying to write a bioperl script that parses .abi
>> sequence files. I am having exactly the same problem as you did when
>> you posted this enquiry to the bioperl mailing list
>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
>> wondering if you ever solved the problem and, if so, can you remember
>> what you did? I'd be very grateful for any help you can provide. I
>> can't find this problem mentioned anywhere else online.
>> Thank you for your time.
>> Mike
>
> --
> Dr. Roy Chaudhuri
> Department of Veterinary Medicine
> University of Cambridge, U.K.
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Apr 14 18:21:43 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 14 Apr 2009 13:21:43 -0500 Subject: [Bioperl-l] GSoC proposal In-Reply-To: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> References: <4d4764d50904120410s6d49481dv3afc9f54ff4db1ca@mail.gmail.com> Message-ID: On Apr 12, 2009, at 6:10 AM, Oleksii Nikolaienko wrote: > Hi all! > My name is Oleksii, I`m PhD student and I`d like to receive your > comments on > my proposal for Google summer of code. It`s called > "bioperl-live::Bio::Restriction::* - implementing missing features" > and I`m > going to: > > 1) add support for reading and writing different file formats for > module Bio::Restriction::IO You should specify which formats you intend on working with. It's known that several formats don't carry all data, for instance prototype information, vendors, etc. so that should be documented for end-users. You should probably suggest a workaround for getting at missing data (i.e. a format that carries all info, retrieving prototype data separately, etc). > 2) add support for multicut/multisite enzymes Agreed, though you should be more specific on how you intend to implement this. From the Bio::Restriction::Enzyme documentation the sequence site is supposed to be a Bio::PrimarySeq (though I would probably change that internally so it creates these on the fly from the stored string). Multicut/multisite implies list context return, so it may become an API issue (and using wantarray as a workaround is fraught with problematic API traps that I suggest avoiding if at all possible). 
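As a rough illustration of the kind of trap meant here, the following Perl sketch contrasts a wantarray-based accessor with one that always returns an array reference. The subroutine names and cut positions are invented for the example and are not part of the Bio::Restriction::Enzyme API.

```perl
use strict;
use warnings;

# Hypothetical accessor (an assumption for illustration only): returns all
# cut positions in list context, but only the first one in scalar context.
sub cut_sites_contextual {
    my @sites = (4, 17);    # made-up positions for a two-cut enzyme
    return wantarray ? @sites : $sites[0];
}

# Alternative that avoids wantarray: always return an array reference,
# so the caller gets exactly one value regardless of context.
sub cut_sites {
    my @sites = (4, 17);
    return \@sites;
}

# Scalar context: the caller silently gets only the first position.
my $first = cut_sites_contextual();

# The trap: building an argument list imposes list context, so the single
# 'cut' key is followed by two values and every later pair is shifted.
my @args = (cut => cut_sites_contextual(), type => 'II');

# With the array reference the argument list keeps its key/value shape.
my @safe = (cut => cut_sites(), type => 'II');

printf "contextual: %d args, arrayref: %d args\n",
    scalar @args, scalar @safe;
```

Returning a reference costs the caller an explicit dereference, but the number of values at each call site stays predictable whether an enzyme has one cut site or several.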
> 3) add information on recommended buffers, restriction > efficiency, > sensitivity to methylation, etc and corresponding new methods Much of this should probably be outlined in the corresponding interface class prior to implementation. > 4) update documentation for Bio::Restriction::* modules Yes, completely agree. This should be bumped closer to the top of the priority list (and outlined in the interface classes). > Thanks in advance for your suggestions. > > notch > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l chris From j_martin at lbl.gov Wed Apr 15 06:50:37 2009 From: j_martin at lbl.gov (Joel Martin) Date: Tue, 14 Apr 2009 23:50:37 -0700 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: References: <49E4A39D.2020909@gmail.com> Message-ID: <20090415065037.GB1175@eniac.jgi-psf.org> Hello, Do you know where it says io_lib will stop supporting ABI? We use the latest ( 1.11.6 ) for this so I know it does read them and I just checked with one fresh off a sequencer. But I couldn't find an active forum for staden. Thanks, Joel On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: > For ABI files you'll need an older version of io_lib that supports ABI or > the io_lib that comes with the full staden package. Recent versions of > io_lib don't have ABI support built-in anymore. > > chris > > On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: > >> Hi Mike. >> >> I did get that problem solved in the end, thanks to lots of help from >> Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped >> cc-ing the mailing list at some point. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Apr 15 12:26:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 15 Apr 2009 07:26:15 -0500 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <20090415065037.GB1175@eniac.jgi-psf.org> References: <49E4A39D.2020909@gmail.com> <20090415065037.GB1175@eniac.jgi-psf.org> Message-ID: <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> Joel, They haven't stopped supporting it. IIRC the separate io_lib distribution no longer has the ABI headers, but the io_lib with the full staden package does (a little confusing, yes). I have 1.11.6 and ABI-related tests for bioperl and bioperl-ext don't pass, but compiling with an earlier version does work. It may be as simple as including the header files from an old version, but I haven't tried that. chris On Apr 15, 2009, at 1:50 AM, Joel Martin wrote: > Hello, > Do you know where it says io_lib will stop supporting ABI? We use > the latest ( 1.11.6 ) for this so I know it does read them and I just > checked with one fresh off a sequencer. But I couldn't find an active > forum for staden. > > Thanks, > Joel > > On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: >> For ABI files you'll need an older version of io_lib that supports >> ABI or >> the io_lib that comes with the full staden package. Recent >> versions of >> io_lib don't have ABI support built-in anymore. >> >> chris >> >> On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: >> >>> Hi Mike. >>> >>> I did get that problem solved in the end, thanks to lots of help >>> from >>> Aaron Mackey. 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Michael.Stubbington at hpa.org.uk Wed Apr 15 07:43:39 2009 From: Michael.Stubbington at hpa.org.uk (Michael Stubbington) Date: Wed, 15 Apr 2009 08:43:39 +0100 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: References: <49E4A39D.2020909@gmail.com> Message-ID: <335635A922FA2B43B35B6ADD7929CC590171550C@porhpaexc001.HPA.org.uk> Thanks a lot for your help. I finally solved the problem with a combination of: 1) Checking out the latest bioperl-ext from svn. 2) A fresh install of an earlier version of io_lib (8.12) 3) Changing to "config.h" in os.h Everything seems to be working now. Best wishes, Mike -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: 14 April 2009 16:20 To: Roy Chaudhuri Cc: Michael Stubbington; bioperl-l at bioperl.org Subject: Re: [Bioperl-l] Bio::SeqIO::staden::read make test error For ABI files you'll need an older version of io_lib that supports ABI or the io_lib that comes with the full staden package. Recent versions of io_lib don't have ABI support built-in anymore. chris On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: > Hi Mike. > > I did get that problem solved in the end, thanks to lots of help > from Aaron Mackey. Looking at the bioperl-l archives it seems like > we stopped cc-ing the mailing list at some point. 
The last archived > message in the thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html > ) had the correct solution - the code change was incorporated into > the bioperl-ext CVS, and is in the latest version that you can get > from SVN (see http://www.bioperl.org/wiki/Ext_package). If that > doesn't solve the problem you must be experiencing a different issue. > > You should also bear in mind the message Chris Fields sent to the > list a few days ago, and have a look at using BioLib instead: > >> Just to note, we're not actively supporting much of the bioperl- >> ext code, in favor of the BioLib initiative: >> http://biolib.open-bio.org/wiki/Main_Page >> If you do use bioperl-ext I suggest only using the latest code >> from svn (and that in combination with bioperl-live). > > >> chris > > Hope this helps. > Roy. > > > > Michael Stubbington wrote: >> Dear Dr. Chaudhuri, >> I am currently trying to write a bioperl script that parses .abi >> sequence files. I am having exactly the same problem as you did when >> you posted this enquiry to the bioperl mailing list http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html >> . I was wondering if you ever solved the problem and, if so, can >> you remember >> what you did? I'd be very grateful for any help you can provide. I >> can't find this problem mentioned anywhere else online. >> Thank you for your time. >> Mike > > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------- ************************************************************************** The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). 
It may not be disclosed to any other person without the express authority of the HPA, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses, but please re-sweep any attachments before opening or saving. HTTP://www.HPA.org.uk
**************************************************************************

From cjfields1 at gmail.com Mon Apr 20 16:12:10 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Mon, 20 Apr 2009 11:12:10 -0500
Subject: [Bioperl-l] BioPerl 1.6.1 slate
Message-ID: <58CCB0F1-9BC8-4437-8870-3D6CAA7BB1ED@gmail.com>

All,

Just to note, the bioperl 1.6.1 release will probably be delayed until mid-May (just been too busy to work on it, end-of-semester crunch and all). I'll probably release an alpha prior to that (maybe first week of May) for testing some bug fixes across platforms.

cheers!

chris

From nagel at moldiag.de Tue Apr 21 14:31:29 2009
From: nagel at moldiag.de (Mato Nagel)
Date: Tue, 21 Apr 2009 16:31:29 +0200
Subject: [Bioperl-l] Exact codon numbering
Message-ID: <49EDD8C1.7000101@moldiag.de>

Dear colleagues,

I spent this evening browsing all your information but didn't succeed in finding a module that translates feature data (CDS and mRNA) into codon numbering. I developed a routine that creates the following structure from an NCBI xml-file:

$exonstructure = [
    splice_variant_1->[
        exon_1->{
            mRNA_from ->'1',
            mRNA_to   ->'something',
            cDNA_from ->'something',
            cDNA_to   ->'something',
            CDS_from  ->'something',
            CDS_to    ->'something',
        }
        exon_2->{...}
        ...
    ]
    splice_variant_2 [... ]
]

I wonder if it is worth publishing this routine in BioPerl. Looking forward to receiving an answer.

Sincerely Yours
Mato Nagel

From dan.bolser at gmail.com Wed Apr 22 10:49:42 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 22 Apr 2009 11:49:42 +0100
Subject: [Bioperl-l] Creating a fastq format file?
Message-ID: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Creating a fastq format file from fasta and 'fasta quality file'? Hi, I'm sure this is easy, but I'm still not able to 'think bioperl'... I have a 'fasta quality file' and a fasta file, and I would like to output a fastq file. I followed the discussion on the previous thread here: http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html With the conclusion seeming to be 'just do it'. Could someone point me at a way to do this, or was that suggestion an error? i.e. the poster was working out a way to create a fastq the only way possible... I get the feeling that this should be a one-liner, but perhaps the above thread was demonstrating the code I need to copy. Thanks for any suggestions, Dan. From drummike at gmail.com Wed Apr 22 12:28:08 2009 From: drummike at gmail.com (Mike Williams) Date: Wed, 22 Apr 2009 08:28:08 -0400 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Message-ID: On Wed, Apr 22, 2009 at 6:49 AM, Dan Bolser wrote: > Creating a fastq format file from fasta and 'fasta quality file'? > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? Hi there. You should take a look at the documentation for formatdb, that will get you there. http://www.ncbi.nlm.nih.gov/BLAST/docs/formatdb.html Mike From dan.bolser at gmail.com Wed Apr 22 13:10:14 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 22 Apr 2009 14:10:14 +0100 Subject: [Bioperl-l] Creating a fastq format file? 
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
Message-ID: <2c8757af0904220610m7ef63a63m8590956d32d57d17@mail.gmail.com>

2009/4/22 Mike Williams :
> On Wed, Apr 22, 2009 at 6:49 AM, Dan Bolser wrote:
>
>> Creating a fastq format file from fasta and 'fasta quality file'?
>>
>> I have a 'fasta quality file' and a fasta file, and I would like to
>> output a fastq file. I followed the discussion on the previous thread
>> here:
>>
>> With the conclusion seeming to be 'just do it'. Could someone point me
>> at a way to do this, or was that suggestion an error?
>
>
> Hi there. You should take a look at the documentation for formatdb, that
> will get you there.
>
> http://www.ncbi.nlm.nih.gov/BLAST/docs/formatdb.html

Really? I don't find the word fastq anywhere in that file...

I know the fastq format isn't that complex, but why write my own custom conversion utility if one already exists, right? Bioperl is so good at converting between other formats, I just assumed there should be a couple of lines to get this done.

Cheers,
Dan.

--
Talk live to HOT bioperl developers in your area NOW!!
irc://irc.freenode.net/#bioperl

> Mike

From dan.bolser at gmail.com Wed Apr 22 13:32:15 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 22 Apr 2009 14:32:15 +0100
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
Message-ID: <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>

In the Bio::SeqIO::fastq page:

http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq

I read:

"This object can transform Bio::Seq and Bio::Seq::Quality objects to and from fastq flat file databases."
I'm not sure how to code the link between the fastq IO object and the qual object that I have created using the code from the previous thread... Any suggestions? What am I missing? 2009/4/22 Dan Bolser : > Creating a fastq format file from fasta and 'fasta quality file'? > > > Hi, > > I'm sure this is easy, but I'm still not able to 'think bioperl'... > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html > > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? i.e. the poster > was working out a way to create a fastq the only way possible... > > I get the feeling that this should be a one-liner, but perhaps the > above thread was demonstrating the code I need to copy. > > > Thanks for any suggestions, > > Dan. > From dan.bolser at gmail.com Wed Apr 22 13:36:03 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 22 Apr 2009 14:36:03 +0100 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <892884AD17FA42DA96BA586AEAE2170E@NewLife> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <892884AD17FA42DA96BA586AEAE2170E@NewLife> Message-ID: <2c8757af0904220636q6ad96152p63405e03bbe85e6f@mail.gmail.com> Cheers Mark - I was having difficulty understanding that module... I should read more and post less ;-) I got it figured out now... 
Here is my working code, based on the example kindly posted by Phillip San Miguel:

#!/usr/bin/perl -w

use warnings;
use strict;
use Bio::SeqIO;
use Bio::Seq::Quality;

## With a single argument, assume the qual file is "<seqfile>.qual"
my ($seq_infile, $qual_infile) = (scalar @ARGV == 1)
  ? ($ARGV[0], "$ARGV[0].qual")
  : @ARGV;

## Create input objects for both a seq (fasta) and qual file
my $in_seq_obj = Bio::SeqIO->new(
  -file   => $seq_infile,
  -format => 'fasta',
);

my $in_qual_obj = Bio::SeqIO->new(
  -file   => $qual_infile,
  -format => 'qual',
);

## No -file is given, so the fastq records are written to STDOUT
my $out_fastq_obj = Bio::SeqIO->new(
  -format => 'fastq'
);

while (1){
  ## create objects for both a seq and its associated qual
  my $seq_obj  = $in_seq_obj->next_seq || last;
  my $qual_obj = $in_qual_obj->next_seq;

  ## use seq and qual object methods to feed info for the new BSQ object
  my $bsq_obj = Bio::Seq::Quality->new(
    -id   => $seq_obj->id(),      ## carry the read name over
    -seq  => $seq_obj->seq(),
    -qual => $qual_obj->qual(),
  );

  $out_fastq_obj->write_fastq($bsq_obj);
}

2009/4/22 Mark A. Jensen :
> Dan- There is a fastq module under Bio::SeqIO. Do something like
>
>     use Bio::Seq::Quality;
>     use Bio::SeqIO;
>
>     # from Bio::Seq::Quality synopsis...
>     my $qual  = '0 1 2 3 4 5 6 7 8 9 11 12';
>     my $trace = '0 5 10 15 20 25 30 35 40 45 50 55';
>
>     my $seq = Bio::Seq::Quality->new(
>         -qual             => $qual,
>         -trace_indices    => $trace,
>         -seq              => 'atcgatcgatcg',
>         -id               => 'human_id',
>         -accession_number => 'S000012',
>         -verbose          => -1,   # to silence deprecated methods
>     );
>
>     # typical Bio::SeqIO call
>     my $seqio = Bio::SeqIO->new( -file => ">your_file", -format => 'fastq' );
>     $seqio->write_seq($seq);
>
> Mark
> ----- Original Message ----- From: "Dan Bolser"
> To:
> Sent: Wednesday, April 22, 2009 6:49 AM
> Subject: [Bioperl-l] Creating a fastq format file?
>
>
>> Creating a fastq format file from fasta and 'fasta quality file'?
>>
>>
>> Hi,
>>
>> I'm sure this is easy, but I'm still not able to 'think bioperl'...
>>
>> I have a 'fasta quality file' and a fasta file, and I would like to
>> output a fastq file. I followed the discussion on the previous thread
>> here:
>>
>> http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html
>>
>>
>> With the conclusion seeming to be 'just do it'. Could someone point me
>> at a way to do this, or was that suggestion an error? i.e. the poster
>> was working out a way to create a fastq the only way possible...
>>
>> I get the feeling that this should be a one-liner, but perhaps the
>> above thread was demonstrating the code I need to copy.
>>
>>
>> Thanks for any suggestions,
>>
>> Dan.
>>
>

From maj at fortinbras.us Wed Apr 22 13:33:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 22 Apr 2009 09:33:08 -0400
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
Message-ID: <892884AD17FA42DA96BA586AEAE2170E@NewLife>

Dan- There is a fastq module under Bio::SeqIO. Do something like

    use Bio::Seq::Quality;
    use Bio::SeqIO;

    # from Bio::Seq::Quality synopsis...
    my $qual  = '0 1 2 3 4 5 6 7 8 9 11 12';
    my $trace = '0 5 10 15 20 25 30 35 40 45 50 55';

    my $seq = Bio::Seq::Quality->new(
        -qual             => $qual,
        -trace_indices    => $trace,
        -seq              => 'atcgatcgatcg',
        -id               => 'human_id',
        -accession_number => 'S000012',
        -verbose          => -1,   # to silence deprecated methods
    );

    # typical Bio::SeqIO call
    my $seqio = Bio::SeqIO->new( -file => ">your_file", -format => 'fastq' );
    $seqio->write_seq($seq);

Mark

----- Original Message ----- From: "Dan Bolser"
To:
Sent: Wednesday, April 22, 2009 6:49 AM
Subject: [Bioperl-l] Creating a fastq format file?

> Creating a fastq format file from fasta and 'fasta quality file'?
> > > Hi, > > I'm sure this is easy, but I'm still not able to 'think bioperl'... > > I have a 'fasta quality file' and a fasta file, and I would like to > output a fastq file. I followed the discussion on the previous thread > here: > > http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html > > > With the conclusion seeming to be 'just do it'. Could someone point me > at a way to do this, or was that suggestion an error? i.e. the poster > was working out a way to create a fastq the only way possible... > > I get the feeling that this should be a one-liner, but perhaps the > above thread was demonstrating the code I need to copy. > > > Thanks for any suggestions, > > Dan. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From mmuratet at hudsonalpha.org Wed Apr 22 14:03:57 2009 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Wed, 22 Apr 2009 09:03:57 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> Message-ID: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > In the Bio::SeqIO::fastq page: > > http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq > > > I read: > > "This object can transform Bio::Seq and Bio::Seq::Quality objects to > and from fastq flat file databases." > > I'm not sure how to code the link between the fastq IO object and the > qual object that I have created using the code from the previous > thread... > > Any suggestions? What am I missing? 
Howdy This might be a good place to ask the question: having looked at the fastq.pm page, is the fastq format defined (only) by a "@'" followed by a sequence line and a "+" header followed by a quality line and the two headers have to agree? Now that Illumina is using phred scaling, are 'Sanger' and 'Illumina' versions the same? Thanks Mike > > > > 2009/4/22 Dan Bolser : >> Creating a fastq format file from fasta and 'fasta quality file'? >> >> >> Hi, >> >> I'm sure this is easy, but I'm still not able to 'think bioperl'... >> >> I have a 'fasta quality file' and a fasta file, and I would like to >> output a fastq file. I followed the discussion on the previous thread >> here: >> >> http://bioperl.org/pipermail/bioperl-l/2008-July/028013.html >> >> >> With the conclusion seeming to be 'just do it'. Could someone point >> me >> at a way to do this, or was that suggestion an error? i.e. the poster >> was working out a way to create a fastq the only way possible... >> >> I get the feeling that this should be a one-liner, but perhaps the >> above thread was demonstrating the code I need to copy. >> >> >> Thanks for any suggestions, >> >> Dan. >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Apr 22 13:38:53 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 22 Apr 2009 09:38:53 -0400 Subject: [Bioperl-l] Can I load ontologies into BioSQL? In-Reply-To: References: Message-ID: Hi Carlos, I am moving your inquiry to the BioPerl list, as the tool is a part of Bioperl-db and uses BioPerl for parsing the ontologies. In your case, the goflat parser in BioPerl seems to balk at the second one of the input files. It may be that the input file is (was?) corrupted, that does happen every once in a while. More likely though is that the goflat parser hasn't kept up with some format changes. 
Have you tried using the obo format version instead?

-hilmar

On Apr 20, 2009, at 11:44 AM, Carlos A. Canchaya wrote:

> Hi guys
>
> I'm working with biosql and I am trying to figure out how to load
> ontologies into biosql.
>
> I've tried
>
> load_ontology.pl --driver mysql --dbuser carlos --dbpass xxx \
>   --host localhost --dbname biosql --namespace "Gene Ontology" \
>   --format goflat --fmtargs "-defs_file,GO.defs" \
>   function.ontology process.ontology component.ontology
>
> as in the script info but I have an error,
>
>
> ------------------- WARNING ---------------------
> MSG: DBLink exists in the dblink of _default
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: format error (file process.ontology) offending line:
> -negative regulation of angiogenesis ; GO:0016525 ; synonym:down regulation of angiogenesis ; synonym:down\-regulation of angiogenesis ; synonym:downregulation of angiogenesis ; synonym:inhibition of angiogenesis % negative regulation of developmental process ; GO:0051093 % regulation of angiogenesis ; GO:0045765
>
> STACK Bio::OntologyIO::dagflat::_parse_flat_file /usr/local/share/perl/5.10.0/Bio/OntologyIO/dagflat.pm:627
> STACK Bio::OntologyIO::dagflat::parse /usr/local/share/perl/5.10.0/Bio/OntologyIO/dagflat.pm:284
> STACK Bio::OntologyIO::dagflat::next_ontology /usr/local/share/perl/5.10.0/Bio/OntologyIO/dagflat.pm:317
> STACK toplevel /usr/local/share/biosql/bioperl-db/scripts/biosql/load_ontology.pl:604
> -------------------------------------
>
> Any suggestion?
> > Cheers, > > Carlos > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Wed Apr 22 14:50:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 22 Apr 2009 09:50:47 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: On Apr 22, 2009, at 9:03 AM, Michael Muratet wrote: > > On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > >> In the Bio::SeqIO::fastq page: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq >> >> >> I read: >> >> "This object can transform Bio::Seq and Bio::Seq::Quality objects to >> and from fastq flat file databases." >> >> I'm not sure how to code the link between the fastq IO object and the >> qual object that I have created using the code from the previous >> thread... >> >> Any suggestions? What am I missing? > > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed > by a sequence line and a "+" header followed by a quality line and > the two headers have to agree? Now that Illumina is using phred > scaling, are 'Sanger' and 'Illumina' versions the same? > > Thanks > > Mike I think that's how it is defined, but I remember a while ago finding a formal definition of the format was a bit difficult. 
Looks like that has been rectified: http://maq.sourceforge.net/fastq.shtml If the parser doesn't read Illumina FASTQ format feel free to post a bug report with some example data. I'm sure this will be needed functionality in the future (and it shouldn't be too hard to add in). chris From hans-rudolf.hotz at fmi.ch Wed Apr 22 14:58:21 2009 From: hans-rudolf.hotz at fmi.ch (Hotz, Hans-Rudolf) Date: Wed, 22 Apr 2009 16:58:21 +0200 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed > by a sequence line and a "+" header followed by a quality line and the > two headers have to agree? Now that Illumina is using phred scaling, > are 'Sanger' and 'Illumina' versions the same? No, see: http://maq.sourceforge.net/fastq.shtml Regards, Hans > > Thanks > > Mike From j_martin at lbl.gov Wed Apr 22 15:58:15 2009 From: j_martin at lbl.gov (Joel Martin) Date: Wed, 22 Apr 2009 08:58:15 -0700 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> Message-ID: <20090422155815.GA14402@eniac.jgi-psf.org> On Wed, Apr 22, 2009 at 09:03:57AM -0500, Michael Muratet wrote: > > On Apr 22, 2009, at 8:32 AM, Dan Bolser wrote: > >> In the Bio::SeqIO::fastq page: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/SeqIO/fastq.pm#write_seq >> >> >> I read: >> >> "This object can transform Bio::Seq and Bio::Seq::Quality objects to >> and from fastq flat file databases." 
>> >> I'm not sure how to code the link between the fastq IO object and the >> qual object that I have created using the code from the previous >> thread... >> >> Any suggestions? What am I missing? > > Howdy > > This might be a good place to ask the question: having looked at the > fastq.pm page, is the fastq format defined (only) by a "@'" followed by a > sequence line and a "+" header followed by a quality line and the two > headers have to agree? Now that Illumina is using phred scaling, are > 'Sanger' and 'Illumina' versions the same? > > Thanks > > Mike No they aren't the same, Illumina still encodes the ascii as value + 64 and Sanger as value + 33. Joel From j_martin at lbl.gov Thu Apr 23 09:32:08 2009 From: j_martin at lbl.gov (Joel Martin) Date: Thu, 23 Apr 2009 02:32:08 -0700 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> References: <49E4A39D.2020909@gmail.com> <20090415065037.GB1175@eniac.jgi-psf.org> <67822033-2EA7-4C79-B5E3-BC4C7AA76FBA@illinois.edu> Message-ID: <20090423093208.GB22615@eniac.jgi-psf.org> Hello, Maybe they put the headers back in the separate distribution, they seem to be there now. ls -l io_lib-1.11.6/io_lib/abi.h 4 -rw-r--r-- 1 me mypeeps 793 Dec 10 06:54 io_lib-1.11.6/io_lib/abi.h And I can get the ABI-tests to pass with the bioperl-ext on linux, though it takes some odd contortions of the Makefile to get it to compile here. [snip] # Expected: (Can't write valid ctf files until we have a trace object) t/staden_read....ok All tests successful. Files=1, Tests=94, 1 wallclock secs ( 0.95 cusr + 0.06 csys = 1.01 CPU) I might find time to take a shot at getting it to compile cleanerly for linux and solaris, unless you think that's pointless as the BioLib conversion might happen before summer? Joel On Wed, Apr 15, 2009 at 07:26:15AM -0500, Chris Fields wrote: > Joel, > > They haven't stopped supporting it. 
IIRC the separate io_lib distribution > no longer has the ABI headers, but the io_lib with the full staden package > does (a little confusing, yes). I have 1.11.6 and ABI-related tests for > bioperl and bioperl-ext don't pass, but compiling with an earlier version > does work. It may be as simple as including the header files from an old > version, but I haven't tried that. > > chris > > On Apr 15, 2009, at 1:50 AM, Joel Martin wrote: > >> Hello, >> Do you know where it says io_lib will stop supporting ABI? We use >> the latest ( 1.11.6 ) for this so I know it does read them and I just >> checked with one fresh off a sequencer. But I couldn't find an active >> forum for staden. >> >> Thanks, >> Joel >> >> On Tue, Apr 14, 2009 at 10:20:00AM -0500, Chris Fields wrote: >>> For ABI files you'll need an older version of io_lib that supports ABI or >>> the io_lib that comes with the full staden package. Recent versions of >>> io_lib don't have ABI support built-in anymore. >>> >>> chris >>> >>> On Apr 14, 2009, at 9:54 AM, Roy Chaudhuri wrote: >>> >>>> Hi Mike. >>>> >>>> I did get that problem solved in the end, thanks to lots of help from >>>> Aaron Mackey. Looking at the bioperl-l archives it seems like we stopped >>>> cc-ing the mailing list at some point. The last archived message in the >>>> thread (http://bioperl.org/pipermail/bioperl-l/2005-May/018925.html) had >>>> the correct solution - the code change was incorporated into the >>>> bioperl-ext CVS, and is in the latest version that you can get from SVN >>>> (see http://www.bioperl.org/wiki/Ext_package). If that doesn't solve the >>>> problem you must be experiencing a different issue. 
>>>>
>>>> You should also bear in mind the message Chris Fields sent to the list a
>>>> few days ago, and have a look at using BioLib instead:
>>>>
>>>>> Just to note, we're not actively supporting much of the bioperl-ext
>>>>> code, in favor of the BioLib initiative:
>>>>> http://biolib.open-bio.org/wiki/Main_Page
>>>>> If you do use bioperl-ext I suggest only using the latest code from
>>>>> svn
>>>>> (and that in combination with bioperl-live).
>>>>>
>>>>> chris
>>>>
>>>> Hope this helps.
>>>> Roy.
>>>>
>>>>
>>>>
>>>> Michael Stubbington wrote:
>>>>> Dear Dr. Chaudhuri,
>>>>> I am currently trying to write a bioperl script that parses .abi
>>>>> sequence
>>>>> files. I am having exactly the same problem as you did when
>>>>> you posted this enquiry to the bioperl mailing list
>>>>> http://bioperl.org/pipermail/bioperl-l/2005-May/018898.html. I was
>>>>> wondering if you ever solved the problem and, if so, can you remember
>>>>> what you did? I'd be very grateful for any help you can provide. I
>>>>> can't find this problem mentioned anywhere else online.
>>>>> Thank you for your time.
>>>>> Mike
>>>>
>>>> --
>>>> Dr. Roy Chaudhuri
>>>> Department of Veterinary Medicine
>>>> University of Cambridge, U.K.
>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Thu Apr 23 15:45:34 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 23 Apr 2009 08:45:34 -0700 Subject: [Bioperl-l] Request concerning BioPerl In-Reply-To: <49F0300C.2060700@moldiag.de> References: <49F0300C.2060700@moldiag.de> Message-ID: Mato- Please ask on the mailing list - there is documention in the perldoc for starters and the rest depends on how you are querying for accessions or using Entrez queries. -jason On Apr 23, 2009, at 2:08 AM, Mato Nagel wrote: > Dear colleagues, > where are the options documented? > > $gb = Bio::DB::GenBank->new(@options) > > Sincerely Yours > Mato Nagel Jason Stajich jason at bioperl.org From dan.bolser at gmail.com Fri Apr 24 15:24:17 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 24 Apr 2009 16:24:17 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? Message-ID: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> Hi all, I couldn't find out how to get the 'clear range' from a Bio::Seq::Quality object... Am I looking in the wrong place, or should this method be a part of the Bio::Seq::Quality class? In the latter case I'm on my way to an implementation, but I am not good at navigating the bioperl docs, so I thought I should ask before I take the time to finish that off. Cheers, Dan. 
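
For readers following the thread: one common reading of 'clear range' is the longest run of consecutive bases whose quality score is at or above some cutoff. The sketch below illustrates that idea in plain Perl on an array of scores. It is not a BioPerl API; the sub name `longest_clear_range`, the 0-based return convention, and the default cutoff of 13 are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the 0-based, inclusive
# (start, end) of the longest run of quality values >= $min_qual, or an
# empty list if no base passes. Ties go to the earliest run.
sub longest_clear_range {
    my ($qual, $min_qual) = @_;
    $min_qual = 13 unless defined $min_qual;    # assumed default cutoff

    my ($best_start, $best_len) = (undef, 0);
    my $run_start;                              # undef while outside a run

    for my $i (0 .. $#{$qual}) {
        if ($qual->[$i] >= $min_qual) {
            # Entering or extending a run of passing bases
            $run_start = $i unless defined $run_start;
            my $len = $i - $run_start + 1;
            ($best_start, $best_len) = ($run_start, $len)
                if $len > $best_len;
        }
        else {
            # A failing base ends the current run
            undef $run_start;
        }
    }

    return defined $best_start
        ? ($best_start, $best_start + $best_len - 1)
        : ();
}

my @qual = (5, 20, 30, 8, 25, 25, 40, 2);
my ($start, $end) = longest_clear_range(\@qual, 13);
print "clear range: $start..$end\n";
```

With a Bio::Seq::Quality object, the array reference returned by its `qual` method could be passed in directly; note that `subseq` and `subqual` take 1-based coordinates, so the indices would need to be shifted by one.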
From dan.bolser at gmail.com Fri Apr 24 16:20:23 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Fri, 24 Apr 2009 17:20:23 +0100
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
Message-ID: <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>

It's a bit rough and ready, but it does what I need...


=head2 get_clear_range

 Title    : get_clear_range
 Usage    : $subobj = $obj->get_clear_range();
            $subobj = $obj->get_clear_range(20);
 Function : Get the clear range using the given quality score as a
            cutoff or a default value of 13.

 Returns  : a new Bio::Seq::Quality object
 Args     : a minimum quality value, optional, default = 13

=cut

sub get_clear_range
{
  my $self = shift;
  my $qual = $self->qual;
  my $minQual = shift || 13;

  ## Each range is [start index, end index, length], 0-based
  my (@ranges, $rangeFlag);

  for(my $i=0; $i<@$qual; $i++){
    ## Are we currently within a clear range or not?
    if(defined($rangeFlag)){
      ## Did we just leave the clear range?
      if($qual->[$i]<$minQual){
        ## Log the range
        push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
        ## and reset the range flag.
        $rangeFlag = undef;
      }
      ## else nothing changes
    }
    else{
      ## Did we just enter a clear range?
      if($qual->[$i]>=$minQual){
        ## Better set the range flag!
        $rangeFlag = $i;
      }
      ## else nothing changes
    }
  }
  ## Are we still inside a clear range at the end of the read?
  if(defined($rangeFlag)){
    my $i = scalar(@$qual);
    ## Log the range
    push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
  }

  unless(@ranges){
    die "There is no clear range... I don't know what to do here!\n";
  }

  print "there are ", scalar(@ranges), " clear ranges\n";

  my $sum = 0;
  $sum += $_->[2] for @ranges;

  print "of ", scalar(@$qual), " bases, there are $sum with ".
    "quality scores at or above the given threshold\n";

  ## Return the longest clear range
  my ($best) = sort {$b->[2] <=> $a->[2]} @ranges;

  if($best->[2]/$sum < 0.5){
    warn "not so much a clear range as a clear chunk...\n";
  }
  print $best->[2], "\t", $best->[2]/$sum, "\n";

  ## subseq and subqual take 1-based coordinates
  return Bio::Seq::QualityDB->new(
    -seq  => $self->subseq( $best->[0]+1, $best->[1]+1),
    -qual => $self->subqual($best->[0]+1, $best->[1]+1)
  );
}


Note, for testing I made a package called Bio/Seq/QualityDB.pm (which is a copy of Bio/Seq/Quality.pm that just has the above method added). That is why the 'new Bio::Seq::Quality object' is actually a Bio::Seq::QualityDB object, but other than that it should slot right in (apart from all the debugging output that I spit out).

Cheers,
Dan.

2009/4/24 Dan Bolser :
> Hi all,
>
> I couldn't find out how to get the 'clear range' from a
> Bio::Seq::Quality object... Am I looking in the wrong place, or should
> this method be a part of the Bio::Seq::Quality class?
>
> In the latter case I'm on my way to an implementation, but I am not
> good at navigating the bioperl docs, so I thought I should ask before
> I take the time to finish that off.
>
>
> Cheers,
> Dan.
>

From cjfields at illinois.edu Fri Apr 24 18:56:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 24 Apr 2009 13:56:34 -0500
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
Message-ID: <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>

You could submit this as a diff against Bio::Seq::Quality to bugzilla. If possible, tests don't hurt either!

chris

On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:

> It's a bit rough and ready, but it does what I need...
>
>
> =head2 get_clear_range
>
>  Title    : get_clear_range
>  Usage    : $subobj = $obj->get_clear_range();
>             $subobj = $obj->get_clear_range(20);
>  Function : Get the clear range using the given quality score as a
>             cutoff or a default value of 13.
>
>  Returns  : a new Bio::Seq::Quality object
>  Args     : a minimum quality value, optional, default = 13
>
> =cut
>
> sub get_clear_range
> {
>   my $self = shift;
>   my $qual = $self->qual;
>   my $minQual = shift || 13;
>
>   my (@ranges, $rangeFlag);
>
>   for(my $i=0; $i<@$qual; $i++){
>     ## Are we currently within a clear range or not?
>     if(defined($rangeFlag)){
>       ## Did we just leave the clear range?
>       if($qual->[$i]<$minQual){
>         ## Log the range
>         push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>         ## and reset the range flag.
>         $rangeFlag = undef;
>       }
>       ## else nothing changes
>     }
>     else{
>       ## Did we just enter a clear range?
>       if($qual->[$i]>=$minQual){
>         ## Better set the range flag!
>         $rangeFlag = $i;
>       }
>       ## else nothing changes
>     }
>   }
>   ## Did we exit the last clear range?
>   if(defined($rangeFlag)){
>     my $i = scalar(@$qual);
>     ## Log the range
>     push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>   }
>
>   unless(@ranges){
>     die "There is no clear range... I don't know what to do here!\n";
>   }
>
>   print "there are ", scalar(@ranges), " clear ranges\n";
>
>   my $sum; map {$sum += $_->[2]} @ranges;
>
>   print "of ", scalar(@$qual), " bases, there are $sum with ".
>     "quality scores above the given threshold\n";
>
>   for (sort {$b->[2] <=> $a->[2]} @ranges){
>     if($_->[2]/$sum < 0.5){
>       warn "not so much a clear range as a clear chunk...\n";
>     }
>     print $_->[2], "\t", $_->[2]/$sum, "\n";
>
>     return Bio::Seq::QualityDB->new(
>       -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
>       -qual => $self->subqual($_->[0]+1, $_->[1]+1)
>     );
>   }
> }
>
>
>
>
> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
> is a copy of Bio/Seq/Quality.pm that just has the above method added).
> That is why the 'new Bio::Seq::Quality object' is actually a > Bio::Seq::QualityDB object, but other than that it should slot right > in (apart from all the debugging output that I spit out). > > > Cheers, > Dan. > > > 2009/4/24 Dan Bolser : >> Hi all, >> >> I couldn't find out how to get the 'clear range' from a >> Bio::Seq::Quality object... Am I looking in the wrong place, or >> should >> this method be a part of the Bio::Seq::Quality class? >> >> In the latter case I'm on my way to an implementation, but I am not >> good at navigating the bioperl docs, so I thought I should ask before >> I take the time to finish that off. >> >> >> Cheers, >> Dan. >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Fri Apr 24 19:39:53 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 24 Apr 2009 12:39:53 -0700 Subject: [Bioperl-l] cvs server still up? Message-ID: <49F21589.6060707@cornell.edu> The old bioperl CVS repository is still up: cvs -d :pserver:cvs:cvs\@cvs.bioperl.org:/home/repository/bioperl export -rHEAD bioperl-live I had an old script that was cvs exporting a copy of bioperl, and it has been fetching really old copies for a while now. Maybe somebody might want to deactivate that? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Fri Apr 24 20:29:22 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 24 Apr 2009 15:29:22 -0500 Subject: [Bioperl-l] cvs server still up? In-Reply-To: <49F21589.6060707@cornell.edu> References: <49F21589.6060707@cornell.edu> Message-ID: <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu> Not sure what the plans were for the CVS server beyond having it available for all older bioperl releases (pre-1.6). 
Everything has been moved into the svn server though, so really the cvs
server is redundant. Shutting it down might serve the purpose of
alerting users to the fact that we no longer use it!

Thinking some more about it: it might be present simply b/c other
open-bio projects are still using cvs. I can't recall if biopython
switched over or not...

chris

From jay at jays.net Fri Apr 24 21:03:27 2009
From: jay at jays.net (Jay Hannah)
Date: Fri, 24 Apr 2009 16:03:27 -0500
Subject: [Bioperl-l] cvs server still up?
In-Reply-To: <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu>
References: <49F21589.6060707@cornell.edu>
 <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu>
Message-ID: <49F2291F.7020704@jays.net>

Chris Fields wrote:
> I can't recall if biopython switched over or not...

http://github.com/biopython
"Official git mirror of the Biopython CVS repository"

Ponder,

j

From cjfields at illinois.edu Fri Apr 24 22:50:12 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 24 Apr 2009 17:50:12 -0500
Subject: [Bioperl-l] cvs server still up?
In-Reply-To: <49F2291F.7020704@jays.net>
References: <49F21589.6060707@cornell.edu>
 <2A54079B-FAE1-4D1B-BCDA-A5E570749B25@illinois.edu>
 <49F2291F.7020704@jays.net>
Message-ID: <9AC3AF4D-E9FF-4593-A53A-B59438EC2BA4@illinois.edu>

Which makes me wonder, is the CVS version actually updated with git
commits (and vice versa) or is git the only thing being used? It is
listed as a 'mirror', so I'm assuming they somehow sync to/from CVS
(ugh).

chris

From torsten.seemann at infotech.monash.edu.au Sun Apr 26 05:50:14 2009
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 26 Apr 2009 15:50:14 +1000
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <20090422155815.GA14402@eniac.jgi-psf.org>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
 <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
 <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
 <20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

> > This might be a good place to ask the question: having looked at the
> > fastq.pm page, is the fastq format defined (only) by a "@'" followed by a
> > sequence line and a "+" header followed by a quality line and the two
> > headers have to agree? Now that Illumina is using phred scaling, are
> > 'Sanger' and 'Illumina' versions the same?
>
> No they aren't the same, Illumina still encodes the ascii as value + 64
> and Sanger as value + 33.
>

Illumina have now CHANGED how they calculate the quality value however in
the last month or so...
Their Q range used to be -5..40 mapped to ASCII 64+,
but now they produce Q >= 0 and it is unclear if they start at 69 or 64
now...

I have tried to summarise this in a central place:

http://en.wikipedia.org/wiki/FASTQ_format

Corrections welcome!


--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
University, AUSTRALIA

From heikki.lehvaslaiho at gmail.com Mon Apr 27 05:42:03 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 07:42:03 +0200
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
 <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
 <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
 <20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

> I have tried to summarise this in a central place:
> http://en.wikipedia.org/wiki/FASTQ_format

Torsten,

Thanks for putting this together. Very helpful.

Do you have a plan of action? Let me propose one for BioPerl. It is
based on the following assumptions:

1. There is a multitude of different ways of coding quality values out there.
2. Bio::Seq::Quality is agnostic of any quality value range rules
3. The emerging open standard is the Sanger fastq specification
4. Open source programs use the Sanger fastq specs


From these it follows that:


1. BioPerl should support Sanger fastq standard

1.1. It already does, and there are other SeqIO modules for dealing
with other non-fastq formats.

2. BioPerl should offer simple ways of converting between quality range rules

2.1. Have a generic method accessible from Bio::Seq::Quality with
preset versions of the method for converting between known variants
(Sanger fastq and the two Illumina versions)

For example:

range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value)
  throw if $value < $from_lower or $value > $from_upper
  return $newvalue

range_convert_illumina2fastq(), range_convert_fastq2illumina(),
range_convert_fastq2phred(), range_convert_phred2fastq()....

(assuming that illumina 1.3 eq phred)

2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina
qualities into Sanger fastq on the fly

2.2.1. Bio::SeqIO::Fastq::next_seq should detect the incoming stream of
quality value range either automatically or be given a keyword
parameter indicating the range.

2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it detects
a quality value out of range.

2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it
detects a quality value out of range.

2.2.4. It would be useful but not absolutely necessary for
Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina
ranges


What do you think?

    -Heikki

--
    -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From heikki.lehvaslaiho at gmail.com Mon Apr 27 06:42:08 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 08:42:08 +0200
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
 <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
 <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
Message-ID:

Dan,

It looks like your method does two different things:

1. Returns the longest subsequence above the threshold
2. Analyses the sequence for the number of ranges the current
threshold creates.

Why not separate these functions?

Let's add a method that sets the threshold and stores it internally as
$self->_threshold. Setting it to a new value should trigger emptying
all the caches (see below).

Let's have two more public methods:

1. get_clean_range() - optional argument 'threshold'

It returns the longest clean subseq.

2. count_clean_ranges() - again optional argument 'threshold'

This returns the number of ranges detected.

Both methods first call the public method threshold if the argument
has been given and then an internal method _find_clean_ranges(). That
method calculates all the ranges and stores them internally (as
$self->_clean_ranges->[...]). The number of ranges is also stored
(e.g. $self->_number_of_ranges). These internal values form the cache
that needs to be emptied whenever any of the critical values of the
object changes: threshold, quality or seq.
Create an internal method
$self->_clear_cache that does that.

Now the new quality object does not get created until you call
get_clean_range(), which accesses the cached values (or creates them if
they are not there).

This design allows you to have no extra penalty for adding more
methods that act on cached values. For example, it might be a sensible
thing to do at some point to look at all the ranges that are longer
than some length. Then you could write in your program:


$qual->threshold(10);
if ($qual->count_clean_ranges == 1) {
  my $newqual = $qual->get_clean_range();
  # do your analysis
} elsif ($qual->count_clean_ranges == 0) {
  # do some reporting and logging
} else {  # more than one range
  my @quals = $qual->get_all_clean_ranges($min_length);
  # do some more work and possibly select the best one(s)
}



Yours,

    -Heikki

--
    -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From dan.bolser at gmail.com Mon Apr 27 08:31:39 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 27 Apr 2009 09:31:39 +0100
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To:
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
 <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
 <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
Message-ID: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>

Hi Heikki,

Thanks very much for the advice on how to better implement the clear
range method within the Bio::Seq::Quality object. I can understand the
logic of what you have written, and it all sounds reasonable. The only
problem is that I am very inexperienced with working on object-oriented
Perl (my 'one man' projects to date have never really
required me to think beyond scripts, and it's been years since I
actually tried to code objects in Perl).

To be specific, when you say, "Let's add a method that sets the
threshold and stores it internally as $self->_threshold", ignoring any
other functionality, what would that method look like? In particular,
how would $self->_threshold be implemented?
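
For readers following along, here is a minimal sketch of what such an accessor might look like. It assumes the object is a blessed hash reference and uses a hypothetical stand-in class, My::QualitySketch, rather than the real Bio::Seq::Quality. The method names threshold, _clear_cache, _find_clear_ranges and count_clear_ranges are illustrative only and are not taken from BioPerl; the default of 13 is the one used in the get_clear_range code earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in class for illustration; NOT the real Bio::Seq::Quality.
package My::QualitySketch;

sub new {
    my ($class, %args) = @_;
    # Assumption: quality values are kept as an array ref in the object hash.
    return bless { _qual => $args{-qual} || [] }, $class;
}

# Combined getter/setter. The "internal" value is just a hash entry.
sub threshold {
    my ($self, $value) = @_;
    if (defined $value) {
        $self->{_threshold} = $value;
        $self->_clear_cache;    # cached ranges no longer match the threshold
    }
    return defined $self->{_threshold} ? $self->{_threshold} : 13;
}

# Throw away everything derived from threshold or quality values.
sub _clear_cache {
    my $self = shift;
    delete $self->{_ranges};
    return;
}

# Find runs of values >= threshold as [start, end, length], 0-based, inclusive.
sub _find_clear_ranges {
    my $self = shift;
    return $self->{_ranges} if $self->{_ranges};    # use the cache if present
    my $min  = $self->threshold;
    my $qual = $self->{_qual};
    my (@ranges, $start);
    for my $i (0 .. $#$qual) {
        if ($qual->[$i] >= $min) {
            $start = $i unless defined $start;      # entering a run
        }
        elsif (defined $start) {
            push @ranges, [$start, $i - 1, $i - $start];    # leaving a run
            undef $start;
        }
    }
    # A run may extend to the last base.
    push @ranges, [$start, $#$qual, @$qual - $start] if defined $start;
    return $self->{_ranges} = \@ranges;
}

sub count_clear_ranges {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    return scalar @{ $self->_find_clear_ranges };
}

package main;

my $q = My::QualitySketch->new(-qual => [5, 20, 25, 3, 30, 30, 30, 2]);
print $q->count_clear_ranges, "\n";        # default threshold of 13
print $q->count_clear_ranges(26), "\n";    # new threshold, cache is rebuilt
```

Routing every invalidation through one _clear_cache call means that any other setter which changes the quality values could reset the cache the same way.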
I think once I see that detail, I can go ahead and try to code what
you suggested.


Similarly (Chris), where would I put the tests / how would they be implemented?


Thanks again for the feedback.

All the best,
Dan.

From heikki.lehvaslaiho at gmail.com Mon Apr 27 09:38:40 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 11:38:40 +0200
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
 <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
 <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
 <20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

I convinced at least myself to the degree that I wrote the
range_convert() method - with plenty of tests.

I mention this now so that no-one else needs to start thinking through
all the edge values. :)

I'll contribute it to the code base once there is a consensus on the
best way forward.

   -Heikki

--
    -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From heikki.lehvaslaiho at gmail.com Mon Apr 27 09:41:52 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 11:41:52 +0200
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com>
 <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com>
 <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu>
 <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
Message-ID:

Dan,

I'll take your code and put it into bioperl-live, rewritten the way I
suggested, and add a few tests.

That should get you started,

   -Heikki

--
    -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa

From cjfields at illinois.edu Mon Apr 27 13:10:04 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Apr 2009 08:10:04 -0500
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
 <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com>
 <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org>
 <20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

This is going within Bio::Seq::Quality, correct?
Does Bio::Seq::Quality have a method that indicates what format the quality scores are actually in (sanger/illumina/illumina1.3/phred/foo)? The reason I worry about this is quality scores appear inseparable from their quality format (ranges vary in length, for instance). For instance, if I picked a Bio::Seq::Quality out of the blue, could I tell which quality format it originated from w/o guessing, and similarly could I accurately convert it to another qual format? To me it seems we need something in Bio::Seq::Quality akin to the alphabet() method used for sequence data. chris On Apr 27, 2009, at 4:38 AM, Heikki Lehvaslaiho wrote: > I convinced at least myself to the degree that I wrote the > range_convert() method - with plenty of tests. I mention this now so > that no-one else need to start thinking through all the edge values. > :) > > I'll contribute it to the code base once there is a consensus of best > way forward. > > -Heikki > > 2009/4/27 Heikki Lehvaslaiho : >>> I have tried to summarise this in a central place: >>> http://en.wikipedia.org/wiki/FASTQ_format >> >> Torsten, >> >> Thanks for putting this together. Very helpful. >> >> Do you have a plan of action? Let me propose one for BioPerl. It >> based on following assumptions: >> >> 1. There is multitude of different ways of coding quality values >> out there. >> 2. Bio::Seq::Quality is agnostic of any quality value range rules >> 3. The emerging open standard is the Sanger fastq specification >> 4. Open source programs use the Sanger fastq specs >> >> >> From these it follows that: >> >> >> 1. BioPerl should support Sanger fastq standard >> >> 1.1. it already does and there are other SeqIO modules for dealing >> with other non-fastq formats. >> >> 2. BioPerl should offer simple ways of converting between quality >> range rules >> >> 2.1. 
Have a generic method accessible from Bio::Seq::Quality with >> preset versions of the method for converting between known variants >> (Sanger fastq and the two Illumina versions) >> >> For example: >> >> range_convert ($from_lower, $from_upper, $to_lower, $to_upper, >> $value) >> throw if $value < $from_lower or $value > $from_upper >> return $newvalue >> >> range_convert_illumina2fastq(), range_convert_fastq2illumina(), >> range_convert_fastq2phred(), range_convert_phred2fastq().... >> >> (assuming that illumina 1.3 eq phred) >> >> 2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina >> qualities into Sanger fastq on the fly >> >> 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the incoming stream >> of >> quality value range either automatically or be given a keyword >> parameter indicating the range. >> >> 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it >> detects >> a quality value out of range. >> >> 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it >> detects a quality value out of range. >> >> 2.2.4. It would be useful but not absolutely necessary for >> Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina >> ranges >> >> >> What do you think? >> >> -Heikki >> >> 2009/4/26 Torsten Seemann : >>>>> This might be a good place to ask the question: having looked at >>>>> the >>>>> fastq.pm page, is the fastq format defined (only) by a "@'" >>>>> followed by >>>> a >>>>> sequence line and a "+" header followed by a quality line and >>>>> the two >>>>> headers have to agree? Now that Illumina is using phred scaling, >>>>> are >>>>> 'Sanger' and 'Illumina' versions the same? >>>> >>>> No they aren't the same, Illumina still encodes the ascii as >>>> value + 64 >>>> and Sanger as value + 33. >>>> >>> >>> Illumina have now CHANGED how they calculate the quality value >>> however in >>> the last month or so... 
>>> Their Q range used to be -5..40 mapped to
>>> ASCII 64+,
>>> but now they produce Q >= 0 and it is unclear if they start at 69
>>> or 64
>>> now...
>>>
>>> I have tried to summarise this in a central place:
>>>
>>> http://en.wikipedia.org/wiki/FASTQ_format
>>>
>>> Corrections welcome!
>>>
>>>
>>> --Torsten Seemann
>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
>>> University, AUSTRALIA
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>>
>> --
>> -Heikki
>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>> cell: +27 (0)714328090
>> Sent from Claremont, WC, South Africa
>>
>
>
>
> --
> -Heikki
> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> cell: +27 (0)714328090
> Sent from Claremont, WC, South Africa
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From markus.liebscher at gmx.de Mon Apr 27 13:51:09 2009
From: markus.liebscher at gmx.de (manni122)
Date: Mon, 27 Apr 2009 06:51:09 -0700 (PDT)
Subject: [Bioperl-l] Remoteblast using Swissprot
Message-ID: <23256705.post@talk.nabble.com>

Hi,

I want to retrieve the sequence identifier from the remoteblast interface
(Bio::Tools::Run::RemoteBlast). With this ID I want to look up annotations
stored in Bio::DB::SwissProt. I am using the example code from the
RemoteBlast documentation. If I use a known sequence as input I get
"Can't call method "next_hsp" on an undefined value". This happens only
with swissprot as the database; the nr database works fine. The accession
code from nr is not accepted by Bio::DB::SwissProt. Is there something
wrong with the database?
Here is the code I am using:

my $v = 1;
my @params = ('-prog' => 'blastp', '-data' => 'nr', '-expect' => '1e-10' ); #swissprot is not working
$Bio::Tools::Run::RemoteBlast::HEADER{'MATRIX_NAME'} = 'BLOSUM62';
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
$v = 1;
my $r = $factory->submit_blast($proteinaa);
print STDERR "Need BLAST Analysis, waiting..." if( $v > 0 );
while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {
            $factory->remove_rid($rid);
            $result = $rc->next_result;
            $hit = $result->next_hit;
            $hsp = $hit->next_hsp;
            $idneu = $hit->accession;
        }
    }
}

--
View this message in context: http://www.nabble.com/Remoteblast-using-Swissprot-tp23256705p23256705.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From heikki.lehvaslaiho at gmail.com Mon Apr 27 15:44:40 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 27 Apr 2009 17:44:40 +0200
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To:
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
Message-ID:

Dan,

Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live.

Test and extend,

-Heikki

2009/4/27 Heikki Lehvaslaiho :
> Dan,
>
> I'll take your code and put it into bioperl-live rewritten the way I
> suggested and add few tests.
>
> That should get you started,
>
>   -Heikki
>
> 2009/4/27 Dan Bolser :
>> Hi Heikki,
>>
>> Thanks very much for the advice on how to better implement the clear
>> range method within the Bio::Seq::Quality object. I can understand the
>> logic of what you have written, and it all sounds reasonable.
The only >> problem is that I am very inexperienced with working on object >> oriented Perl (my 'one man' projects to date have never really >> required me to think beyond scripts, and its been years since I >> actually tried to code objects in Perl). >> >> To be specific, when you say, "Lets add a method that sets the >> threshold and stores it internally as $self->_threshold", ignoring any >> other functionality, what would that method look like? in particular, >> how would $self->_threshold be implemented? >> >> I think once I see that detail, I can go ahead and try to code what >> you suggested. >> >> >> Similarly (Chris), where would I put the tests / how would they be implemented? >> >> >> Thanks again for the feedback. >> >> All the best, >> Dan. >> >> >> >> 2009/4/27 Heikki Lehvaslaiho : >>> Dan, >>> >>> It looks like your method does two different things: >>> >>> 1. Returns the longest subsequence above the threshold >>> 2. Analyses the the sequence for the number of ranges the current >>> threshold creates. >>> >>> Why not separate these functions? >>> >>> Lets add a method that sets the threshold and stores it internally as >>> $self->_threshold. Setting it to a new values should trigger emptying >>> all the caches (see below.) >>> >>> Lets have two more public methods: >>> >>> 1. get_clean_range() - optional argument 'threshold' >>> >>> It returns the longest clean subseq. >>> >>> 2. count_clean_ranges() -again optional argument 'threshold' >>> >>> This returns the number of ranges detected. >>> >>> Both methods call first the public method threshold if the argument >>> has been given and then an internal method ?_find_clean_ranges(). That >>> method calculates all the ranges and stores them internally ?(as >>> $self->_clean_ranges-> [...]). The number of ranges is also stored >>> (e.g. 
$self->_number_of ranges).These internal values form ?the cache >>> that needs to be emptied whenever any of the critical values of the >>> object changes: threshold, quality or seq. Create an internal method >>> $self->_clear_cache, that does that. >>> >>> Now the quality new object does not get created until you call >>> get_clean_range() which accesses the cached values (or creates them if >>> they are not there). >>> >>> This design allows you to have no extra penalty for adding more >>> methods that act on cached values. For example, it might be sensible >>> thing to do ?at some point to look at all the ranges that are longer >>> than some length. Then you could write in your program: >>> >>> >>> $qual->threshold(10); >>> if ($qual->count_clean_ranges = 1) { >>> ?my $newqual = $qual->get_clean_range() >>> ?# do your analysis >>> } elsif ($qual->count_clean_ranges = 0) { >>> ? # do some reporting and logging >>> } else { ?# more than one ranges >>> ? my @quals = $qual->get_all_clean_ranges($min_lenght); >>> ? # do some more work and possibly select the best one(s) >>> } >>> >>> >>> >>> Yours, >>> >>> ? -Heikki >>> >>> 2009/4/24 Chris Fields : >>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. ?If >>>> possible, tests don't hurt either! >>>> >>>> chris >>>> >>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote: >>>> >>>>> Its a bit rough and ready, but it does what I need... >>>>> >>>>> >>>>> >>>>> >>>>> =head2 get_clear_range >>>>> >>>>> Title ? ?: get_clear_range >>>>> >>>>> Title ? ?: subqual >>>>> Usage ? ?: $subobj = $obj->get_clear_range(); >>>>> ? ? ? ? ? $subobj = $obj->get_clear_range(20); >>>>> Function : Get the clear range using the given quality score as a >>>>> ? ? ? ? ? cutoff or a default value of 13. >>>>> >>>>> Returns ?: a new Bio::Seq::Quality object >>>>> Args ? ? : a minimum quality value, optional, devault = 13 >>>>> >>>>> =cut >>>>> >>>>> sub get_clear_range >>>>> { >>>>> ? my $self = shift; >>>>> ? 
my $qual = $self->qual; >>>>> ? my $minQual = shift || 13; >>>>> >>>>> ? my (@ranges, $rangeFlag); >>>>> >>>>> ? for(my $i=0; $i<@$qual; $i++){ >>>>> ? ? ? ?## Are we currently within a clear range or not? >>>>> ? ? ? ?if(defined($rangeFlag)){ >>>>> ? ? ? ? ? ?## Did we just leave the clear range? >>>>> ? ? ? ? ? ?if($qual->[$i]<$minQual){ >>>>> ? ? ? ? ? ? ? ?## Log the range >>>>> ? ? ? ? ? ? ? ?push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>>>> ? ? ? ? ? ? ? ?## and reset the range flag. >>>>> ? ? ? ? ? ? ? ?$rangeFlag = undef; >>>>> ? ? ? ? ? ?} >>>>> ? ? ? ? ? ?## else nothing changes >>>>> ? ? ? ?} >>>>> ? ? ? ?else{ >>>>> ? ? ? ? ? ?## Did we just enter a clear range? >>>>> ? ? ? ? ? ?if($qual->[$i]>=$minQual){ >>>>> ? ? ? ? ? ? ? ?## Better set the range flag! >>>>> ? ? ? ? ? ? ? ?$rangeFlag = $i; >>>>> ? ? ? ? ? ?} >>>>> ? ? ? ? ? ?## else nothing changes >>>>> ? ? ? ?} >>>>> ? } >>>>> ? ## Did we exit the last clear range? >>>>> ? if(defined($rangeFlag)){ >>>>> ? ? ? ?my $i = scalar(@$qual); >>>>> ? ? ? ?## Log the range >>>>> ? ? ? ?push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>>>> ? } >>>>> >>>>> ? unless(@ranges){ >>>>> ? ? ? ?die "There is no clear range... I don't know what to do here!\n"; >>>>> ? } >>>>> >>>>> ? print "there are ", scalar(@ranges), " clear ranges\n"; >>>>> >>>>> ? my $sum; map {$sum += $_->[2]} @ranges; >>>>> >>>>> ? print "of ", scalar(@$qual), " bases, there are $sum with ". >>>>> ? ? ? ?"quality scores above the given threshold\n"; >>>>> >>>>> ? for (sort {$b->[2] <=> $a->[2]} @ranges){ >>>>> ? ? ? ?if($_->[2]/$sum < 0.5){ >>>>> ? ? ? ? ? ?warn "not so much a clear range as a clear chunk...\n"; >>>>> ? ? ? ?} >>>>> ? ? ? ?print $_->[2], "\t", $_->[2]/$sum, "\n"; >>>>> >>>>> ? ? ? ?return Bio::Seq::QualityDB->new( -seq => $self->subseq( ?$_->[0]+1, >>>>> $_->[1]+1), >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -qual => $self->subqual($_->[0]+1, >>>>> $_->[1]+1) >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ); >>>>> ? 
} >>>>> } >>>>> >>>>> >>>>> >>>>> >>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which >>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added). >>>>> That is why the 'new Bio::Seq::Quality object' is actually a >>>>> Bio::Seq::QualityDB object, but other than that it should slot right >>>>> in (apart from all the debugging output that I spit out). >>>>> >>>>> >>>>> Cheers, >>>>> Dan. >>>>> >>>>> >>>>> 2009/4/24 Dan Bolser : >>>>>> >>>>>> Hi all, >>>>>> >>>>>> I couldn't find out how to get the 'clear range' from a >>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should >>>>>> this method be a part of the Bio::Seq::Quality class? >>>>>> >>>>>> In the latter case I'm on my way to an implementation, but I am not >>>>>> good at navigating the bioperl docs, so I thought I should ask before >>>>>> I take the time to finish that off. >>>>>> >>>>>> >>>>>> Cheers, >>>>>> Dan. >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >>> >>> -- >>> ? ?-Heikki >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>> cell: +27 (0)714328090 >>> Sent from Claremont, WC, South Africa >>> >> > > > > -- > ? ?-Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa From heikki.lehvaslaiho at gmail.com Mon Apr 27 15:53:12 2009 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 27 Apr 2009 17:53:12 +0200 Subject: [Bioperl-l] Creating a fastq format file? 
In-Reply-To:
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org>
Message-ID:

2009/4/27 Chris Fields :
> This is going within Bio::Seq::Quality, correct?

Yes.

> Does Bio::Seq::Quality
> have a method that indicates what format the quality scores are actually in
> (sanger/illumina/illumina1.3/phred/foo)? The reason I worry about this is
> quality scores appear inseparable from their quality format (ranges vary in
> length, for instance).

No method.

> For instance, if I picked a Bio::Seq::Quality out of the blue, could I tell
> which quality format it originated from w/o guessing, and similarly could I
> accurately convert it to another qual format? To me it seems we need
> something in Bio::Seq::Quality akin to the alphabet() method used for
> sequence data.

The text formats encode the quality values in different ways, but they
are all stored as integer arrays in the object. Converting between
them is relatively easy.

You are right: quality_format() or even plain format() is needed. The
SeqIO methods creating the objects should be setting it. Warnings for
unset format values should be added to appropriate places.

-Heikki

> chris
>
> On Apr 27, 2009, at 4:38 AM, Heikki Lehvaslaiho wrote:
>
>> I convinced at least myself to the degree that I wrote the
>> range_convert() method - with plenty of tests. I mention this now so
>> that no-one else need to start thinking through all the edge values.
>> :)
>>
>> I'll contribute it to the code base once there is a consensus of best
>> way forward.
>>
>>   -Heikki
>>
>> 2009/4/27 Heikki Lehvaslaiho :
>>>>
>>>> I have tried to summarise this in a central place:
>>>> http://en.wikipedia.org/wiki/FASTQ_format
>>>
>>> Torsten,
>>>
>>> Thanks for putting this together. Very helpful.
>>>
>>> Do you have a plan of action? Let me propose one for BioPerl.
It >>> based on following assumptions: >>> >>> 1. There is multitude of different ways of coding quality values out >>> there. >>> 2. Bio::Seq::Quality is agnostic of any quality value range rules >>> 3. The emerging open standard is the Sanger fastq specification >>> 4. Open source programs use the Sanger fastq specs >>> >>> >>> From these it follows that: >>> >>> >>> 1. BioPerl should support Sanger fastq standard >>> >>> 1.1. it already does and there are other SeqIO modules for dealing >>> with other non-fastq formats. >>> >>> 2. BioPerl should offer simple ways of converting between quality range >>> rules >>> >>> 2.1. Have a generic method accessible from Bio::Seq::Quality with >>> preset versions of the method for converting between known variants >>> (Sanger fastq and the two Illumina versions) >>> >>> For example: >>> >>> range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value) >>> ?throw if $value < $from_lower or $value > $from_upper >>> ?return $newvalue >>> >>> range_convert_illumina2fastq(), range_convert_fastq2illumina(), >>> range_convert_fastq2phred(), ?range_convert_phred2fastq().... >>> >>> (assuming that illumina 1.3 eq phred) >>> >>> 2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina >>> qualities into Sanger fastq on the fly >>> >>> 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the incoming stream of >>> quality value range either automatically or be given a keyword >>> parameter indicating the range. >>> >>> 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it detects >>> a quality value out of range. >>> >>> 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it >>> detects a quality value out of range. >>> >>> 2.2.4. It would be useful but not absolutely necessary for >>> Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina >>> ranges >>> >>> >>> What do you think? >>> >>> ? 
-Heikki >>> >>> 2009/4/26 Torsten Seemann : >>>>>> >>>>>> This might be a good place to ask the question: having looked at the >>>>>> fastq.pm page, is the fastq format defined (only) by a "@'" followed >>>>>> by >>>>> >>>>> a >>>>>> >>>>>> sequence line and a "+" header followed by a quality line and the two >>>>>> headers have to agree? Now that Illumina is using phred scaling, are >>>>>> 'Sanger' and 'Illumina' versions the same? >>>>> >>>>> No they aren't the same, Illumina still encodes the ascii as value + 64 >>>>> and Sanger as value + 33. >>>>> >>>> >>>> Illumina have now CHANGED how they calculate the quality value however >>>> in >>>> the last month or so... Their Q range used to be -5..40 mapped to ASCII >>>> 64+, >>>> but now they produce Q >= 0 and it is unclear if they start at 69 or 64 >>>> now... >>>> >>>> I have tried to summarise this in a central place: >>>> >>>> http://en.wikipedia.org/wiki/FASTQ_format >>>> >>>> Corrections welcome! >>>> >>>> >>>> --Torsten Seemann >>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash >>>> University, AUSTRALIA >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >>> >>> -- >>> ? -Heikki >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>> cell: +27 (0)714328090 >>> Sent from Claremont, WC, South Africa >>> >> >> >> >> -- >> ? 
-Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa From cjfields at illinois.edu Mon Apr 27 16:11:12 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 27 Apr 2009 11:11:12 -0500 Subject: [Bioperl-l] Creating a fastq format file? In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> <2c8757af0904220632m2112ad5do9bf3ad9805a40ec2@mail.gmail.com> <1F0D336E-332B-4EC8-9C4D-0594975FA0A7@hudsonalpha.org> <20090422155815.GA14402@eniac.jgi-psf.org> Message-ID: On Apr 27, 2009, at 10:53 AM, Heikki Lehvaslaiho wrote: > 2009/4/27 Chris Fields : >> This is going within Bio::Seq::Quality, correct? > > Yes. > > Does Bio::Seq::Quality >> have a method that indicates what format the quality scores are >> actually in >> (sanger/illumina/illumina1.3/phred/foo)? The reason I worry about >> this is >> quality scores appear inseparable from their quality format (ranges >> vary in >> length, for instance). > > No method. > >> For instance, if I picked a Bio::Seq::Quality out of the blue, >> could I tell >> which quality format it originated from w/o guessing, and similarly >> could I >> accurately convert it to another qual format? To me it seems we need >> something in Bio::Seq::Quality akin to the alphabet() method used for >> sequence data. > > The text formats encode the quality values in different ways, but they > are all stored as integer arrays in the object. Converting between > them is relatively easy. > > You are right: quality_format() or even plain format() is needed. The > SeqIO methods creating the objects should be setting it. 
> Warnings for
> unset format values should be added to appropriate places.
>
> -Heikki

Agreed, and any conversion methods could default to using a set
quality_format()/format() for conversions to/from ASCII (might serve as a
good verification point as well).

chris

From maj at fortinbras.us Mon Apr 27 15:51:39 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 27 Apr 2009 11:51:39 -0400
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To: <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com><2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com><90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
Message-ID:

Dan - congrats on your first contribution!
Mark

----- Original Message -----
From: "Dan Bolser"
To: "Heikki Lehvaslaiho"
Cc: "Chris Fields" ;
Sent: Monday, April 27, 2009 4:31 AM
Subject: Re: [Bioperl-l] Clear range from Bio::Seq::Quality?

Hi Heikki,

Thanks very much for the advice on how to better implement the clear
range method within the Bio::Seq::Quality object. I can understand the
logic of what you have written, and it all sounds reasonable. The only
problem is that I am very inexperienced with working on object-oriented
Perl (my 'one man' projects to date have never really required me to
think beyond scripts, and it's been years since I actually tried to code
objects in Perl).

To be specific, when you say, "Lets add a method that sets the
threshold and stores it internally as $self->_threshold", ignoring any
other functionality, what would that method look like? In particular,
how would $self->_threshold be implemented?

I think once I see that detail, I can go ahead and try to code what
you suggested.

Similarly (Chris), where would I put the tests / how would they be
implemented?

Thanks again for the feedback.

All the best,
Dan.
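
The accessor and cache that Dan asks about could look roughly like the
sketch below. It is only a minimal illustration under stated assumptions,
not the code that went into Bio::Seq::Quality: the package name
My::QualitySketch and the hash keys are made up, the method names
threshold(), count_clean_ranges(), _find_clean_ranges() and
_clear_cache() follow Heikki's outline, and the default cutoff of 13 is
taken from Dan's original method. The checks at the end use plain
Test::More as one possible way of writing the tests; BioPerl's own t/
scripts have their own conventions, which this sketch does not try to
reproduce.

```perl
use strict;
use warnings;

# Hypothetical stand-in class for illustration; NOT the real Bio::Seq::Quality.
package My::QualitySketch;

sub new {
    my ($class, %args) = @_;
    my $self = bless { _qual => $args{-qual} || [] }, $class;
    $self->threshold($args{-threshold}) if defined $args{-threshold};
    return $self;
}

# Getter/setter. The "internal $self->_threshold" of the discussion is
# simply a private slot in the object hash here.
sub threshold {
    my ($self, $value) = @_;
    if (defined $value) {
        $self->_clear_cache;              # cached ranges depend on the cutoff
        $self->{_threshold} = $value;
    }
    return $self->{_threshold};
}

# Forget everything that was derived from the threshold and quality values.
sub _clear_cache {
    my $self = shift;
    delete $self->{_clean_ranges};
    return;
}

# Compute the ranges once and keep them. Each entry is
# [start, end, length] with 0-based, inclusive coordinates.
sub _find_clean_ranges {
    my $self = shift;
    return $self->{_clean_ranges} if $self->{_clean_ranges};

    my $min = defined $self->{_threshold} ? $self->{_threshold} : 13;
    # One flag per base: 1 if quality >= cutoff, else 0.
    my $flags = join '', map { $_ >= $min ? 1 : 0 } @{ $self->{_qual} };

    my @ranges;
    while ($flags =~ /(1+)/g) {
        my $end = pos($flags) - 1;        # pos() is one past the match
        push @ranges, [ $end - length($1) + 1, $end, length($1) ];
    }
    return $self->{_clean_ranges} = \@ranges;
}

sub count_clean_ranges {
    my ($self, $threshold) = @_;
    $self->threshold($threshold) if defined $threshold;
    return scalar @{ $self->_find_clean_ranges };
}

package main;
use Test::More tests => 4;

my $q = My::QualitySketch->new(-qual => [5, 20, 20, 3, 30, 30, 30, 2]);
$q->threshold(10);
is($q->count_clean_ranges, 2, 'two runs at or above 10');
my $cached = $q->_find_clean_ranges;
is($q->_find_clean_ranges, $cached, 'second call reuses the cache');
is($q->count_clean_ranges(25), 1, 'new threshold rebuilds the ranges');
is($q->threshold, 25, 'threshold is remembered');
```

Because the cache is a single hash slot, further methods (for example a
get_clean_range() that picks the longest entry) could reuse
_find_clean_ranges() at no extra cost. Setters for the quality values and
the sequence would also need to call _clear_cache().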
2009/4/27 Heikki Lehvaslaiho : > Dan, > > It looks like your method does two different things: > > 1. Returns the longest subsequence above the threshold > 2. Analyses the the sequence for the number of ranges the current > threshold creates. > > Why not separate these functions? > > Lets add a method that sets the threshold and stores it internally as > $self->_threshold. Setting it to a new values should trigger emptying > all the caches (see below.) > > Lets have two more public methods: > > 1. get_clean_range() - optional argument 'threshold' > > It returns the longest clean subseq. > > 2. count_clean_ranges() -again optional argument 'threshold' > > This returns the number of ranges detected. > > Both methods call first the public method threshold if the argument > has been given and then an internal method _find_clean_ranges(). That > method calculates all the ranges and stores them internally (as > $self->_clean_ranges-> [...]). The number of ranges is also stored > (e.g. $self->_number_of ranges).These internal values form the cache > that needs to be emptied whenever any of the critical values of the > object changes: threshold, quality or seq. Create an internal method > $self->_clear_cache, that does that. > > Now the quality new object does not get created until you call > get_clean_range() which accesses the cached values (or creates them if > they are not there). > > This design allows you to have no extra penalty for adding more > methods that act on cached values. For example, it might be sensible > thing to do at some point to look at all the ranges that are longer > than some length. 
Then you could write in your program: > > > $qual->threshold(10); > if ($qual->count_clean_ranges = 1) { > my $newqual = $qual->get_clean_range() > # do your analysis > } elsif ($qual->count_clean_ranges = 0) { > # do some reporting and logging > } else { # more than one ranges > my @quals = $qual->get_all_clean_ranges($min_lenght); > # do some more work and possibly select the best one(s) > } > > > > Yours, > > -Heikki > > 2009/4/24 Chris Fields : >> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If >> possible, tests don't hurt either! >> >> chris >> >> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote: >> >>> Its a bit rough and ready, but it does what I need... >>> >>> >>> >>> >>> =head2 get_clear_range >>> >>> Title : get_clear_range >>> >>> Title : subqual >>> Usage : $subobj = $obj->get_clear_range(); >>> $subobj = $obj->get_clear_range(20); >>> Function : Get the clear range using the given quality score as a >>> cutoff or a default value of 13. >>> >>> Returns : a new Bio::Seq::Quality object >>> Args : a minimum quality value, optional, devault = 13 >>> >>> =cut >>> >>> sub get_clear_range >>> { >>> my $self = shift; >>> my $qual = $self->qual; >>> my $minQual = shift || 13; >>> >>> my (@ranges, $rangeFlag); >>> >>> for(my $i=0; $i<@$qual; $i++){ >>> ## Are we currently within a clear range or not? >>> if(defined($rangeFlag)){ >>> ## Did we just leave the clear range? >>> if($qual->[$i]<$minQual){ >>> ## Log the range >>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>> ## and reset the range flag. >>> $rangeFlag = undef; >>> } >>> ## else nothing changes >>> } >>> else{ >>> ## Did we just enter a clear range? >>> if($qual->[$i]>=$minQual){ >>> ## Better set the range flag! >>> $rangeFlag = $i; >>> } >>> ## else nothing changes >>> } >>> } >>> ## Did we exit the last clear range? 
>>> if(defined($rangeFlag)){ >>> my $i = scalar(@$qual); >>> ## Log the range >>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag]; >>> } >>> >>> unless(@ranges){ >>> die "There is no clear range... I don't know what to do here!\n"; >>> } >>> >>> print "there are ", scalar(@ranges), " clear ranges\n"; >>> >>> my $sum; map {$sum += $_->[2]} @ranges; >>> >>> print "of ", scalar(@$qual), " bases, there are $sum with ". >>> "quality scores above the given threshold\n"; >>> >>> for (sort {$b->[2] <=> $a->[2]} @ranges){ >>> if($_->[2]/$sum < 0.5){ >>> warn "not so much a clear range as a clear chunk...\n"; >>> } >>> print $_->[2], "\t", $_->[2]/$sum, "\n"; >>> >>> return Bio::Seq::QualityDB->new( -seq => $self->subseq( $_->[0]+1, >>> $_->[1]+1), >>> -qual => $self->subqual($_->[0]+1, >>> $_->[1]+1) >>> ); >>> } >>> } >>> >>> >>> >>> >>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which >>> is a copy of Bio/Seq/Quality.pm that just has the above method added). >>> That is why the 'new Bio::Seq::Quality object' is actually a >>> Bio::Seq::QualityDB object, but other than that it should slot right >>> in (apart from all the debugging output that I spit out). >>> >>> >>> Cheers, >>> Dan. >>> >>> >>> 2009/4/24 Dan Bolser : >>>> >>>> Hi all, >>>> >>>> I couldn't find out how to get the 'clear range' from a >>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should >>>> this method be a part of the Bio::Seq::Quality class? >>>> >>>> In the latter case I'm on my way to an implementation, but I am not >>>> good at navigating the bioperl docs, so I thought I should ask before >>>> I take the time to finish that off. >>>> >>>> >>>> Cheers, >>>> Dan. 
>>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > -Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From kaboroev at sfu.ca Mon Apr 27 19:04:05 2009 From: kaboroev at sfu.ca (Keith Anthony Boroevich) Date: Mon, 27 Apr 2009 12:04:05 -0700 Subject: [Bioperl-l] Bio::Graphics Sub Feature Title Message-ID: <49F601A5.8090205@sfu.ca> Hi, I was wondering if it is possible to set a different "-title" for each of the subfeatures in a track the same way one can set a different "-bgcolor" using a subroutine. I noticed that the -title subroutine is only called once per Feature and is passed a "Bio::SeqFeature::Generic" class whereas the -bgcolor subroutine is called once per Sub Feature and is passed the "Bio::SeqFeature::Generic"s which I created. Is there any way for the -title subroutine to be called each Sub Feature or is this not implemented? Keith From dan.bolser at gmail.com Tue Apr 28 05:46:05 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 28 Apr 2009 06:46:05 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID: <2c8757af0904272246q56e19a2dr542b29f2378d0a48@mail.gmail.com> 2009/4/27 Mark A. Jensen : > Dan - congrats on your first contribution! 
Mark I don't really feel like I can take much credit! Thanks Heikki!
I'll look at what you did and see what I can add.

It's a really good feeling to contribute to BioPerl (even if I didn't
really do much!)... Now... where do I collect my cheque? ;-)

Seriously though, thanks all for helping to put this together, and
thanks for maintaining BioPerl and keeping it relevant as the field
changes.

All the best,
Dan.

> ----- Original Message ----- From: "Dan Bolser"
> To: "Heikki Lehvaslaiho"
> Cc: "Chris Fields" ;
> Sent: Monday, April 27, 2009 4:31 AM
> Subject: Re: [Bioperl-l] Clear range from Bio::Seq::Quality?
>
>
> Hi Heikki,
>
> Thanks very much for the advice on how to better implement the clear
> range method within the Bio::Seq::Quality object. I can understand the
> logic of what you have written, and it all sounds reasonable. The only
> problem is that I am very inexperienced with working on object
> oriented Perl (my 'one man' projects to date have never really
> required me to think beyond scripts, and it's been years since I
> actually tried to code objects in Perl).
>
> To be specific, when you say, "Let's add a method that sets the
> threshold and stores it internally as $self->_threshold", ignoring any
> other functionality, what would that method look like? In particular,
> how would $self->_threshold be implemented?
>
> I think once I see that detail, I can go ahead and try to code what
> you suggested.
>
>
> Similarly (Chris), where would I put the tests / how would they be
> implemented?
>
>
> Thanks again for the feedback.
>
> All the best,
> Dan.
>
>
>
> 2009/4/27 Heikki Lehvaslaiho :
>>
>> Dan,
>>
>> It looks like your method does two different things:
>>
>> 1. Returns the longest subsequence above the threshold
>> 2. Analyses the sequence for the number of ranges the current
>> threshold creates.
>>
>> Why not separate these functions?
>>
>> Let's add a method that sets the threshold and stores it internally as
>> $self->_threshold. Setting it to a new value should trigger emptying
>> all the caches (see below.)
>>
>> Let's have two more public methods:
>>
>> 1. get_clean_range() - optional argument 'threshold'
>>
>> It returns the longest clean subseq.
>>
>> 2. count_clean_ranges() - again optional argument 'threshold'
>>
>> This returns the number of ranges detected.
>>
>> Both methods call first the public method threshold if the argument
>> has been given and then an internal method _find_clean_ranges(). That
>> method calculates all the ranges and stores them internally (as
>> $self->_clean_ranges->[...]). The number of ranges is also stored
>> (e.g. $self->_number_of_ranges). These internal values form the cache
>> that needs to be emptied whenever any of the critical values of the
>> object changes: threshold, quality or seq. Create an internal method
>> $self->_clear_cache, that does that.
>>
>> Now the new quality object does not get created until you call
>> get_clean_range() which accesses the cached values (or creates them if
>> they are not there).
>>
>> This design allows you to have no extra penalty for adding more
>> methods that act on cached values. For example, it might be a sensible
>> thing to do at some point to look at all the ranges that are longer
>> than some length. Then you could write in your program:
>>
>>
>> $qual->threshold(10);
>> if ($qual->count_clean_ranges == 1) {
>>   my $newqual = $qual->get_clean_range();
>>   # do your analysis
>> } elsif ($qual->count_clean_ranges == 0) {
>>   # do some reporting and logging
>> } else { # more than one range
>>   my @quals = $qual->get_all_clean_ranges($min_length);
>>   # do some more work and possibly select the best one(s)
>> }
>>
>>
>>
>> Yours,
>>
>> -Heikki
>>
>> 2009/4/24 Chris Fields :
>>>
>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If
>>> possible, tests don't hurt either!
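On the question of what a threshold method that "stores it internally" and empties the caches might look like: one possible shape is sketched here, assuming an ordinary blessed-hash object. The package My::QualSketch and its hash keys are hypothetical stand-ins and are not the actual Bio::Seq::Quality code; only the method names threshold, _clear_cache, _find_clean_ranges and count_clean_ranges come from the design described in this thread.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in class; Bio::Seq::Quality itself is not touched here.
    package My::QualSketch;

    sub new {
        my ($class, %args) = @_;
        my $self = bless { qual => $args{-qual} || [] }, $class;
        $self->threshold(defined $args{-threshold} ? $args{-threshold} : 13);
        return $self;
    }

    # Public getter/setter; the value is kept in the object hash.
    sub threshold {
        my ($self, $value) = @_;
        if (defined $value) {
            $self->{_threshold} = $value;
            $self->_clear_cache;    # cached ranges depend on the threshold
        }
        return $self->{_threshold};
    }

    sub _clear_cache { delete $_[0]{_clean_ranges} }

    # Compute the ranges once and keep them until the cache is cleared.
    sub _find_clean_ranges {
        my $self = shift;
        return $self->{_clean_ranges} ||= do {
            my (@ranges, $start);
            my $qual = $self->{qual};
            for my $i (0 .. $#$qual) {
                if ($qual->[$i] >= $self->{_threshold}) {
                    $start = $i unless defined $start;
                }
                elsif (defined $start) {
                    push @ranges, [$start, $i - 1];
                    undef $start;
                }
            }
            push @ranges, [$start, $#$qual] if defined $start;
            \@ranges;
        };
    }

    sub count_clean_ranges {
        my ($self, $threshold) = @_;
        $self->threshold($threshold) if defined $threshold;
        return scalar @{ $self->_find_clean_ranges };
    }
}

my $q = My::QualSketch->new(-qual => [5, 20, 25, 30, 8, 40, 41, 2]);
print $q->count_clean_ranges,     "\n";    # uses the default threshold of 13
print $q->count_clean_ranges(35), "\n";    # new threshold, so the cache is rebuilt
```

Setters for the quality values and the sequence would call _clear_cache in the same way, so that any method reading the cached ranges never sees stale data.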
>>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> -- >> -Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From brianli.cas at gmail.com Wed Apr 29 03:14:23 2009 From: brianli.cas at gmail.com (brian li) Date: Wed, 29 Apr 2009 11:14:23 +0800 Subject: [Bioperl-l] Parse problem of a big EMBL entry Message-ID: Hi everyone, Here is greeting from Brian. I have just began to use bioperl 1.6.0 to collect certain data lines from EMBL files. There's a problem when I try to get an entry that includes over 1 million lines. A call of Bio::SeqIO::embl->next_seq would just cause the parser script to exit. I have read Bio/SeqIO/embl.pm and I think one possible way to solve the problem may be to give my script more memory to store the entry data. The machine I am using has 32GB memory, and that shall be enough for any entry. So I am wondering whether there is any way to set the size of the memory available to a perl script. Others ways to deal with the problem are also welcome. Appreciate your help. Brian From jason at bioperl.org Wed Apr 29 05:10:27 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 28 Apr 2009 22:10:27 -0700 Subject: [Bioperl-l] Parse problem of a big EMBL entry In-Reply-To: References: Message-ID: <2154C145-1A66-4EEB-B99E-FBE8215539F5@bioperl.org> Brian - Without memory leaks it should only take up as much memory as the current sequence you have parsed. 
If you mean you have a sequence record with > 1M lines I'm not sure
how much memory that would take up, depends on if this is lots of
features or what. There are ways to tell BioPerl to throw away things
you don't want to parse out from the record. See
http://bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder

Perl will use as much memory as is available on your machine. Have you
monitored the memory use of the perl running to ensure it is reaching
the 32Gb limit and that is in fact what is killing the program?

-jason

On Apr 28, 2009, at 8:14 PM, brian li wrote:

> Hi everyone,
>
> Here is greeting from Brian.
>
> I have just begun to use bioperl 1.6.0 to collect certain data
> lines from EMBL files.
>
> There's a problem when I try to get an entry that includes over 1
> million lines. A call of Bio::SeqIO::embl->next_seq would just cause
> the parser script to exit. I have read Bio/SeqIO/embl.pm and I think
> one possible way to solve the problem may be to give my script more
> memory to store the entry data. The machine I am using has 32GB
> memory, and that shall be enough for any entry.
>
> So I am wondering whether there is any way to set the size of the
> memory available to a perl script. Other ways to deal with the
> problem are also welcome.
>
> Appreciate your help.
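If the goal is only to collect certain line types from a very large entry, one workaround is to skip object building altogether and stream the flat file. This is a swapped-in technique for illustration, not the SeqBuilder approach from the HOWTO and not how Bio::SeqIO::embl works. The sketch relies only on two properties of the EMBL flat-file format: each line starts with a two-letter line code, and "//" ends an entry. The helper name filter_embl and the toy entry are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: read EMBL-format text from a filehandle one line at a
# time and call $callback->(\@kept_lines) once per entry ("//" ends an entry).
# Only lines whose two-letter code is listed in @$wanted are held in memory.
sub filter_embl {
    my ($fh, $wanted, $callback) = @_;
    my %want = map { $_ => 1 } @$wanted;
    my @kept;
    while (my $line = <$fh>) {
        if ($line =~ m{^//}) {              # end of the current entry
            $callback->([@kept]);
            @kept = ();
        }
        elsif ($line =~ /^([A-Z]{2})\b/ && $want{$1}) {
            chomp $line;
            push @kept, $line;
        }
    }
    $callback->([@kept]) if @kept;          # entry without a final "//"
    return;
}

# Made-up toy entry standing in for a real file.
my $embl = <<'END';
ID   TOY0001; SV 1; linear; genomic DNA; STD; UNC; 20 BP.
DE   Made-up entry for illustration only
FT   source          1..20
SQ   Sequence 20 BP;
     acgtacgtac gtacgtacgt                                                    20
//
END
open my $fh, '<', \$embl or die "cannot open in-memory file: $!";
filter_embl($fh, ['ID', 'DE'], sub { print "$_\n" for @{ $_[0] } });
```

Memory use then scales with the number of wanted lines per entry rather than with the size of the whole record.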
>
> Brian
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org

From paola.bisignano at gmail.com Wed Apr 29 14:08:57 2009
From: paola.bisignano at gmail.com (Paola Bisignano)
Date: Wed, 29 Apr 2009 16:08:57 +0200
Subject: [Bioperl-l] parsing /www.ebi.ac.uk/pdbsum/
Message-ID:

Hi,

thanks for accepting me in the mailing list. I'm Paola and I work in
the institute of cancer in Genoa, Italy, as a bioinformatician... I'm a
biologist, quite new to Perl (2 months), and I have never used BioPerl
because I preferred learning a little Perl first, but now, parsing,
parsing, and parsing bioinformatics web sites... I need BioPerl :-)

I visited www.bioperl.org and read the tutorials, and I read about a
lot of modules used to parse different web sites. I need to parse one
in particular, EMBL-EBI http://www.ebi.ac.uk/pdbsum/, which is different
from EMBL because it also holds other information, such as
protein-ligand interactions... I have never used BioPerl modules and
have parsed it by myself, but if the receptor has several ligands it is
more difficult to parse and to choose which ligands I need, because
there are "false" ligands such as ions or glycerol that I don't need,
and I don't know the syntax of this source, where everything can be
seen as a ligand... So I want to know if there are modules that I can
use to do my analysis. If anyone can help me, it is very welcome...

Thanks

From jason at bioperl.org Wed Apr 29 16:41:02 2009
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 29 Apr 2009 09:41:02 -0700
Subject: [Bioperl-l] Fwd: Parse problem of a big EMBL entry
References:
Message-ID:

Brian - please always CC the mailing list on replies.

Not sure what is causing the seg fault so I can't really help here - if
you want to file it as a bug at the bugzilla with instructions on how
to reproduce it will hopefully get looked at.
-jason

Begin forwarded message:

> From: brian li
> Date: April 29, 2009 1:23:32 AM PDT
> To: Jason Stajich
> Subject: Re: [Bioperl-l] Parse problem of a big EMBL entry
>
> Hi Jason,
>
>> Without memory leaks it should only take up as much memory as the
>> current sequence you have parsed. If you mean you have a sequence
>> record with > 1M lines I'm not sure how much memory that would take
>> up, depends on if this is lots of features or what.
>
> Lots of features.
>
>> There are ways to tell BioPerl to throw away
>> things you don't want to parse out from the record. See
>> http://bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
>
> Thanks. I think this would help.
>
>> Perl will use as much memory as is available on your machine. Have
>> you monitored the memory use of the perl running to ensure it is
>> reaching the 32Gb limit and that is in fact what is killing the
>> program?
>
> I monitored the memory usage in my last run. The size of free
> memory didn't change a lot, and remained around 20GB (buffer
> size added). I made the wrong assumption. Thanks again for your hint.
>
> BTW: The message I get when I parse a big million-line entry is
> "Segmentation fault". Not familiar with this and trying to get a clue.
>
> Brian

Jason Stajich
jason at bioperl.org

From razi.khaja at gmail.com Wed Apr 29 19:08:14 2009
From: razi.khaja at gmail.com (Razi Khaja)
Date: Wed, 29 Apr 2009 15:08:14 -0400
Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence
In-Reply-To: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com>
References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com>
Message-ID: <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>

Hello,

I am generating BLAST alignments using the BLAST URL API from NCBI.

I want to parse details from BLAST reports whenever there are
"Features in/flanking this part of subject sequence". A portion of the
BLAST report showing "Features flanking ..."
is pasted below.

I am using Bio::SearchIO to parse details. The relevant part of the
script is below.

The problem I am having is that for some reason the first occurrence of
a "Feature flanking this part of a subject sequence" is skipped. I am
only able to parse/print all occurrences of a "Feature in/flanking this
part of a subject sequence" from the second occurrence to the last
occurrence.

I believe the code responsible for parsing this information is in
Bio/SearchIO/blast.pm, starting on line 760. I have tried fixing the
code in Bio/SearchIO/blast.pm myself but was not able to correct the
problem. Would it be possible for someone to fix the code in the
Bio/SearchIO/blast.pm module, or help me fix the code so that the first
occurrence is not skipped?

Thanks,
Razi

===== The part of the script that is relevant to parsing "Features in/flanking..." ====
my $bio_searchio_in = Bio::SearchIO->new(
    -file   => 'blast_result.txt',
    -format => 'blast'
);

my $i = 1;
while( my $result = $bio_searchio_in->next_result() ){
    while( my $hit = $result->next_hit() ){
        while( my $hsp = $hit->next_hsp() ){
            my $hsp_features = $hsp->hit_features();
            if( $hsp_features ) {
                print "HSP FEATURE $i\t$hsp_features\n";
                $i++;
            }
        }
    }
}

===== A portion of a BLAST report with "Features flanking ..." =====
...
...
 Score = 54.7 bits (29),  Expect = 0.003
 Identities = 29/29 (100%), Gaps = 0/29 (0%)
 Strand=Plus/Minus

Query  6556     CCTGGGTGACAGAGTGAGACTCCATCTCA  6584
                |||||||||||||||||||||||||||||
Sbjct  6953042  CCTGGGTGACAGAGTGAGACTCCATCTCA  6953014

>gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic contig
Length=237250

 Features flanking this part of subject sequence:
   16338 bp at 5' side: PRAME family member 8
   11926 bp at 3' side: PRAME family member 9

 Score = 7286 bits (3945),  Expect = 0.0
 Identities = 5437/6145 (88%), Gaps = 152/6145 (2%)
 Strand=Plus/Plus

Query  23225  GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG  23284
              |||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||||||
Sbjct  86128  GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG  86187

Query  23285  GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA  23344
              ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| |||||
Sbjct  86188  GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA  86247
...
...

From cjfields at illinois.edu Wed Apr 29 19:41:54 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 29 Apr 2009 14:41:54 -0500
Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence
In-Reply-To: <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>
References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com> <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com>
Message-ID: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu>

I'm assuming this is from an older bioperl; this data should be
accessible via $hsp->hit_features in the latest code from svn (and I
believe in bioperl 1.6.0 in CPAN).

chris

On Apr 29, 2009, at 2:08 PM, Razi Khaja wrote:

> Hello,
>
> I am generating BLAST alignments using the BLAST URL API from NCBI.
>
> I want to parse details from BLAST reports whenever there are
> "Features in/flanking this part of subject sequence". A portion of
> the BLAST report showing "Features flanking ..." is pasted below.
>
> I am using Bio::SearchIO to parse details. The relevant part of the
> script is below.
>
> The problem I am having is that for some reason the first occurrence
> of a "Feature flanking this part of a subject sequence" is skipped.
> I am only able to parse/print all occurrences of a "Feature
> in/flanking this part of a subject sequence" from the second
> occurrence to the last occurrence.
> > I believe the code responsible for parsing this information is in > Bio/SearchIO/blast.pm, starting on line 760. > I have tried fixing the code in Bio/SearchIO/blast.pm myself but was > not able to correct the problem. > Would it be possible for someone to fix the code in the > Bio/SearchIO/blast.pm module, or help me fix the code so that the > first occurrence is not skipped? > > Thanks, > Razi > ===== The part of the script that is relevant to parsing "Features > in/flanking..." ==== > my $bio_searchio_in = Bio::SearchIO->new( > -file => 'blast_result.txt', > -format => 'blast' > ); > > my $i = 1; > while( my $result = $bio_searchio_in->next_result() ){ > while( my $hit = $result->next_hit() ){ > while( my $hsp = $hit->next_hsp() ){ > my $hsp_features = $hsp->hit_features(); > if( $hsp_features ) { > print "HSP FEATURE $i\t$hsp_features\n"; > $i++; > } > } > } > } > > ===== A portion of a BLAST report with "Features flanking ..." ===== > ... > ... > Score = 54.7 bits (29), Expect = 0.003 > Identities = 29/29 (100%), Gaps = 0/29 (0%) > Strand=Plus/Minus > > Query 6556 CCTGGGTGACAGAGTGAGACTCCATCTCA 6584 > ||||||||||||||||||||||||||||| > Sbjct 6953042 CCTGGGTGACAGAGTGAGACTCCATCTCA 6953014 > > >> gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 >> genomic contig > Length=237250 > > Features flanking this part of subject sequence: > 16338 bp at 5' side: PRAME family member 8 > 11926 bp at 3' side: PRAME family member 9 > > Score = 7286 bits (3945), Expect = 0.0 > Identities = 5437/6145 (88%), Gaps = 152/6145 (2%) > Strand=Plus/Plus > > Query 23225 > GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG > 23284 > |||||||||||||||||||||||||||||||| |||||| ||||||||||| > |||||||| > Sbjct 86128 > GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG > 86187 > > Query 23285 > GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA > 23344 > ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| > ||||| > Sbjct 86188 > 
GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA > 86247 > ... > ... > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjm at berkeleybop.org Wed Apr 29 20:58:15 2009 From: cjm at berkeleybop.org (Chris Mungall) Date: Wed, 29 Apr 2009 13:58:15 -0700 Subject: [Bioperl-l] Can I load ontologies into BioSQL? In-Reply-To: References: Message-ID: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org> The .ontology files have been deprecated by GO. Use the .obo files instead. It appears the bioperl parser for the .ontology files isn't able to deal with the new relations in GO. I suggest that the bioperl .ontology parser is deprecated too On Apr 22, 2009, at 6:38 AM, Hilmar Lapp wrote: > Hi Carlos, > > I am moving your inquiry to the BioPerl list, as the tool is a part > of Bioperl-db and uses BioPerl for parsing the ontologies. > > In your case, the goflat parser in BioPerl seems to balk at the > second one of the input files. It may be that the input file is > (was?) corrupted, that does happen every once in a while. More > likely though is that the goflat parser hasn't kept up with some > format changes. Have you tried using the obo format version instead? > > -hilmar > > On Apr 20, 2009, at 11:44 AM, Carlos A. Canchaya wrote: > >> Hi guys >> >> I'm working with biosql and I try to figure out how to load >> ontologies into biosql. 
>> >> I've tried >> >> load_ontology.pl --driver mysql --dbuser carlos --dbpass xxx -- >> host localhost --dbname biosql --namespace "Gene Ontology" --format >> goflat --fmtargs "-defs_file,GO.defs" function.ontology >> process.ontology component.ontology >> >> as in the script info but I have an error, >> >> >> ------------------- WARNING --------------------- >> MSG: DBLink exists in the dblink of _default >> --------------------------------------------------- >> >> ------------- EXCEPTION ------------- >> MSG: format error (file process.ontology) offending line: >> -negative regulation of angiogenesis ; GO:0016525 ; synonym:down >> regulation of angiogenesis ; synonym:down\-regulation of >> angiogenesis ; synonym:downregulation of angiogenesis ; >> synonym:inhibition of angiogenesis % negative regulation of >> developmental process ; GO:0051093 % regulation of angiogenesis ; >> GO:0045765 >> >> STACK Bio::OntologyIO::dagflat::_parse_flat_file /usr/local/share/ >> perl/5.10.0/Bio/OntologyIO/dagflat.pm:627 >> STACK Bio::OntologyIO::dagflat::parse /usr/local/share/perl/5.10.0/ >> Bio/OntologyIO/dagflat.pm:284 >> STACK Bio::OntologyIO::dagflat::next_ontology /usr/local/share/perl/ >> 5.10.0/Bio/OntologyIO/dagflat.pm:317 >> STACK toplevel /usr/local/share/biosql/bioperl-db/scripts/biosql/ >> load_ontology.pl:604 >> ------------------------------------- >> >> Any suggestion? 
>> >> Cheers, >> >> Carlos >> >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Wed Apr 29 23:48:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 29 Apr 2009 19:48:10 -0400 Subject: [Bioperl-l] SearchIO: Features in/flanking this part of asubject sequence In-Reply-To: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com><62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com> <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> Message-ID: <7A9746282BA343F78423D12DB1578509@NewLife> also check out http://www.bioperl.org/wiki/Parsing_BLAST_HSPs MAJ ----- Original Message ----- From: "Chris Fields" To: "Razi Khaja" Cc: Sent: Wednesday, April 29, 2009 3:41 PM Subject: Re: [Bioperl-l] SearchIO: Features in/flanking this part of asubject sequence > I'm assuming this is from an older bioperl; this data should be accessible > via $hsp->hit_features in the latest code fromo svn (and I believe in bioperl > 1.6.0 in CPAN). > > chris > > On Apr 29, 2009, at 2:08 PM, Razi Khaja wrote: > >> Hello, >> >> I am generating BLAST alignments using the BLAST URL API from NCBI. >> >> I want to parse details from BLAST reports whenever there are >> "Features in/flanking this part of subject sequence". A portion of >> the BLAST report showing "Features flanking ..." is pasted below. >> >> I am using Bio::SearchIO to parse details. The relevant part of the >> script is below. 
>> >> The problem I am having is that for some reason the first occurrence >> of a "Feature flanking this part of a subject sequence" is skipped. >> I am only able to parse/print all occurrences of a "Feature >> in/flanking this part of a subject sequence" from the second >> occurrence to the last occurrence. >> >> I believe the code responsible for parsing this information is in >> Bio/SearchIO/blast.pm, starting on line 760. >> I have tried fixing the code in Bio/SearchIO/blast.pm myself but was >> not able to correct the problem. >> Would it be possible for someone to fix the code in the >> Bio/SearchIO/blast.pm module, or help me fix the code so that the >> first occurrence is not skipped? >> >> Thanks, >> Razi > > > >> ===== The part of the script that is relevant to parsing "Features >> in/flanking..." ==== >> my $bio_searchio_in = Bio::SearchIO->new( >> -file => 'blast_result.txt', >> -format => 'blast' >> ); >> >> my $i = 1; >> while( my $result = $bio_searchio_in->next_result() ){ >> while( my $hit = $result->next_hit() ){ >> while( my $hsp = $hit->next_hsp() ){ >> my $hsp_features = $hsp->hit_features(); >> if( $hsp_features ) { >> print "HSP FEATURE $i\t$hsp_features\n"; >> $i++; >> } >> } >> } >> } >> >> ===== A portion of a BLAST report with "Features flanking ..." ===== >> ... >> ... 
>> Score = 54.7 bits (29), Expect = 0.003 >> Identities = 29/29 (100%), Gaps = 0/29 (0%) >> Strand=Plus/Minus >> >> Query 6556 CCTGGGTGACAGAGTGAGACTCCATCTCA 6584 >> ||||||||||||||||||||||||||||| >> Sbjct 6953042 CCTGGGTGACAGAGTGAGACTCCATCTCA 6953014 >> >> >>> gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic >>> contig >> Length=237250 >> >> Features flanking this part of subject sequence: >> 16338 bp at 5' side: PRAME family member 8 >> 11926 bp at 3' side: PRAME family member 9 >> >> Score = 7286 bits (3945), Expect = 0.0 >> Identities = 5437/6145 (88%), Gaps = 152/6145 (2%) >> Strand=Plus/Plus >> >> Query 23225 GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG >> 23284 >> |||||||||||||||||||||||||||||||| |||||| ||||||||||| |||||||| >> Sbjct 86128 GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG >> 86187 >> >> Query 23285 GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA >> 23344 >> ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| ||||| >> Sbjct 86188 GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA >> 86247 >> ... >> ... 
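Until the parser question is settled, the "Features in/flanking" blocks can be pulled straight from the text report to check what the first occurrence should contain. This is a rough sketch only, assuming the layout shown in the excerpt (a header line followed by indented detail lines and then a blank line). It is independent of Bio::SearchIO and is not how Bio/SearchIO/blast.pm handles these lines; extract_feature_blocks is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper, independent of Bio::SearchIO: collect each
# "Features in/flanking this part of subject sequence:" block as a list of
# its detail lines. Assumes a block ends at the first blank line.
sub extract_feature_blocks {
    my ($report) = @_;
    my (@blocks, $current);
    for my $line (split /\n/, $report) {
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $current = [];
            push @blocks, $current;
        }
        elsif ($current) {
            if ($line =~ /\S/) {
                (my $text = $line) =~ s/^\s+|\s+$//g;
                push @$current, $text;
            }
            else {
                undef $current;             # blank line closes the block
            }
        }
    }
    return @blocks;
}

my $report = <<'END';
>gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic contig
Length=237250

 Features flanking this part of subject sequence:
   16338 bp at 5' side: PRAME family member 8
   11926 bp at 3' side: PRAME family member 9

 Score = 7286 bits (3945),  Expect = 0.0
END

for my $block (extract_feature_blocks($report)) {
    print join(' | ', @$block), "\n";
}
```

Comparing the number of blocks found this way with the number of HSPs for which hit_features returns something gives a quick check of whether the first block is really being dropped.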
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From Russell.Smithies at agresearch.co.nz Thu Apr 30 00:31:06 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 30 Apr 2009 12:31:06 +1200
Subject: [Bioperl-l] waaaay off topic question
In-Reply-To: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org>
References: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493C84151@exchsth.agresearch.co.nz>

I have a question that's nothing to do with BioPerl or Perl, but hope
there's a chance that some of you clever people may be doing the same
thing as me :-)

I've been asked to write some VB scripts to control Applied Biosystems
"Analyst QS" and "BioAnalyst" applications for analyzing mass-spec
data. There's limited documentation (10yr out of date) with some
example code (that doesn't compile) so I'm not getting as far along as
I'd like. Has anyone worked with this stuff before?

Any assistance greatly appreciated !!!

Thanx,

Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809  F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities to
which it is addressed and may contain confidential and/or privileged
material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From razi.khaja at gmail.com Thu Apr 30 03:57:17 2009 From: razi.khaja at gmail.com (Razi Khaja) Date: Wed, 29 Apr 2009 23:57:17 -0400 Subject: [Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence In-Reply-To: <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> References: <62e9dabc0904261547k362beaf4x1e7f77e8fe5ca73@mail.gmail.com> <62e9dabc0904291208o7312e838k84dc24350b8e357e@mail.gmail.com> <2396069D-63ED-429C-8166-1B040B12942C@illinois.edu> Message-ID: <62e9dabc0904292057y6b725e0yc3b0a85c661c44f8@mail.gmail.com> Hello Chris, I am using bioperl 1.6.0. It may be a few weeks before I can upgrade to bioperl-live from svn, and so it may be a few weeks before I can return to my question. When I do upgrade, I will report back to this thread if I still encounter problems. Razi On Wed, Apr 29, 2009 at 3:41 PM, Chris Fields wrote: > I'm assuming this is from an older bioperl; this data should be accessible > via $hsp->hit_features in the latest code fromo svn (and I believe in > bioperl 1.6.0 in CPAN). > > chris > > > From jonathanmflowers at gmail.com Thu Apr 30 16:40:42 2009 From: jonathanmflowers at gmail.com (Jon Flowers) Date: Thu, 30 Apr 2009 09:40:42 -0700 (PDT) Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem Message-ID: <23319982.post@talk.nabble.com> Dear colleagues, I have set up a mySQL database and loaded a GFF3 and fasta file using Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be working normally except when I attempt to create a Bio::DB::SeqFeature::Segment object. 
The following works as expected:

my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:foo',
                                         -user    => 'myuser',
                                         -pass    => 'mypassword',
                                         -write   => '1');

my @features = $db->features(-seq_id=>'chr1',
                             -start=>1,
                             -end=>10000,
                             -types=>['gene']);

However, when I try to create a segment object using either of the two following method calls I get an error:

my $segment = $db->segment('chr1',1=>10000);

my $segment = $db->segment( -seq_id => 'chr1', -start => '1', -end => '10000');

-------------------------------- EXCEPTION ------------------------------------
MSG: segment() called in a scalar context but multiple features match.
Either call in a list context or narrow your search using the -types or
-class arguments
STACK Bio::DB::SeqFeature::Store::segment /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178
STACK toplevel trial.pl:42
-------------------------------------------------------

Calling in list context (which is not defined in the documentation) produces an array of 22 identical scalars = 'chr1:1..10000'.

Any ideas?

Thanks

Jonathan

--
View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jonathanmflowers at gmail.com Thu Apr 30 16:52:24 2009
From: jonathanmflowers at gmail.com (Jon Flowers)
Date: Thu, 30 Apr 2009 09:52:24 -0700 (PDT)
Subject: [Bioperl-l] use CLUSTALW on Windows?
In-Reply-To: <23264714.post@talk.nabble.com>
References: <23264714.post@talk.nabble.com>
Message-ID: <23320232.post@talk.nabble.com>

Hi,

There is no means to do this in bioperl, but it is simple to make a system call and execute an MSA program such as MUSCLE to align fasta-formatted sequences using something like...

qx(muscle -in $infilename -out $outfilename)

Jonathan

laxmanb wrote:
>
> I need to create a multiple sequence alignment of some sequences using
> CLUSTALW or any other Multiple sequence alignment program.
However, I've > learnt that this functionality used to be UNIX/Linux only. However, the > documentation is also very old, so I'd like to know if any CLUSTAL/ any > other MSA programs can be run using BioPerl on Windows. > > Thank you for your time :) > -- View this message in context: http://www.nabble.com/use-CLUSTALW-on-Windows--tp23264714p23320232.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Thu Apr 30 17:04:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 30 Apr 2009 12:04:46 -0500 Subject: [Bioperl-l] use CLUSTALW on Windows? In-Reply-To: <23320232.post@talk.nabble.com> References: <23264714.post@talk.nabble.com> <23320232.post@talk.nabble.com> Message-ID: <92920FDD-7CB2-4331-9860-87304E16C948@illinois.edu> I don't recall this being a UNIX-only issue, though admittedly it's been years since I've tried running the bioperl-run modules on WinXP. I do recall getting BLAST, EMBOSS and others to work though; I don't see why ClustalW would be much different. Have you actually tested this out and found a problem? Have you tried cygwin? chris On Apr 30, 2009, at 11:52 AM, Jon Flowers wrote: > > Hi, > > There is no means to do this in bioperl, but it is simple to make a > system > call and execute an MSA program such as MUSCLE to align fasta- > formatted > sequences using something like... > > qx(muscle -in $infilename -out $outfilename) > > Jonathan > > > laxmanb wrote: >> >> I need to create a multiple sequence alignment of some sequences >> using >> CLUSTALW or any other Multiple sequence alignment program. However, >> I've >> learnt that this functionality used to be UNIX/Linux only. However, >> the >> documentation is also very old, so I'd like to know if any CLUSTAL/ >> any >> other MSA programs can be run using BioPerl on Windows. 
>> >> Thank you for your time :) >> > > -- > View this message in context: http://www.nabble.com/use-CLUSTALW-on-Windows--tp23264714p23320232.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Apr 30 17:29:29 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 30 Apr 2009 10:29:29 -0700 Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem In-Reply-To: <23319982.post@talk.nabble.com> References: <23319982.post@talk.nabble.com> Message-ID: <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org> One would have to see some of your GFF to know better. It sounds like you have chr1 defined in multiple places. Did you use the bp_seqfeature_load script to load the data in one go - it should catch it if you have non-unique IDs. -jason On Apr 30, 2009, at 9:40 AM, Jon Flowers wrote: > > Dear colleagues, > > I have set up a mySQL database and loaded a GFF3 and fasta file using > Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be > working > normally except when I attempt to create a > Bio::DB::SeqFeature::Segment > object. > > The following works as expected: > > my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', > -dsn => 'dbi:mysql:foo', > -user => 'myuser', > -pass => 'mypassword', > -write => '1'); > > my @features = $db->features(-seq_id=>'chr1', > -start=>1, > -end=>10000, > -types=>['gene']); > > However, when I try to create a segment object using either of the two > following method calls I get an error: > > my $segment = $db->segment('chr1',1=>10000); > > my $segment = $db->segment( -seq_id => 'chr1', -start => '1', -end => > '10000'); > > -------------------------------- EXCEPTION > ------------------------------------ > > MSG: segment() called in a scalar context but multiple features match. 
> Either call in a list context or narrow your search using the -types > or > -class arguments > > STACK Bio::DB::SeqFeature::Store::segment > /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178 > STACK toplevel trial.pl:42 > ------------------------------------------------------- > > Calling in list context (which is not defined in the documentation) > produces > an array of 22 identical scalars = 'chr1:1..10000'. > > Any ideas? > > Thanks > > Jonathan > > -- > View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From jason at bioperl.org Thu Apr 30 17:31:19 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 30 Apr 2009 10:31:19 -0700 Subject: [Bioperl-l] use CLUSTALW on Windows? In-Reply-To: <23320232.post@talk.nabble.com> References: <23264714.post@talk.nabble.com> <23320232.post@talk.nabble.com> Message-ID: <734F5ADF-77F5-4AA5-A676-79B42B3C54CB@bioperl.org> the bioperl-run module of Bio::Tools::Run::Alignment::Clustalw or MUSCLE ones don't work then? They do the cmdline work for you. On Apr 30, 2009, at 9:52 AM, Jon Flowers wrote: > > Hi, > > There is no means to do this in bioperl, but it is simple to make a > system > call and execute an MSA program such as MUSCLE to align fasta- > formatted > sequences using something like... > > qx(muscle -in $infilename -out $outfilename) > > Jonathan > > > laxmanb wrote: >> >> I need to create a multiple sequence alignment of some sequences >> using >> CLUSTALW or any other Multiple sequence alignment program. However, >> I've >> learnt that this functionality used to be UNIX/Linux only. 
However,
>> the
>> documentation is also very old, so I'd like to know if any CLUSTAL/
>> any
>> other MSA programs can be run using BioPerl on Windows.
>>
>> Thank you for your time :)
>>
>
> --
> View this message in context: http://www.nabble.com/use-CLUSTALW-on-Windows--tp23264714p23320232.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org

From Kevin.M.Brown at asu.edu Thu Apr 30 19:27:15 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 30 Apr 2009 12:27:15 -0700
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
Message-ID: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>

So, I'm parsing Genbank sequences to pull out the various exons. I found the way to get the NCBI Exon number from each feature, but am confused about one of the methods. When I do annotation->as_text I'm expecting to get back 1 from the feature, but instead get back Value: 1 ??!? Why is the value from the NCBI file getting that text tagged onto it?

http://www.ncbi.nlm.nih.gov/nuccore/73622129
     exon            1..774
                     /gene="BOLA2"
                     /gene_synonym="BOLA2A; My016"
                     /inference="alignment:Splign"
                     /number=1

print +($f->annotation->get_Annotations('number'))[0]->as_text;
Value: 1

From SMarkel at accelrys.com Thu Apr 30 19:56:40 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 30 Apr 2009 15:56:40 -0400
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net>

Kevin,

I believe the extra text was added for readability when printing to the console. In our code we just add the following post-processing step.
(my $text = $annotation->as_text()) =~ s/(Comment|Value): //; Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > Sent: Thursday, 30 April 2009 12:27 PM > To: BioPerl List > Subject: [Bioperl-l] Bio::Annotations::Collection confusion > > So, I'm parsing Genbank sequences to pull out the various exons. I found > the way to get the NCBI Exon number from each feature, but am confused > about one of the methods. When I do annotation->as_text I'm expecting to > get back 1 from the feature, but instead get back Value: 1 ??!? Why is > the value from the NCBI file getting that text tagged onto it? 
> > http://www.ncbi.nlm.nih.gov/nuccore/73622129 > exon 1..774 > /gene="BOLA2" > /gene_synonym="BOLA2A; My016" > /inference="alignment:Splign" > /number=1 > > print ($f->annotation->get_Annotations('number'))[0]->as_text; > Value: 1 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Thu Apr 30 20:01:03 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 30 Apr 2009 13:01:03 -0700 Subject: [Bioperl-l] Bio::Annotations::Collection confusion In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> Message-ID: <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu> That's nice in some regards, but makes it hard to use the function in code without having to always process the result, which seems to be counter to what one would expect. E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence". Is there a better way to get the number directly without having to strip off the text that never existed in the first place? > -----Original Message----- > From: Scott Markel [mailto:SMarkel at accelrys.com] > Sent: Thursday, April 30, 2009 12:57 PM > To: Kevin Brown; BioPerl List > Subject: RE: Bio::Annotations::Collection confusion > > Kevin, > > I believe the extra text was added for readability when printing > to the console. In our code we just add the following post- > processing step. > > (my $text = $annotation->as_text()) =~ s/(Comment|Value): //; > > Scott > > Scott Markel, Ph.D. 
> Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Co-chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > > Sent: Thursday, 30 April 2009 12:27 PM > > To: BioPerl List > > Subject: [Bioperl-l] Bio::Annotations::Collection confusion > > > > So, I'm parsing Genbank sequences to pull out the various > exons. I found > > the way to get the NCBI Exon number from each feature, but > am confused > > about one of the methods. When I do annotation->as_text I'm > expecting to > > get back 1 from the feature, but instead get back Value: 1 > ??!? Why is > > the value from the NCBI file getting that text tagged onto it? 
> >
> > http://www.ncbi.nlm.nih.gov/nuccore/73622129
> > exon 1..774
> > /gene="BOLA2"
> > /gene_synonym="BOLA2A; My016"
> > /inference="alignment:Splign"
> > /number=1
> >
> > print ($f->annotation->get_Annotations('number'))[0]->as_text;
> > Value: 1
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jonathanmflowers at gmail.com Thu Apr 30 20:22:23 2009
From: jonathanmflowers at gmail.com (Jon Flowers)
Date: Thu, 30 Apr 2009 13:22:23 -0700 (PDT)
Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem
In-Reply-To: <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org>
References: <23319982.post@talk.nabble.com> <6AFB36F8-50CD-4DCE-B54F-CF01A483E8FC@bioperl.org>
Message-ID: <23322607.post@talk.nabble.com>

Jason,

I used the Bio::DB::SeqFeature::Store::GFF3Loader rather than the bp_seqfeature_load.pl script. You were right, however. It looks like I had populated the MySQL database with multiple fasta files. I cleared the database, ran the GFF3Loader twice (once for the fasta, once for the GFF3). Segment objects appear to be working fine now.

THANKS!

Jonathan

Jason Stajich-3 wrote:
>
> One would have to see some of your GFF to know better. It sounds like
> you have chr1 defined in multiple places.
>
> Did you use the bp_seqfeature_load script to load the data in one go -
> it should catch it if you have non-unique IDs.
>
> -jason
> On Apr 30, 2009, at 9:40 AM, Jon Flowers wrote:
>
>>
>> Dear colleagues,
>>
>> I have set up a mySQL database and loaded a GFF3 and fasta file using
>> Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be
>> working
>> normally except when I attempt to create a
>> Bio::DB::SeqFeature::Segment
>> object.
>> >> The following works as expected: >> >> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', >> -dsn => 'dbi:mysql:foo', >> -user => 'myuser', >> -pass => 'mypassword', >> -write => '1'); >> >> my @features = $db->features(-seq_id=>'chr1', >> -start=>1, >> -end=>10000, >> -types=>['gene']); >> >> However, when I try to create a segment object using either of the two >> following method calls I get an error: >> >> my $segment = $db->segment('chr1',1=>10000); >> >> my $segment = $db->segment( -seq_id => 'chr1', -start => '1', -end => >> '10000'); >> >> -------------------------------- EXCEPTION >> ------------------------------------ >> >> MSG: segment() called in a scalar context but multiple features match. >> Either call in a list context or narrow your search using the -types >> or >> -class arguments >> >> STACK Bio::DB::SeqFeature::Store::segment >> /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178 >> STACK toplevel trial.pl:42 >> ------------------------------------------------------- >> >> Calling in list context (which is not defined in the documentation) >> produces >> an array of 22 identical scalars = 'chr1:1..10000'. >> >> Any ideas? >> >> Thanks >> >> Jonathan >> >> -- >> View this message in context: >> http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason at bioperl.org
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
View this message in context: http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23322607.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jason at bioperl.org Thu Apr 30 20:24:25 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 30 Apr 2009 13:24:25 -0700
Subject: [Bioperl-l] Bio::Annotations::Collection confusion
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu>
Message-ID: <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org>

Seems like you just want $annotation->value ?

=head2 as_text

 Title   : as_text
 Usage   : my $text = $obj->as_text
 Function: return the string "Value: $v" where $v is the value
 Returns : string
 Args    : none


=cut

=head2 display_text

 Title   : display_text
 Usage   : my $str = $ann->display_text();
 Function: returns a string. Unlike as_text(), this method returns a string
           formatted as would be expected for the specific implementation.

           One can pass a callback as an argument which allows custom text
           generation; the callback is passed the current instance and any text
           returned
 Example :
 Returns : a string
 Args    : [optional] callback

=cut

=head2 value

 Title   : value
 Usage   : $obj->value($newval)
 Function: Get/Set the value for simplevalue
 Returns : value of value
 Args    : newvalue (optional)


=cut

On Apr 30, 2009, at 1:01 PM, Kevin Brown wrote:

> That's nice in some regards, but makes it hard to use the function in
> code without having to always process the result, which seems to be
> counter to what one would expect.
>
> E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence".
>
> Is there a better way to get the number directly without having to
> strip
> off the text that never existed in the first place?
>
>> -----Original Message-----
>> From: Scott Markel [mailto:SMarkel at accelrys.com]
>> Sent: Thursday, April 30, 2009 12:57 PM
>> To: Kevin Brown; BioPerl List
>> Subject: RE: Bio::Annotations::Collection confusion
>>
>> Kevin,
>>
>> I believe the extra text was added for readability when printing
>> to the console. In our code we just add the following post-
>> processing step.
>>
>> (my $text = $annotation->as_text()) =~ s/(Comment|Value): //;
>>
>> Scott
>>
>> Scott Markel, Ph.D.
>> Principal Bioinformatics Architect email: smarkel at accelrys.com >> Accelrys (SciTegic R&D) mobile: +1 858 205 3653 >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 >> San Diego, CA 92121 fax: +1 858 799 5222 >> USA web: http://www.accelrys.com >> >> http://www.linkedin.com/in/smarkel >> Vice President, Board of Directors: >> International Society for Computational Biology >> Co-chair: ISCB Publications Committee >> Associate Editor: PLoS Computational Biology >> Editorial Board: Briefings in Bioinformatics >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown >>> Sent: Thursday, 30 April 2009 12:27 PM >>> To: BioPerl List >>> Subject: [Bioperl-l] Bio::Annotations::Collection confusion >>> >>> So, I'm parsing Genbank sequences to pull out the various >> exons. I found >>> the way to get the NCBI Exon number from each feature, but >> am confused >>> about one of the methods. When I do annotation->as_text I'm >> expecting to >>> get back 1 from the feature, but instead get back Value: 1 >> ??!? Why is >>> the value from the NCBI file getting that text tagged onto it? 
>>> >>> http://www.ncbi.nlm.nih.gov/nuccore/73622129 >>> exon 1..774 >>> /gene="BOLA2" >>> /gene_synonym="BOLA2A; My016" >>> /inference="alignment:Splign" >>> /number=1 >>> >>> print ($f->annotation->get_Annotations('number'))[0]->as_text; >>> Value: 1 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From Kevin.M.Brown at asu.edu Thu Apr 30 20:45:29 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 30 Apr 2009 13:45:29 -0700 Subject: [Bioperl-l] Bio::Annotations::Collection confusion In-Reply-To: <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org> References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu> <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org> Message-ID: <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu> OK. Can't see that method in the Deobfuscator which might explain why I didn't know about it. http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3 A%3AAnnotation%3A%3ACollection&sort_order=by+method&search_string=Bio%3A %3AAnnotation%3A%3ACollection > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On > Behalf Of Jason Stajich > Sent: Thursday, April 30, 2009 1:24 PM > To: Kevin Brown > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::Annotations::Collection confusion > > Seems like you just want $annotation->value ? 
> > > =head2 as_text > > Title : as_text > Usage : my $text = $obj->as_text > Function: return the string "Value: $v" where $v is the value > Returns : string > Args : none > > > =cut > > =head2 display_text > > Title : display_text > Usage : my $str = $ann->display_text(); > Function: returns a string. Unlike as_text(), this method > returns a > string > formatted as would be expected for te specific > implementation. > > One can pass a callback as an argument which > allows custom > text > generation; the callback is passed the current instance > and any text > returned > Example : > Returns : a string > Args : [optional] callback > > =cut > > =head2 value > > Title : value > Usage : $obj->value($newval) > Function: Get/Set the value for simplevalue > Returns : value of value > Args : newvalue (optional) > > > =cut > > On Apr 30, 2009, at 1:01 PM, Kevin Brown wrote: > > > That's nice in some regards, but makes it hard to use the > function in > > code without having to always process the result, which seems to be > > counter to what one would expect. > > > > E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence". > > > > Is there a better way to get the number directly without having to > > strip > > off the text that never existed in the first place? > > > >> -----Original Message----- > >> From: Scott Markel [mailto:SMarkel at accelrys.com] > >> Sent: Thursday, April 30, 2009 12:57 PM > >> To: Kevin Brown; BioPerl List > >> Subject: RE: Bio::Annotations::Collection confusion > >> > >> Kevin, > >> > >> I believe the extra text was added for readability when printing > >> to the console. In our code we just add the following post- > >> processing step. > >> > >> (my $text = $annotation->as_text()) =~ > s/(Comment|Value): //; > >> > >> Scott > >> > >> Scott Markel, Ph.D. 
> >> Principal Bioinformatics Architect email: smarkel at accelrys.com > >> Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > >> San Diego, CA 92121 fax: +1 858 799 5222 > >> USA web: http://www.accelrys.com > >> > >> http://www.linkedin.com/in/smarkel > >> Vice President, Board of Directors: > >> International Society for Computational Biology > >> Co-chair: ISCB Publications Committee > >> Associate Editor: PLoS Computational Biology > >> Editorial Board: Briefings in Bioinformatics > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown > >>> Sent: Thursday, 30 April 2009 12:27 PM > >>> To: BioPerl List > >>> Subject: [Bioperl-l] Bio::Annotations::Collection confusion > >>> > >>> So, I'm parsing Genbank sequences to pull out the various > >> exons. I found > >>> the way to get the NCBI Exon number from each feature, but > >> am confused > >>> about one of the methods. When I do annotation->as_text I'm > >> expecting to > >>> get back 1 from the feature, but instead get back Value: 1 > >> ??!? Why is > >>> the value from the NCBI file getting that text tagged onto it? 
> >>> > >>> http://www.ncbi.nlm.nih.gov/nuccore/73622129 > >>> exon 1..774 > >>> /gene="BOLA2" > >>> /gene_synonym="BOLA2A; My016" > >>> /inference="alignment:Splign" > >>> /number=1 > >>> > >>> print ($f->annotation->get_Annotations('number'))[0]->as_text; > >>> Value: 1 > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > From Russell.Smithies at agresearch.co.nz Thu Apr 30 21:28:39 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 1 May 2009 09:28:39 +1200 Subject: [Bioperl-l] Bio::Annotations::Collection confusion In-Reply-To: <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B405F12511@EX02.asurite.ad.asu.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A11AC2C@exch1-hi.accelrys.net> <1A4207F8295607498283FE9E93B775B405F1252E@EX02.asurite.ad.asu.edu> <2CED6499-4196-4F96-BD74-1ACC5569525A@bioperl.org> <1A4207F8295607498283FE9E93B775B405F12548@EX02.asurite.ad.asu.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493C843A2@exchsth.agresearch.co.nz> It's buried in Bio::Annotation::SimpleValue I think http://bioperl.org/cgi-bin/deob_interface.cgi?Search=&module=&sort_order=by+method&search_string=Bio%3A%3AAnnotation%3A%3ASimpleValue&Filter=Submit+Query > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > Sent: Friday, 1 May 2009 8:45 a.m. > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::Annotations::Collection confusion > > OK. Can't see that method in the Deobfuscator which might explain why I > didn't know about it. 
> > http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3 > A%3AAnnotation%3A%3ACollection&sort_order=by+method&search_string=Bio%3A > %3AAnnotation%3A%3ACollection > > > -----Original Message----- > > From: Jason Stajich [mailto:jason.stajich at gmail.com] On > > Behalf Of Jason Stajich > > Sent: Thursday, April 30, 2009 1:24 PM > > To: Kevin Brown > > Cc: BioPerl List > > Subject: Re: [Bioperl-l] Bio::Annotations::Collection confusion > > > > Seems like you just want $annotation->value ? > > > > > > =head2 as_text > > > > Title : as_text > > Usage : my $text = $obj->as_text > > Function: return the string "Value: $v" where $v is the value > > Returns : string > > Args : none > > > > > > =cut > > > > =head2 display_text > > > > Title : display_text > > Usage : my $str = $ann->display_text(); > > Function: returns a string. Unlike as_text(), this method > > returns a > > string > > formatted as would be expected for te specific > > implementation. > > > > One can pass a callback as an argument which > > allows custom > > text > > generation; the callback is passed the current instance > > and any text > > returned > > Example : > > Returns : a string > > Args : [optional] callback > > > > =cut > > > > =head2 value > > > > Title : value > > Usage : $obj->value($newval) > > Function: Get/Set the value for simplevalue > > Returns : value of value > > Args : newvalue (optional) > > > > > > =cut > > > > On Apr 30, 2009, at 1:01 PM, Kevin Brown wrote: > > > > > That's nice in some regards, but makes it hard to use the > > function in > > > code without having to always process the result, which seems to be > > > counter to what one would expect. > > > > > > E.g. Bio::Seq->seq returns the sequence, not "Seq: sequence". > > > > > > Is there a better way to get the number directly without having to > > > strip > > > off the text that never existed in the first place? 
> > > > > >> -----Original Message----- > > >> From: Scott Markel [mailto:SMarkel at accelrys.com] > > >> Sent: Thursday, April 30, 2009 12:57 PM > > >> To: Kevin Brown; BioPerl List > > >> Subject: RE: Bio::Annotations::Collection confusion > > >> > > >> Kevin, > > >> > > >> I believe the extra text was added for readability when printing > > >> to the console. In our code we just add the following post- > > >> processing step. > > >> > > >> (my $text = $annotation->as_text()) =~ > > s/(Comment|Value): //; > > >> > > >> Scott > > >> > > >> Scott Markel, Ph.D. > > >> Principal Bioinformatics Architect email: smarkel at accelrys.com > > >> Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > > >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > > >> San Diego, CA 92121 fax: +1 858 799 5222 > > >> USA web: http://www.accelrys.com > > >> > > >> http://www.linkedin.com/in/smarkel > > >> Vice President, Board of Directors: > > >> International Society for Computational Biology > > >> Co-chair: ISCB Publications Committee > > >> Associate Editor: PLoS Computational Biology > > >> Editorial Board: Briefings in Bioinformatics > > >> > > >> > > >>> -----Original Message----- > > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown > > >>> Sent: Thursday, 30 April 2009 12:27 PM > > >>> To: BioPerl List > > >>> Subject: [Bioperl-l] Bio::Annotations::Collection confusion > > >>> > > >>> So, I'm parsing Genbank sequences to pull out the various > > >> exons. I found > > >>> the way to get the NCBI Exon number from each feature, but > > >> am confused > > >>> about one of the methods. When I do annotation->as_text I'm > > >> expecting to > > >>> get back 1 from the feature, but instead get back Value: 1 > > >> ??!? Why is > > >>> the value from the NCBI file getting that text tagged onto it? 
> > >>> > > >>> http://www.ncbi.nlm.nih.gov/nuccore/73622129 > > >>> exon 1..774 > > >>> /gene="BOLA2" > > >>> /gene_synonym="BOLA2A; My016" > > >>> /inference="alignment:Splign" > > >>> /number=1 > > >>> > > >>> print ($f->annotation->get_Annotations('number'))[0]->as_text; > > >>> Value: 1 > > >>> > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Jason Stajich > > jason at bioperl.org > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From Kevin.M.Brown at asu.edu Thu Apr 30 21:56:16 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 30 Apr 2009 14:56:16 -0700 Subject: [Bioperl-l] Other object oddities Message-ID: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu> So, I'm using quite a bit of bioperl code in my own stuff and have been seeing some oddities with the naming of methods. 
A good example would be in the Bio::Seq and Bio::SeqFeature::Generic. Both have a method called "seq" but in the latter case it returns an object (and expects an object when doing a Set) and in the former it returns a string and expects a string when doing a Set. This makes for a bit of brain freeze on my part when the return from another object might be a Bio::Seq or Bio::SeqFeature::Generic and now calling the ->seq returns different things. Guess I'm just curious if anyone has done an audit of the methods of the various objects and their return types to see how consistent they are across even a subsection of the codebase?
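
A minimal sketch of one way to cope with the mismatch described above: a small helper that always hands back a plain string, whether seq() returned a string (the Bio::Seq behaviour) or an object that has its own seq() method (the Bio::SeqFeature::Generic behaviour). The helper name seq_string and the two Demo:: classes are hypothetical; the Demo:: classes are stand-ins that only mimic the two return styles so the example runs without BioPerl installed, and they are not part of BioPerl.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in mimicking Bio::Seq: seq() returns a plain string.
{
    package Demo::StringSeq;
    sub new { my ($class, $s) = @_; return bless { s => $s }, $class }
    sub seq { return $_[0]{s} }
}

# Hypothetical stand-in mimicking Bio::SeqFeature::Generic: seq() returns
# another object, whose own seq() yields the string.
{
    package Demo::ObjectSeq;
    sub new { my ($class, $s) = @_; return bless { s => $s }, $class }
    sub seq { return Demo::StringSeq->new($_[0]{s}) }
}

# Hypothetical helper: return the sequence as a plain string either way.
sub seq_string {
    my ($thing) = @_;
    my $seq = $thing->seq;
    return unless defined $seq;
    # If seq() handed back an object that itself has seq(), unwrap it once.
    return $seq->seq if blessed($seq) && $seq->can('seq');
    return $seq;
}

print seq_string(Demo::StringSeq->new('ATGC')), "\n";   # prints ATGC
print seq_string(Demo::ObjectSeq->new('TGCA')), "\n";   # prints TGCA
```

With BioPerl loaded, the same helper could presumably be called on a Bio::Seq or a Bio::SeqFeature::Generic object in place of the stand-ins. It only papers over the inconsistency at the call site and does not replace the kind of audit asked about.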