From outaleb at web.de Mon Oct 1 07:46:08 2007
From: outaleb at web.de (issam outaleb)
Date: Mon, 01 Oct 2007 13:46:08 +0200
Subject: [Bioperl-l] help about Fasta file???
Message-ID: <2005673060@web.de>

Hello all,

I have a little problem. I'm using the program below. I ran some experiments and got some results: IPI hits (IPI accession numbers). What I want is to correlate these IPI accession numbers with the FASTA file (the database FASTA). The program has to look up where each IPI accession number is in the database and copy that entry, including its description and sequence, to a new file.

#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:all);

open (IN, "C:/Documents and Settings/XXX/Desktop/Search_file") or die "Error opening the search file";
open (FASTA_db, "C:/Documents and Settings/XXX/Desktop/FASTA1.fasta") or die "Cannot open the FASTA file!!";
open (OUT, ">C:/Documents and Settings/XXX/Desktop/result.txt") or die "Error creating the new file";
#print "\nFiles opened for copying\n";

my @array;
while (<IN>) {
    my $i = $_;
    chomp $i;
    # match the string from the .htm file; gives results like IPI123234 (just the IPIs)
    if (/Hit\d">([^<\/A> ]*)/) {
        #print OUT $1."\n"; # print these IPIs to this file
        # What I thought was to push these IPIs into an array, then look them up in the
        # FASTA_db file and copy each one, with its description and sequence, to a new file,
        # i.e. generate a new FASTA file containing just my IPI results. How???
        my $j = $1;
        push(@array, $j);
    }
}
while (defined(my $var = <FASTA_db>)) {
    $var =~ /(>IPI:)([^| .]*)([^>]*)/;
}
close (IN); close (FASTA_db); close (OUT);
print "\nFiles closed.\n";

From shameer at ncbs.res.in Mon Oct 1 12:57:15 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 1 Oct 2007 22:27:15 +0530 (IST)
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
Message-ID: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in>

Dear All,

Is it possible to draw a phylogenetic tree in PNG format using BioPerl? My input files are in PHYLIP treefile (Newick) format. Are there any modules or code in the Bio::Graphics or phylogeny sections?

Input file:
((((((_L_537_539:3.70000,_H_535_536:3.70000):4.97461,(((((((_E_499_500:2.75000,_E_250_251:2.75000):0.55805,_L_252_254:3.30805):1.38576,_H_494_497:4.69381):1.51514,_H_255_263:6.20895):0.83877,(_L_246_249:4.30000,_H_244_245:4.30000):2.74772):0.92645,_H_520_534:7.97418):0.15279,(_H_502_512:6.95000,_H_273_282:6.95000):1.17697):0.54765):1.10967,_L_264_272:9.78428):0.53441,_L_283_291:10.31869):3.59264,((((_H_479_493:8.25300,((_H_448_462:7.25000,_L_409_411:7.25000):0.50000,_L_445_447:7.75000):0.50300):1.08808,(((((_E_381_382:2.65000,_E_377_378:2.65000):0.26434,_L_379_380:2.91434):1.77630,_L_373_376:4.69063):1.70029,_L_436_444:6.39093):0.93320,_L_383_391:7.32413):2.01696):0.94916,(((((_L_465_469:3.90000,_L_366_368:3.90000):1.52226,_H_463_464:5.42226):0.94093,(_E_427_435:5.15000,_E_369_372:5.15000):1.21319):1.64489,_L_336_343:8.00808):0.88402,(((_H_355_365:6.20000,_L_349_354:6.20000):0.91541,(_L_327_328:3.40000,_L_318_326:3.40000):3.71541):0.59082,(((((_E_470_474:3.85000,_E_344_348:3.85000):0.89054,_L_475_478:4.74054):1.20107,(_E_329_335:3.85000,_E_315_317:3.85000):2.09161):0.71112,_L_513_519:6.65273):0.67204,((_L_296_304:5.00000,_H_292_295:5.00000):0.71200,_L_305_314:5.71200):1.61276):0.38147):1.18587):1.39814):0.37326,((((_L_397_398:3.55000,_E_394_396:3.55000):1.36938,_L_392_393:4.91938):1.43993,_L_422_426:6.35931):1.00790,(_E_412_421:5.95000,_E_399_408:5.95000):1.41721):3.29629):3.24784):4.31679,((((((_L_206_210:3.80000,_H_203_205:3.80000):1.75687,_L__49__50:5.55687):1.29579,_H_188_202:6.85266):1.08193,_H_229_243:7.93459):0.18730,(_L_222_228:5.55000,_H_211_221:5.55000):2.57189):3.16298,((((_H_159_171:7.00000,_L_156_158:7.00000):0.07448,_L_120_122:7.07448):1.59389,((((_L__90__91:2.65000,_E__88__89:2.65000):0.18425,_E__92__93:2.83425):1.94636,_L__84__87:4.78061):1.74719,_L_147_155:6.52780):2.14057):2.44189,((((((((_E_179_182:3.50000,_E__80__83:3.50000):2.15544,(_L_172_178:3.95000,_L__77__79:3.95000):1.70544):0.42200,_E_138_146:6.07744):0.46209,_E__51__58:6.53954):0.74619,_L_183_187:7.28573):0.73805,(_E_123_131:5.70000,_E_110_119:5.70000):2.32378):0.58197,((((_L_108_109:4.30000,_E_104_107:4.30000):1.34305,_H__66__76:5.64305):0.81535,_L_132_137:6.45840):1.22044,(_L__94_103:6.55000,_L__59__65:6.55000):1.12884):0.92691):1.25371,((_L__38__39:3.40000,_L__29__37:3.40000):3.64775,(((((_H___3___6:3.30000,_L___1___2:3.30000):0.79885,_L__21__25:4.09885):0.80972,_E__26__28:4.90856):0.88488,_E__40__48:5.79344):0.60814,(_L__12__20:5.25000,_H___8__11:5.25000):1.15158):0.64616):2.81172):1.25080):0.17461):6.94325);

--
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25)
The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in

From luciap at sas.upenn.edu Mon Oct 1 14:03:00 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Mon, 01 Oct 2007 14:03:00 -0400
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
In-Reply-To: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in>
References: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in>
Message-ID: <1191261780.47013654cb81b@webmail.sas.upenn.edu>

I think you'll have better luck using one of the programs that are already available for that; you'll get better-looking trees. If you just have one tree to draw, I recommend:
http://itol.embl.de/

Lucia

Quoting Shameer Khadar :

> Dear All,
>
> Is it possible to draw a phylogeny tree file in PNG format using Bioperl ?
> My input file are in phylip treefile format. Any Modules / codes in
> Bio::Graphics / Phylogeny sections ?
>
> Input file :
> [Newick tree snipped; identical to the one in the original message]
>
> --
> Shameer Khadar
> Prof. R.
Sowdhamini's Lab (# 25) The Computational Biology Group
> National Centre for Biological Sciences (TIFR)
> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
> T - 91-080-23666001 EXT - 6251
> W - http://www.ncbs.res.in
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

Lucia Peixoto
Department of Biology, SAS
University of Pennsylvania

From shameer at ncbs.res.in Mon Oct 1 14:39:05 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 2 Oct 2007 00:09:05 +0530 (IST)
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
In-Reply-To: <1191263834.47013e5a6af93@webmail.sas.upenn.edu>
References: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in> <1191261780.47013654cb81b@webmail.sas.upenn.edu> <48581.192.168.1.1.1191262243.squirrel@mail.ncbs.res.in> <1191263834.47013e5a6af93@webmail.sas.upenn.edu>
Message-ID: <58283.192.168.1.1.1191263945.squirrel@mail.ncbs.res.in>

Dear Lucia,

Thanks for the mail. Now I get it. I hadn't used these TreeIO / Tree::Draw methods, and somehow I missed this excellent HOWTO: http://www.bioperl.org/wiki/HOWTO:Trees. Thanks for the code as well; I tried it and it worked very nicely. I still have to work on making the tree prettier, and I am going to do that now.

Thanks & Cheers,
Shameer

> OK
>
> you can use the implementations in Bio::TreeIO
>
> you can basically read the tree in newick format and out as an svg graph
> something like this:
>
> my $in = new Bio::TreeIO(-file => 'input',
>                          -format => 'newick');
> my $out = new Bio::TreeIO(-file => '>mytree.svg',
>                           -format => 'svggraph');
> while( my $tree = $in->next_tree ) {
>     $out->write_tree($tree);
> }
>
> you can also use Bio::Tree::Draw
>
> hope that helps
>
> Lucia
>
>
> Quoting Shameer Khadar :
>
>> Hi,
>>
>> Thanks for your mail. I have to create these trees as a part of a
>> webserver. i need to generate them dynamically using users input
>> sequence.
>> I think ITOL is not the stuff best suited for my purpose.
>>
>> > I think you'll have better luck using some of already available
>> programs
>> > to do
>> > that, you'll get better looking trees. If you just have one tree to
>> draw I
>> > recommend you use:
>> > http://itol.embl.de/
>> >
>> > Lucia
>> >
>> >
>> > Quoting Shameer Khadar :
>> >
>> >> Dear All,
>> >>
>> >> Is it possible to draw a phylogeny tree file in PNG format using
>> Bioperl
>> >> ?
>> >> My input file are in phylip treefile format. Any Modules / codes in
>> >> Bio::Graphics / Phylogeny sections ?
>> >>
>> >> Input file :
>> >> [Newick tree snipped; identical to the one in the original message]
>> >>
>> --
>> Shameer Khadar
>> Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
>> National Centre for Biological Sciences (TIFR)
>> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
>> T - 91-080-23666001 EXT - 6251
>> W - http://www.ncbs.res.in
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania
>
--
Shameer Khadar
Prof. R.
Sowdhamini's Lab (# 25)
The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in

From luciap at sas.upenn.edu Mon Oct 1 14:48:51 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Mon, 01 Oct 2007 14:48:51 -0400
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
In-Reply-To: <58283.192.168.1.1.1191263945.squirrel@mail.ncbs.res.in>
References: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in> <1191261780.47013654cb81b@webmail.sas.upenn.edu> <48581.192.168.1.1.1191262243.squirrel@mail.ncbs.res.in> <1191263834.47013e5a6af93@webmail.sas.upenn.edu> <58283.192.168.1.1.1191263945.squirrel@mail.ncbs.res.in>
Message-ID: <1191264531.47014113f2b54@webmail.sas.upenn.edu>

Yes, that's the issue with those commands: the trees are not pretty at all. That's why, for a one-tree-only kind of thing, I'd rather use ITOL. Another thing to try is the tree drawer in the Mesquite package.

Glad I could help,
Lucia

Quoting Shameer Khadar :

> Dear Lucia,
>
> Thanks for the mail. Now I got it. I didnt used this TreeIO / Tree::Draw
> methods. Some how missed this excellent HOWTO :
> http://www.bioperl.org/wiki/HOWTO:Trees. Thanks for that code as well. I
> tried that and it worked very nicely. I have to work around to beautify
> the tree and I am just going to do that.
> > Thanks & Cheers, > Shameer > > > OK > > > > you can use the implementations in Bio::TreeIO > > > > you can basically read the tree in newick format and out as an svg graph > > something like this: > > > > my $in = new Bio::TreeIO(-file => 'input', > > -format => 'newick'); > > my $out = new Bio::TreeIO(-file => '>mytree.svg', > > -format => 'svggraph'); > > while( my $tree = $in->next_tree ) { > > $out->write_tree($tree); > > } > > > > you can also use Bio::Tree::Draw > > > > hope that helps > > > > Lucia > > > > > > Quoting Shameer Khadar : > > > >> Hi, > >> > >> Thanks for your mail. I have to create these trees as a part of a > >> webserver. i need to generate them dynamically using users input > >> sequence. > >> I think ITOL is not the stuff best suited for my purpose. > >> > >> > I think you'll have better luck using some of already available > >> programs > >> > to do > >> > that, you'll get better looking trees. If you just have one tree to > >> draw I > >> > recommend you use: > >> > http://itol.embl.de/ > >> > > >> > Lucia > >> > > >> > > >> > Quoting Shameer Khadar : > >> > > >> >> Dear All, > >> >> > >> >> Is it possible to draw a phylogeny tree file in PNG format using > >> Bioperl > >> >> ? > >> >> My input file are in phylip treefile format. Any Modules / codes in > >> >> Bio::Graphics / Phylogeny sections ? 
> >> >> > >> >> Input file :
> >> >> > >> >> [Newick tree snipped; identical to the one in the original message]
> >> >> > >> >>
> >> --
> >> Shameer Khadar
> >> Prof. R.
Sowdhamini's Lab (# 25) The Computational Biology Group
> >> National Centre for Biological Sciences (TIFR)
> >> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
> >> T - 91-080-23666001 EXT - 6251
> >> W - http://www.ncbs.res.in
> >>
> >
> >
> > Lucia Peixoto
> > Department of Biology,SAS
> > University of Pennsylvania
> >
> >
>
>
> --
> Shameer Khadar
> Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
> National Centre for Biological Sciences (TIFR)
> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
> T - 91-080-23666001 EXT - 6251
> W - http://www.ncbs.res.in
>

Lucia Peixoto
Department of Biology, SAS
University of Pennsylvania

From jason at bioperl.org Mon Oct 1 15:32:38 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 1 Oct 2007 12:32:38 -0700
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
In-Reply-To: <1191264531.47014113f2b54@webmail.sas.upenn.edu>
References: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in> <1191261780.47013654cb81b@webmail.sas.upenn.edu> <48581.192.168.1.1.1191262243.squirrel@mail.ncbs.res.in> <1191263834.47013e5a6af93@webmail.sas.upenn.edu> <58283.192.168.1.1.1191263945.squirrel@mail.ncbs.res.in> <1191264531.47014113f2b54@webmail.sas.upenn.edu>
Message-ID:

I'd definitely recommend Bio::Tree::Draw::Cladogram over svggraph for prettier trees. You get PostScript out, but you can render it to PNG or JPG with Unix tools.

If there is a better stand-alone tree-drawing engine, we're happy to try to integrate it into BioPerl. The modules here are native Perl only, and you can use the bioperl-run modules that wrap DrawTree and DrawGram from PHYLIP to get other PostScript rendering output. Mesquite, TreeView, or other tools are usually much better, but not always an option if you want to auto-render these images for a website, etc.
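Jason's Cladogram route can be sketched as follows. This is a minimal, untested-in-production sketch that assumes BioPerl's Bio::TreeIO and Bio::Tree::Draw::Cladogram are installed (Cladogram also relies on the PostScript::TextBlock module) and that ImageMagick's `convert` is on the PATH for the optional PNG step. The helper names `draw_first_tree` and `eps_to_png`, the file names, and the `-density 150` setting are illustrative choices, not part of BioPerl.

```perl
#!/usr/bin/perl
# Sketch: draw the first tree of a Newick (PHYLIP treefile) file as EPS with
# Bio::Tree::Draw::Cladogram, then optionally rasterise it to PNG with
# ImageMagick's `convert` (assumed to be installed; not part of BioPerl).
use strict;
use warnings;
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;

# Hypothetical helper: write the first tree in $tree_file to $eps_file and
# return the number of leaves that were drawn.
sub draw_first_tree {
    my ($tree_file, $eps_file) = @_;
    my $in   = Bio::TreeIO->new(-file => $tree_file, -format => 'newick');
    my $tree = $in->next_tree or die "No tree found in $tree_file\n";
    my $cladogram = Bio::Tree::Draw::Cladogram->new(-tree => $tree);
    $cladogram->print(-file => $eps_file);
    my @leaves = $tree->get_leaf_nodes;
    return scalar @leaves;
}

# Hypothetical helper: shell out to ImageMagick; returns true on success.
sub eps_to_png {
    my ($eps_file, $png_file) = @_;
    return system('convert', '-density', '150', $eps_file, $png_file) == 0;
}

# Usage: perl draw_tree.pl input.tree output_basename
if (@ARGV == 2) {
    my ($tree_file, $base) = @ARGV;
    my $n = draw_first_tree($tree_file, "$base.eps");
    print "Drew $n leaves to $base.eps\n";
    eps_to_png("$base.eps", "$base.png")
        or warn "PNG conversion failed; the EPS file is still usable\n";
}
```

Keeping the rasterisation as a separate shell-out means a web server can swap in any EPS-to-PNG converter (for example Ghostscript) without touching the BioPerl part.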
-jason On Oct 1, 2007, at 11:48 AM, Lucia Peixoto wrote: > Yes, > that's the issue about those commands, trees are not pretty at all > that's why for a one tree only kind of thing I rather use ITOL > other thing to try is the tree drawer of the Mesquite package > glad I could help > > Lucia > > Quoting Shameer Khadar : > >> Dear Lucia, >> >> Thanks for the mail. Now I got it. I didnt used this TreeIO / >> Tree::Draw >> methods. Some how missed this excellent HOWTO : >> http://www.bioperl.org/wiki/HOWTO:Trees. Thanks for that code as >> well. I >> tried that and it worked very nicely. I have to work around to >> beautify >> the tree and I am just going to do that. >> >> Thanks & Cheers, >> Shameer >> >>> OK >>> >>> you can use the implementations in Bio::TreeIO >>> >>> you can basically read the tree in newick format and out as an >>> svg graph >>> something like this: >>> >>> my $in = new Bio::TreeIO(-file => 'input', >>> -format => 'newick'); >>> my $out = new Bio::TreeIO(-file => '>mytree.svg', >>> -format => 'svggraph'); >>> while( my $tree = $in->next_tree ) { >>> $out->write_tree($tree); >>> } >>> >>> you can also use Bio::Tree::Draw >>> >>> hope that helps >>> >>> Lucia >>> >>> >>> Quoting Shameer Khadar : >>> >>>> Hi, >>>> >>>> Thanks for your mail. I have to create these trees as a part of a >>>> webserver. i need to generate them dynamically using users input >>>> sequence. >>>> I think ITOL is not the stuff best suited for my purpose. >>>> >>>>> I think you'll have better luck using some of already available >>>> programs >>>>> to do >>>>> that, you'll get better looking trees. If you just have one >>>>> tree to >>>> draw I >>>>> recommend you use: >>>>> http://itol.embl.de/ >>>>> >>>>> Lucia >>>>> >>>>> >>>>> Quoting Shameer Khadar : >>>>> >>>>>> Dear All, >>>>>> >>>>>> Is it possible to draw a phylogeny tree file in PNG format using >>>> Bioperl >>>>>> ? >>>>>> My input file are in phylip treefile format. 
Any Modules /
>>>>>> codes in
>>>>>> Bio::Graphics / Phylogeny sections ?
>>>>>>
>>>>>> Input file :
>>>>>> [Newick tree snipped; identical to the one in the original message]
>>>>>>
>>>>
>>>> --
>>>> Shameer Khadar
>>>> Prof. R.
Sowdhamini's Lab (# 25) The Computational Biology Group
>>>> National Centre for Biological Sciences (TIFR)
>>>> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
>>>> T - 91-080-23666001 EXT - 6251
>>>> W - http://www.ncbs.res.in
>>>>
>>>
>>>
>>> Lucia Peixoto
>>> Department of Biology,SAS
>>> University of Pennsylvania
>>>
>>
>>
>> --
>> Shameer Khadar
>> Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
>> National Centre for Biological Sciences (TIFR)
>> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
>> T - 91-080-23666001 EXT - 6251
>> W - http://www.ncbs.res.in
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania

--
Jason Stajich
jason at bioperl.org

From outaleb at web.de Mon Oct 1 22:37:26 2007
From: outaleb at web.de (outaleb Issame)
Date: Tue, 02 Oct 2007 04:37:26 +0200
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
Message-ID: <4701AEE6.6070506@web.de>

Hi everybody,

I have some accession numbers in a file:
IPI67675
IPI98976
...

What I want is to look in the FASTA file (the database FASTA) for any matches and, if there are, copy each entire entry into a new FASTA file. I tried with BioPerl, but because I'm a newbie :-(( I don't get it.

Thanks all

From ULNJUJERYDIX at spammotel.com Tue Oct 2 02:21:31 2007
From: ULNJUJERYDIX at spammotel.com (Kevin Lam)
Date: Tue, 2 Oct 2007 14:21:31 +0800
Subject: [Bioperl-l] divide and blast blastunsplit blast subsequence
Message-ID: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>

Hi!
I am trying to annotate a 200 kb sequence by running blastx to find the locations of protein-coding regions. I need to split the sequence up so that I get the best hits for each region (if I search with the whole sequence, the top BLAST hits mask the smaller proteins). If I were to do it manually, I could set the query subsequence in the web GUI for NCBI's BLAST; that way, the hit coordinates are based on the whole 200 kb. But I can't find this option in standalone BLAST, or a straightforward way to do it in BioPerl.

I found similar solutions, such as Divide and BLAST (http://www.bio.davidson.edu/projects/DAB/DAB.html), but there I need to specify the coordinates. There is also this from the bioclusters mailing-list archives:
http://bioinformatics.org/pipermail/bioclusters/2002-August/000375.html

But isn't there an easier way, where I can specify, say, subsequence 200-900 of the FASTA file as the BLAST query and get the blastx hits back in coordinates relative to the whole 200 kb?

From n.haigh at sheffield.ac.uk Tue Oct 2 03:56:57 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 02 Oct 2007 08:56:57 +0100
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <4701AEE6.6070506@web.de>
References: <4701AEE6.6070506@web.de>
Message-ID: <4701F9C9.4050808@sheffield.ac.uk>

outaleb Issame wrote:
> Hi every body,
> i have some AccNum in a file-> IPI67675
> IPI98976.
> ...
>
> what i want is how can i look in the fasta file (db fasta) if there is
> some match
> if yes then copy the entire entry into a new fasta file.
> i tried with bioperl but cause i m noob:-(( i don t get it.
> thx all
>

Can you state clearly what exactly is in the "AccNum" file? Some sample text from the actual file would be good. Is the FASTA file containing the sequences in raw FASTA format, or has it been processed using something like formatdb from the BLAST software?
A few more details will help people understand and, in turn, help you with a swift solution.

Cheers
Nath

From outaleb at web.de Tue Oct 2 05:22:17 2007
From: outaleb at web.de (outaleb Issame)
Date: Tue, 02 Oct 2007 11:22:17 +0200
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <4701F9C9.4050808@sheffield.ac.uk>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk>
Message-ID: <47020DC9.8040401@web.de>

Hi,

By this file I mean: I picked out these accession numbers from the IPI human database. They come from a FASTA file, and they are now listed one under another, like a table, in a separate file. What I want is to check whether they are really there in the FASTA file (i.e. the IPI human FASTA file) and, if so, copy the entire entry for each number (the ">..." header and the sequence) into a new FASTA file, so that at the end I get a new FASTA file with just these IPI accession numbers.

Thanks, and I hope that was clear.

Nathan S. Haigh wrote:
>outaleb Issame wrote:
>
>>Hi every body,
>>i have some AccNum in a file-> IPI67675
>>IPI98976.
>>...
>>
>>what i want is how can i look in the fasta file (db fasta) if there is
>>some match
>>if yes then copy the entire entry into a new fasta file.
>>i tried with bioperl but cause i m noob:-(( i don t get it.
>>thx all
>>
>
>Can you state clearly, what is in the "AccNum" file exactly, some sample
>text from the actual file would be good. Is the FASTA file containing
>the sequences in raw FASTA format or has it been processed using
>somthing like formatdb from the BLAST software?
>
>A few more details will help people understand and in turn help you with
>a swift solution.
>
>Cheers
>Nath

From n.haigh at sheffield.ac.uk Tue Oct 2 05:56:49 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 02 Oct 2007 10:56:49 +0100
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <47020DC9.8040401@web.de>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de>
Message-ID: <470215E1.4080901@sheffield.ac.uk>

outaleb Issame wrote:
> hi,
> with this file i mean, i picked out this Accession Number from
> IPI-Human Dbase,they come from a fasta file,
> so they re under eachother like a i a table in separate file now.
> what i want is how how can i check it in the fasta File (so in the
> IPI-Human FAsta File), i they re really there;
> if yes please copy the entire entry of this Number (>....the sequence
> also)in new fasta file.so that i get at the end a new
> FASTA file with jus this IPI Accession Number.
> thx and hope was clearly.

OK, first of all, I'd read the contents of your accession-number file into a hash, something like the following (this could be written in a shorter form, but since you're a newbie I'll leave it in a longer form so you can follow it more easily).
-- start script --
use strict;
use warnings;
use Bio::SeqIO;

# change the following three lines to point to the relevant paths
# of your list of accessions file, your fasta file and your output
# fasta file
my $acc_file       = "/path/to/your/file";
my $fasta_file_in  = "/path/to/your/fasta/file";
my $fasta_file_out = "/path/to/your/fasta/output/file";

# Use a hash to keep a record of accessions we want to find
my %hash_of_req_acc;

# read all the required accessions from the file into the hash as keys
open(my $acc_fh, '<', $acc_file) or die "Couldn't open file: $!\n";
while (my $line = <$acc_fh>) {
    chomp $line;
    # key on the chomped line, not on the raw line with its trailing newline
    $hash_of_req_acc{$line} = 1;
}
close $acc_fh;

my $seqio_object_in = Bio::SeqIO->new(
    -file   => $fasta_file_in,
    -format => 'fasta'
);
# the leading '>' opens the output file for writing
my $seqio_object_out = Bio::SeqIO->new(
    -file   => ">$fasta_file_out",
    -format => 'fasta'
);

# loop through all the sequences in the fasta file
while (my $seq_object = $seqio_object_in->next_seq) {
    # get the sequence accession for easy matching
    my $seq_acc = $seq_object->accession_number;

    # write the sequence object to the output fasta file if we have a matching accession
    $seqio_object_out->write_seq($seq_object) if exists $hash_of_req_acc{$seq_acc};
}
-- end script --

I haven't tested this, but it should at least get you started. Also, the FASTA description line in the output file may not be exactly as it was in the input FASTA file; if this really matters, you may need to get back to us. Also, if the input FASTA file is huge (many thousands of sequences), it may be wise to create an index of the FASTA file in order to speed up retrieval.

You may find this page helpful:
http://www.bioperl.org/wiki/HOWTO:SeqIO

Anyway, hope this helps to get you started.
Nath

From outaleb at web.de Tue Oct 2 06:50:32 2007
From: outaleb at web.de (outaleb Issame)
Date: Tue, 02 Oct 2007 12:50:32 +0200
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <470215E1.4080901@sheffield.ac.uk>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk>
Message-ID: <47022278.7010700@web.de>

Thanks for the help, but I got an empty output file. I think the problem is with matching the accession numbers. My FASTA file looks like this:

>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU...
>IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU..
>IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
MMMMM..

and my accession-number file looks like this:

IPI00177321
IPI00453473

I hope this helps you understand.

Nathan S. Haigh wrote:
>Ok, first of all, I'd read the contents of your Accession numbers into a
>hash, something like the following (this could be written in a shorter
>form, but since you're a newbie I'll leave it in a longer form so you
>can follow easier).
>
>Nath

From cjfields at uiuc.edu Tue Oct 2 09:00:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 Oct 2007 08:00:57 -0500
Subject: [Bioperl-l] divide and blast blastunsplit blast subsequence
In-Reply-To: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
Message-ID:

There is a script that comes with the BioPerl core distribution, bp_split_seq.pl, which does this. Here's the CVS location:

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/seq/?cvsroot=bioperl

chris

On Oct 2, 2007, at 1:21 AM, Kevin Lam wrote:

> Hi!
> I am trying to annotate a 200kb sequence by doing blastx to find the protein
> seq location.
> I need to split the sequence up so that I get the best hits for each region
> (the top blast hits will mask the smaller proteins if I do it as a whole
> sequence).
> If I were to do it manually, I could set the subsequence in the web GUI for
> NCBI's BLAST; this way, the blast hit coords are based on the whole 200kb.
>
> But I can't find this option in blast or a straightforward way to do it in
> bioperl.
>
> I found similar solutions like
> http://www.bio.davidson.edu/projects/DAB/DAB.html
> divide and blast (but I need to specify coords)
>
> there is also this from the bioperl archives
> http://bioinformatics.org/pipermail/bioclusters/2002-August/000375.html
>
> but isn't there an easier way, where I can specify blast subsequence 200-900
> of the fasta file and it will return the blastx hits in coords in terms of the
> whole 200kb?

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From rrc22 at cam.ac.uk Tue Oct 2 09:38:45 2007
From: rrc22 at cam.ac.uk (Roy Chaudhuri)
Date: Tue, 02 Oct 2007 14:38:45 +0100
Subject: [Bioperl-l] divide and blast blastunsplit blast subsequence
In-Reply-To:
References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
Message-ID: <470249E5.6050206@cam.ac.uk>

> but isn't there an easier way like i can specify blast subsequence 200-900
> of fasta file and it will return the blastx hits in coords in terms of the
> whole 200kb?

Once you have split up your sequence (as Chris suggested) and run your BLAST, you can add the hits to each subsequence as features. The subsequences can then be reassembled using the cat method from Bio::SeqUtils, which will adjust the coordinates of the features appropriately.

Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.

From cjfields at uiuc.edu Tue Oct 2 11:03:29 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 Oct 2007 10:03:29 -0500
Subject: [Bioperl-l] exonerate
In-Reply-To: <29C4D729-6715-4C19-9872-3B1AF90EAFA3@tll.org.sg>
References: <034FB11C-B4E9-4E4E-B213-D4AC6A397B1B@tll.org.sg> <29C4D729-6715-4C19-9872-3B1AF90EAFA3@tll.org.sg>
Message-ID: <73B4E193-69D0-409D-9F89-20FB677F45C9@uiuc.edu>

One option is to try running $run->cleanup() after you finish parsing, which gets rid of the temporary files on each run.

chris

On Sep 30, 2007, at 8:53 PM, alan wrote:

> Hi,
>
>>> I am calling exonerate.pm within my script while attempting to
>>> align cDNA to multiple genomic fragments. After processing about
>>> 120+ genomic fragments my code crashes with the following error:
>>>
>>> ** ERROR **: Could not open [/tmp/tlInatbOED] : Too many open files
>>> aborting...
>>> MSG: Exonerate call (/usr/local/bin/exonerate /tmp/8X9jQuHUGF /tmp/tlInatbOED > /tmp/EolF5qCNLZ/cIf0HfIRf5) crashed: 34304
>>> STACK Bio::Tools::Run::Alignment::Exonerate::_run /nfs1/alan/cvs_src/bioperl-run/Bio/Tools/Run/Alignment/Exonerate.pm:214
>>> STACK Bio::Tools::Run::Alignment::Exonerate::run /nfs1/alan/cvs_src/bioperl-run/Bio/Tools/Run/Alignment/Exonerate.pm:174
>>>
>>> The code in Exonerate.pm closes the tmpfile at the end of the
>>> routine, yet I get the error message about "too many open files".
>>> Any suggestions on how I should be closing these files?
>>>
>>> An extract from my code that runs exonerate is listed below.
>>>
>>> foreach my $f (@files) {
>>>     next unless (-f "$dir/$f");
>>>     my $q_in = Bio::SeqIO->new(-file => $query, -format => "Fasta");
>>>     my $query_obj = $q_in->next_seq();
>>>     my $target_in = Bio::SeqIO->new(-file => "$dir/$f", -format => "Fasta");
>>>     my $target_obj = $target_in->next_seq();
>>>     my $run = Bio::Tools::Run::Alignment::Exonerate->new();
>>>     my $exonerate_io = $run->run($query_obj, $target_obj);
>>>
>>>     [code for parsing the data.......]
>>>
>>>     $exonerate_io->close; # tried this line out of desperation but it did not help :-)
>>> }
>>>
>>> thanks
>>> alan

From outaleb at web.de Tue Oct 2 10:51:05 2007
From: outaleb at web.de (outaleb Issame)
Date: Tue, 02 Oct 2007 16:51:05 +0200
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <47022278.7010700@web.de>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de>
Message-ID: <47025AD9.1090105@web.de>

Hi again,
I think I can solve this problem with the id_parser() method.
How can I do that?
Any suggestions or experience?
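For readers of the archive: the empty output reported above is most likely a matching problem. With headers like `>IPI:IPI00453473.1|REFSEQ_XP:...`, BioPerl's FASTA reader puts the whole first word of the header into `display_id`, and `accession_number` is probably not populated from a plain FASTA header, so the hash lookup never succeeds. (As far as I know, `id_parser()` belongs to Bio::Index::Fasta's indexing interface rather than to Bio::SeqIO.) Below is a minimal core-Perl sketch that sidesteps BioPerl. It assumes the header layout shown in this thread and an accession list with one bare accession per line, without the version suffix. The helpers `read_accessions`, `ipi_from_header` and `filter_fasta` are hypothetical names, and the demo records are toy placeholders rather than real IPI entries.

```perl
use strict;
use warnings;

# Hypothetical helper: read one accession per line into a hash (used as a set),
# trimming surrounding whitespace, including CR characters from Windows files.
sub read_accessions {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        $wanted{$line} = 1 if length $line;
    }
    return \%wanted;
}

# Hypothetical helper: turn ">IPI:IPI00453473.1|REFSEQ_XP:..." into the bare
# accession "IPI00453473" (version suffix dropped); undef if the header differs.
sub ipi_from_header {
    my ($header) = @_;
    return $header =~ /^>IPI:(IPI\d+)/ ? $1 : undef;
}

# Copy every record (header plus all following sequence lines) whose accession
# is in %$wanted from $in_fh to $out_fh. Returns the number of records copied.
sub filter_fasta {
    my ($in_fh, $out_fh, $wanted) = @_;
    my ($keep, $copied) = (0, 0);
    while (my $line = <$in_fh>) {
        if ($line =~ /^>/) {
            my $acc = ipi_from_header($line);
            $keep = defined $acc && exists $wanted->{$acc};
            $copied++ if $keep;
        }
        print {$out_fh} $line if $keep;
    }
    return $copied;
}

# Toy demo on in-memory data; the sequences are placeholders.
my $demo_fasta = <<'END';
>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU
>IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU
>IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
MMMMM
END
my $demo_list = "IPI00177321\nIPI00453473\n";

open my $list_fh,  '<', \$demo_list  or die "Cannot open list: $!";
open my $fasta_fh, '<', \$demo_fasta or die "Cannot open FASTA: $!";
my $n = filter_fasta($fasta_fh, \*STDOUT, read_accessions($list_fh));
print STDERR "$n record(s) copied\n";
```

For the real files, open the accession list and the database FASTA file with ordinary three-argument `open` calls, and pass an output handle opened with `'>'`. Reading line by line keeps memory use down to the accession set, which suits a large IPI-Human file. Inside Nathan's Bio::SeqIO loop, a variant of the same regex (without the leading `>`) could be applied to `$seq_object->display_id` instead.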
Thanks again
From rvos at interchange.ubc.ca Tue Oct 2 13:00:36 2007
From: rvos at interchange.ubc.ca (rvos at interchange.ubc.ca)
Date: Tue, 02 Oct 2007 10:00:36 -0700
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
Message-ID: <405453b66e.3b66e40545@interchange.ubc.ca>

An alternative is to explore the Bio::Phylo tree drawer:

http://search.cpan.org/~rvosa/Bio-Phylo-0.17_RC6/lib/Bio/Phylo/Treedrawer.pm

This is a separate install (in the interest of full disclosure: I'm the author).

Rutger

----- Original Message -----
From: Jason Stajich
Date: Monday, October 1, 2007 12:32 pm
Subject: Re: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?

> I'd definitely recommend Bio::Tree::Draw::Cladogram over svggraph
> for prettier trees - you get PostScript out, but you can render this
> to PNG or JPG with Unix tools. If there is a better standalone tree
> drawing engine, we're happy to try to integrate it into BioPerl - the
> modules here are native Perl only, and you can use the bioperl-run
> modules that wrap DrawTree and DrawGram from EMBOSS to get other PS
> rendering output.
>
> Mesquite, TreeView or other tools are usually much better, but not
> always an option if you want to auto-render these images for a
> website, etc.
>
> -jason
>
> On Oct 1, 2007, at 11:48 AM, Lucia Peixoto wrote:
>
> > Yes,
> > that's the issue with those commands: the trees are not pretty at all.
> > That's why, for a one-tree-only kind of thing, I'd rather use iTOL.
> > Another thing to try is the tree drawer in the Mesquite package.
> > Glad I could help.
> >
> > Lucia
> >
> > Quoting Shameer Khadar :
> >
> >> Dear Lucia,
> >>
> >> Thanks for the mail. Now I've got it. I hadn't used these TreeIO /
> >> Tree::Draw methods; somehow I missed this excellent HOWTO:
> >> http://www.bioperl.org/wiki/HOWTO:Trees. Thanks for that code as
> >> well. I tried it and it worked very nicely. I have to work on
> >> beautifying the tree, and I am just going to do that.
> >>
> >> Thanks & Cheers,
> >> Shameer
> >>
> >>> OK
> >>>
> >>> you can use the implementations in Bio::TreeIO
> >>>
> >>> you can basically read the tree in newick format and write it out as an
> >>> svg graph, something like this:
> >>>
> >>> use Bio::TreeIO;
> >>> my $in  = Bio::TreeIO->new(-file   => 'input',
> >>>                            -format => 'newick');
> >>> my $out = Bio::TreeIO->new(-file   => '>mytree.svg',
> >>>                            -format => 'svggraph');
> >>> while ( my $tree = $in->next_tree ) {
> >>>     $out->write_tree($tree);
> >>> }
> >>>
> >>> you can also use Bio::Tree::Draw
> >>>
> >>> hope that helps
> >>>
> >>> Lucia
> >>>
> >>> Quoting Shameer Khadar :
> >>>
> >>>> Hi,
> >>>>
> >>>> Thanks for your mail. I have to create these trees as part of a
> >>>> webserver. I need to generate them dynamically from the user's input
> >>>> sequence. I think iTOL is not best suited for my purpose.
> >>>>
> >>>>> I think you'll have better luck using some of the already available
> >>>>> programs to do that; you'll get better-looking trees.
> >>>>> If you just have one tree to draw, I recommend you use:
> >>>>> http://itol.embl.de/
> >>>>>
> >>>>> Lucia
> >>>>>
> >>>>> Quoting Shameer Khadar :
> >>>>>
> >>>>>> Dear All,
> >>>>>>
> >>>>>> Is it possible to draw a phylogeny tree file in PNG format using Bioperl?
> >>>>>> My input files are in phylip treefile format. Any Modules /
> >>>>>> codes in the Bio::Graphics / Phylogeny sections?
> >>>>>>
> >>>>>> Input file :
> >>>>>> ((((((_L_537_539:3.70000,_H_535_536:3.70000):4.97461,(((((((_E_499_500:2.75000,_E_250_251:2.75000):0.55805,_L_252_254:3.30805):1.38576,_H_494_497:4.69381):1.51514,_H_255_263:6.20895):0.83877,(_L_246_249:4.30000,_H_244_245:4.30000):2.74772):0.92645,_H_520_534:7.97418):0.15279,(_H_502_512:6.95000,_H_273_282:6.95000):1.17697):0.54765):1.10967,_L_264_272:9.78428):0.53441,_L_283_291:10.31869):3.59264,((((_H_479_493:8.25300,((_H_448_462:7.25000,_L_409_411:7.25000):0.50000,_L_445_447:7.75000):0.50300):1.08808,(((((_E_381_382:2.65000,_E_377_378:2.65000):0.26434,_L_379_380:2.91434):1.77630,_L_373_376:4.69063):1.70029,_L_436_444:6.39093):0.93320,_L_383_391:7.32413):2.01696):0.94916,(((((_L_465_469:3.90000,_L_366_368:3.90000):1.52226,_H_463_464:5.42226):0.94093,(_E_427_435:5.15000,_E_369_372:5.15000):1.21319):1.64489,_L_336_343:8.00808):0.88402,(((_H_355_365:6.20000,_L_349_354:6.20000):0.91541,(_L_327_328:3.40000,_L_318_326:3.40000):3.71541):0.59082,(((((_E_470_474:3.85000,_E_344_348:3.85000):0.89054,_L_475_478:4.74054):1.20107,(_E_329_335:3.85000,_E_315_317:3.85000):2.09161):0.71112,_L_513_519:6.65273):0.67204,((_L_296_304:5.00000,_H_292_295:5.00000):0.71200,_L_305_314:5.71200):1.61276):0.38147):1.18587):1.39814):0.37326,((((_L_397_398:3.55000,_E_394_396:3.55000):1.36938,_L_392_393:4.91938):1.43993,_L_422_426:6.35931):1.00790,(_E_412_421:5.95000,_E_399_408:5.95000):1.41721):3.29629):3.24784):4.31679,((((((_L_206_210:3.80000,_H_203_205:3.80000):1.75687,_L__49__50:5.55687):1.29579,_H_188_202:6.85266):1.08193,_H_229_243:7.93459):0.18730,(_L_222_228:5.55000,_H_211_221:5.55000):2.57189):3.16298,((((_H_159_171:7.00000,_L_156_158:7.00000):0.07448,_L_120_122:7.07448):1.59389,((((_L__90__91:2.65000,_E__88__89:2.65000):0.18425,_E__92__93:2.83425):1.94636,_L__84__87:4.78061):1.74719,_L_147_155:6.52780):2.14057):2.44189,((((((((_E_179_182:3.50000,_E__80__83:3.50000):2.15544,(_L_172_178:3.95000,_L__77__79:3.95000):1.70544):0.42200,_E_138_146:6.07744):0.46209,_E__51__58:6.53954):0.74619,_L_183_187:7.28573):0.73805,(_E_123_131:5.70000,_E_110_119:5.70000):2.32378):0.58197,((((_L_108_109:4.30000,_E_104_107:4.30000):1.34305,_H__66__76:5.64305):0.81535,_L_132_137:6.45840):1.22044,(_L__94_103:6.55000,_L__59__65:6.55000):1.12884):0.92691):1.25371,((_L__38__39:3.40000,_L__29__37:3.40000):3.64775,(((((_H___3___6:3.30000,_L___1___2:3.30000):0.79885,_L__21__25:4.09885):0.80972,_E__26__28:4.90856):0.88488,_E__40__48:5.79344):0.60814,(_L__12__20:5.25000,_H___8__11:5.25000):1.15158):0.64616):2.81172):1.25080):0.17461):6.94325);
> >>>>
> >>>> --
> >>>> Shameer Khadar
> >>>> Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
> >>>> National Centre for Biological Sciences (TIFR)
> >>>> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
> >>>> T - 91-080-23666001 EXT - 6251
> >>>> W - http://www.ncbs.res.in
> >>>
> >>> Lucia Peixoto
> >>> Department of Biology, SAS
> >>> University of Pennsylvania
> >>
> >> --
> >> Shameer Khadar
> >> Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
> >> National Centre for Biological Sciences (TIFR)
> >> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
> >> T - 91-080-23666001 EXT - 6251
> >> W - http://www.ncbs.res.in
> >
> > Lucia Peixoto
> > Department of Biology, SAS
> > University of Pennsylvania
>
> --
> Jason Stajich
> jason at bioperl.org

From Russell.Smithies at agresearch.co.nz Tue Oct 2 17:34:20 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Oct 2007 10:34:20 +1300
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <47025AD9.1090105@web.de>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de>
Message-ID:

I know this is the Bioperl list, but how about just doing it with grep?

grep -P '^>.*XM_001666470[\s^>]*' sequences.fasta

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of outaleb Issame
> Sent: Wednesday, 3 October 2007 3:51 a.m.
> To: outaleb Issame
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] need help ??parse AcNum from fasta?
>
> Hi again,
> I think I can solve this problem with the id_parser() method.
> How can I do that?
> Any suggestions or experience?
> Thanks again
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From thiago.venancio at gmail.com Tue Oct 2 17:41:06 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Tue, 2 Oct 2007 18:41:06 -0300
Subject: [Bioperl-l] frac_* methods
Message-ID: <44255ea80710021441v6ee2c5e0x57e91c66c0e07c3c@mail.gmail.com>

Hi all,

This topic was discussed before, but I would like to bring it up on the list again; maybe someone has an update.

The methods frac_identical, frac_conserved, frac_aligned_query and frac_aligned_hit can also be used in the hit context, after HSP tiling. In my view, it is better to use them only on individual HSPs, because there are some rare/strange kinds of alignments. However, we frequently need a single measure for the whole alignment.

Do any of the BioPerl masters have an update on this topic?
What is the best current usage?

Best,

Thiago

--
"Innovation distinguishes between a leader and a follower."
Steve Jobs

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================

From outaleb at web.de Tue Oct 2 17:47:07 2007
From: outaleb at web.de (outaleb Issame)
Date: Tue, 02 Oct 2007 23:47:07 +0200
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To:
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de>
Message-ID: <4702BC5B.7040407@web.de>

Thanks for this, but I just want to create a new FASTA file with the accession numbers that I look up in the FASTA file (local database). So: just search for these numbers in the FASTA file and, if they are found, copy the header and sequence to a new FASTA file.
I've been sitting on this for 2 days now. I don't think it's difficult, but how????? I'm going crazy, guys. Come on, is there an expert in this area??

Smithies, Russell wrote:
>I know this is the Bioperl list but how about just doing it with grep?
>
> grep -P '^>.*XM_001666470[\s^>]*' sequences.fasta
>>>>Nath >======================================================================= >Attention: The information contained in this message and/or attachments >from AgResearch Limited is intended only for the persons or entities >to which it is addressed and may contain confidential and/or privileged >material. Any review, retransmission, dissemination or other use of, or >taking of any action in reliance upon, this information by persons or >entities other than the intended recipients is prohibited by AgResearch >Limited. If you have received this message in error, please notify the >sender immediately. >======================================================================= From jason at bioperl.org Tue Oct 2 18:22:59 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 2 Oct 2007 15:22:59 -0700 Subject: [Bioperl-l] frac_* methods In-Reply-To: <44255ea80710021441v6ee2c5e0x57e91c66c0e07c3c@mail.gmail.com> References: <44255ea80710021441v6ee2c5e0x57e91c66c0e07c3c@mail.gmail.com> Message-ID: <3DC00A97-EF7E-41B4-854F-B088715AB901@bioperl.org> I think my answer before was something to the tune of: Use an alignment algorithm that finds a single best alignment like FASTA or Smith-Waterman (SW) if what you want is a single number that represents the alignment. BLAST is great for fast searching but FASTA or SW/SSEARCH are going to be better at creating an alignment.
Consider the -postsw option in WUBLAST as well, as it will realign the HSPs with SW. I personally never use the frac alignment summary stats for the Hit object for this reason unless I know I am going to have a single HSP. -jason On Oct 2, 2007, at 2:41 PM, Thiago Venancio wrote: > Hi all, > > This topic was discussed before, but I would like to put it on the list > again; maybe someone has an update. > > The methods frac_identical, frac_conserved, frac_aligned_query and > frac_aligned_hit can also be used in the hit context, after HSP tiling. > In my view, it is better to use them on individual HSPs only, because > there are some rare/strange kinds of alignments. However, we frequently > need to get one measure of the whole alignment. > > Do any of the BioPerl masters have an update on this topic? What is the > best current usage? > > Best. > > Thiago > > -- > "Innovation distinguishes between a leader and a follower." > Steve Jobs > > ======================== > Thiago Motta Venancio, MSc > PhD student in Bioinformatics > University of Sao Paulo > ======================== -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Tue Oct 2 18:32:30 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 2 Oct 2007 17:32:30 -0500 Subject: [Bioperl-l] frac_* methods In-Reply-To: <44255ea80710021441v6ee2c5e0x57e91c66c0e07c3c@mail.gmail.com> References: <44255ea80710021441v6ee2c5e0x57e91c66c0e07c3c@mail.gmail.com> Message-ID: I think their use depends on what you are trying to accomplish. For instance, I am currently running a lot of small BLASTN queries (limiting by normalized bit score), so I tend to look at the HSP data more. However, in other circumstances I might want the overall frac_identical for all HSPs ($hit->frac_identical). YMMV.
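A small sketch of why the hit-level numbers need care: once a hit has several HSPs, "fraction identical" depends on how the HSPs are combined. The code assumes you have already pulled per-HSP counts into plain hashes (in BioPerl, `$hsp->num_identical` and `$hsp->length('total')` are one possible source); the helper names and the hash layout are hypothetical. Overlapping HSPs are not merged, so shared columns would be counted twice, which is exactly the case where a single SW or FASTA alignment, as Jason suggests, is the safer basis for one number.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Each HSP is a hypothetical hashref { identical => N, length => L },
# e.g. filled from $hsp->num_identical and $hsp->length('total').

# Length-weighted identity: total identities over total aligned length.
# Overlapping HSPs are NOT merged here, so shared columns count twice.
sub pooled_frac_identical {
    my @hsps  = @_;
    my $ident = sum(0, map { $_->{identical} } @hsps);
    my $len   = sum(0, map { $_->{length} } @hsps);
    return $len ? $ident / $len : 0;
}

# Unweighted mean of per-HSP fractions: a short HSP counts as much as a
# long one. HSPs with zero length are skipped to avoid dividing by zero.
sub mean_frac_identical {
    my @fracs = map { $_->{identical} / $_->{length} } grep { $_->{length} } @_;
    return @fracs ? sum(@fracs) / @fracs : 0;
}
```

For instance, an HSP with 90/100 identities plus one with 5/10 pools to 95/110 (about 0.86), whereas the unweighted mean of the two fractions is 0.7; neither is wrong, but they answer different questions.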
chris On Oct 2, 2007, at 4:41 PM, Thiago Venancio wrote: > Hi all, > > This topic was discussed before, but I would like to put it on the list > again; maybe someone has an update. > > The methods frac_identical, frac_conserved, frac_aligned_query and > frac_aligned_hit can also be used in the hit context, after HSP tiling. > In my view, it is better to use them on individual HSPs only, because > there are some rare/strange kinds of alignments. However, we frequently > need to get one measure of the whole alignment. > > Do any of the BioPerl masters have an update on this topic? What is the > best current usage? > > Best. > > Thiago From razi.khaja at gmail.com Tue Oct 2 19:46:12 2007 From: razi.khaja at gmail.com (Razi Khaja) Date: Tue, 2 Oct 2007 19:46:12 -0400 Subject: [Bioperl-l] need help ??parse AcNum from fasta? In-Reply-To: <4702BC5B.7040407@web.de> References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> Message-ID: <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> Here is the easiest non-BioPerl solution, using executables provided with NCBI's BLAST: (1) format your multi-FASTA file into a BLAST database; -n sets the database name used in step 2, and -o T parses the sequence IDs and builds the indexes that fastacmd needs for lookups by ID > /usr/local/ncbi/blast-2.2.16/bin/formatdb -i yourmultifastafile -t yourblastdb -n yourblastdb -o T (2) extract sequences from the newly created BLAST database using a file containing a list of accession numbers (one per line) > /usr/local/ncbi/blast-2.2.16/bin/fastacmd -d yourblastdb -i inputfilewithaccessionnumbers -o outputfile Your outputfile should be a multi-FASTA file containing the sequences for your list of accession numbers. The BLAST executables are available from http://www.ncbi.nlm.nih.gov/blast/download.shtml Hope that helps. 
Razi Khaja On 10/2/07, outaleb Issame wrote: > thanks for this, but I just want to create a new FASTA file with my accession numbers, > which I search for in the FASTA file (local database).
> so: just search for these numbers in the FASTA file, and if they are there, copy the
> header and sequence to another new FASTA file.
> I have been sitting on this for 2 days now; I don't think it's difficult, but how?
> I'm going crazy, guys.
> Come on, is there an expert in this area?
>
> Smithies, Russell wrote:
>
> >I know this is the Bioperl list but how about just doing it with grep?
> >
> > grep -P '^>.*XM_001666470[\s^>]*' sequences.fasta
> >
> >>-----Original Message-----
> >>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of outaleb Issame
> >>Sent: Wednesday, 3 October 2007 3:51 a.m.
> >>To: outaleb Issame
> >>Cc: bioperl-l at lists.open-bio.org
> >>Subject: Re: [Bioperl-l] need help ??parse AcNum from fasta?
> >>
> >>hi again,
> >>I think I can resolve this problem with the method id_parser();
> >>how can I do that?
> >>Any suggestions or experience?
> >>Thanks again
> >>
> >>outaleb Issame wrote:
> >>
> >>>thanks for the help, but I got an empty output file;
> >>>I think it's a problem with matching the accession number. My FASTA file looks like:
> >>>
> >>> >IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
> >>> DDHHHU...
> >>> >IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
> >>> DDHHHU..
> >>> >IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
> >>> MMMMM..
> >>>
> >>>and my accession number file looks like:
> >>>IPI00177321
> >>>IPI00453473
> >>>
> >>>I hope this helps you understand.
> >>>
> >>>Nathan S. Haigh wrote:
> >>>>outaleb Issame wrote:
> >>>>>hi,
> >>>>>by this file I mean: I picked out these accession numbers from the
> >>>>>IPI-Human database; they come from a FASTA file,
> >>>>>so they are now listed one under another, like a table, in a separate file.
> >>>>>What I want is to check whether they are really present in the FASTA file
> >>>>>(the IPI-Human FASTA file);
> >>>>>if so, copy the entire entry for each number (the ">..." header and the
> >>>>>sequence too) into a new FASTA file, so that at the end I get a new
> >>>>>FASTA file with just these IPI accession numbers.
> >>>>>Thanks, and I hope that was clear.
> >>>>
> >>>>Ok, first of all, I'd read the contents of your accession numbers file into a
> >>>>hash, something like the following (this could be written in a shorter
> >>>>form, but since you're a newbie I'll leave it in a longer form so you
> >>>>can follow it more easily).
> >>>>
> >>>> -- start script --
> >>>> use strict;
> >>>> use Bio::SeqIO;
> >>>>
> >>>> # change the following three lines to point to the relevant paths
> >>>> # of your list of accessions file, your fasta file and your output
> >>>> # fasta file
> >>>> my $acc_file = "/path/to/your/file";
> >>>> my $fasta_file_in = "/path/to/your/fasta/file";
> >>>> my $fasta_file_out = "/path/to/your/fasta/output/file";
> >>>>
> >>>> # Use a hash to keep a record of accessions we want to find
> >>>> my %hash_of_req_acc;
> >>>>
> >>>> # read all the required accessions from the file into the hash as keys
> >>>> open (ACC_FILE, $acc_file) or die "Couldn't open file: $!\n";
> >>>> while (<ACC_FILE>) {
> >>>>     my $line = $_;
> >>>>     chomp $line;
> >>>>     $hash_of_req_acc{$line} = 1;    # key on the chomped line, not $_
> >>>> }
> >>>> close ACC_FILE;
> >>>>
> >>>> my $seqio_object_in = Bio::SeqIO->new(
> >>>>     -file   => $fasta_file_in,
> >>>>     -format => 'fasta'
> >>>> );
> >>>> # the leading ">" opens the output file for writing
> >>>> my $seqio_object_out = Bio::SeqIO->new(
> >>>>     -file   => ">$fasta_file_out",
> >>>>     -format => 'fasta'
> >>>> );
> >>>>
> >>>> # loop through all the sequences in the fasta file
> >>>> while (my $seq_object = $seqio_object_in->next_seq) {
> >>>>     # get the sequence accession for easy matching
> >>>>     # (NB: for plain FASTA input this may come back as "unknown";
> >>>>     # the ID may then have to be parsed out of $seq_object->display_id)
> >>>>     my $seq_acc = $seq_object->accession_number;
> >>>>
> >>>>     # write the sequence object to the output fasta file if we have a
> >>>>     # matching accession
> >>>>     $seqio_object_out->write_seq($seq_object)
> >>>>         if exists $hash_of_req_acc{$seq_acc};
> >>>> }
> >>>> -- end script --
> >>>>
> >>>> I haven't tested this, but it should at least get you started. Also, the
> >>>> fasta description line in the output file may not be exactly as it was
> >>>> in the input fasta file - if this really matters, you may need to get
> >>>> back to us.
> >>>> Also, if the input fasta file is huge (many thousands of
> >>>> sequences) it may be wise to create an index of the fasta file in order
> >>>> to speed up retrieval.
> >>>>
> >>>> You may find this page helpful:
> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
> >>>>
> >>>> Anyway, hope this helps to get you started.
> >>>> Nath

From jason at bioperl.org Tue Oct 2 20:50:37 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 2 Oct 2007 17:50:37 -0700 Subject: [Bioperl-l] need help ??parse AcNum from fasta? In-Reply-To: <47025AD9.1090105@web.de> References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> Message-ID:

http://bioperl.open-bio.org/wiki/FAQ#How_do_I_use_Bio::Index::Fasta_and_index_on_different_ids.3F

On Oct 2, 2007, at 7:51 AM, outaleb Issame wrote:
> hi again,
> I think I can resolve this problem with the method id_parser();
> how can I do that?
> Any suggestions or experience?
> Thanks again
>
> outaleb Issame wrote:
>
>> thanks for the help, but I got an empty output file;
>> I think it's a problem with matching the accession number. My FASTA file
>> looks like:
>>
>> >IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>> DDHHHU...
>> >IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>> DDHHHU..
>> >IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>> MMMMM..
>>
>> and my accession number file looks like:
>> IPI00177321
>> IPI00453473
>>
>> I hope this helps you understand.
>>
>> Nathan S. Haigh wrote:
>>> outaleb Issame wrote:
>>>> hi,
>>>> by this file I mean: I picked out these accession numbers from the
>>>> IPI-Human database; they come from a FASTA file,
>>>> so they are now listed one under another, like a table, in a separate file.
>>>> What I want is to check whether they are really present in the FASTA file
>>>> (the IPI-Human FASTA file);
>>>> if so, copy the entire entry for each number (the ">..." header and the
>>>> sequence too) into a new FASTA file, so that at the end I get a new
>>>> FASTA file with just these IPI accession numbers.
>>>> Thanks, and I hope that was clear.
>>>
>>> Ok, first of all, I'd read the contents of your accession numbers file into a
>>> hash, something like the following (this could be written in a shorter
>>> form, but since you're a newbie I'll leave it in a longer form so you
>>> can follow it more easily).
>>>
>>> -- start script --
>>> use strict;
>>> use Bio::SeqIO;
>>>
>>> # change the following three lines to point to the relevant paths
>>> # of your list of accessions file, your fasta file and your output
>>> # fasta file
>>> my $acc_file = "/path/to/your/file";
>>> my $fasta_file_in = "/path/to/your/fasta/file";
>>> my $fasta_file_out = "/path/to/your/fasta/output/file";
>>>
>>> # Use a hash to keep a record of accessions we want to find
>>> my %hash_of_req_acc;
>>>
>>> # read all the required accessions from the file into the hash as keys
>>> open (ACC_FILE, $acc_file) or die "Couldn't open file: $!\n";
>>> while (<ACC_FILE>) {
>>>     my $line = $_;
>>>     chomp $line;
>>>     $hash_of_req_acc{$line} = 1;    # key on the chomped line, not $_
>>> }
>>> close ACC_FILE;
>>>
>>> my $seqio_object_in = Bio::SeqIO->new(
>>>     -file   => $fasta_file_in,
>>>     -format => 'fasta'
>>> );
>>> # the leading ">" opens the output file for writing
>>> my $seqio_object_out = Bio::SeqIO->new(
>>>     -file   => ">$fasta_file_out",
>>>     -format => 'fasta'
>>> );
>>>
>>> # loop through all the sequences in the fasta file
>>> while (my $seq_object = $seqio_object_in->next_seq) {
>>>     # get the sequence accession for easy matching
>>>     # (NB: for plain FASTA input this may come back as "unknown";
>>>     # the ID may then have to be parsed out of $seq_object->display_id)
>>>     my $seq_acc = $seq_object->accession_number;
>>>
>>>     # write the sequence object to the output fasta file if we have a
>>>     # matching accession
>>>     $seqio_object_out->write_seq($seq_object)
>>>         if exists $hash_of_req_acc{$seq_acc};
>>> }
>>> -- end script --
>>>
>>> I haven't tested this, but it should at least get you started. Also, the
>>> fasta description line in the output file may not be exactly as it was
>>> in the input fasta file - if this really matters, you may need to get
>>> back to us. Also, if the input fasta file is huge (many thousands of
>>> sequences) it may be wise to create an index of the fasta file in order
>>> to speed up retrieval.
>>>
>>> You may find this page helpful:
>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>>
>>> Anyway, hope this helps to get you started.
>>> Nath

--
Jason Stajich
jason at bioperl.org

From Russell.Smithies at agresearch.co.nz Tue Oct 2 21:05:25 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 3 Oct 2007 14:05:25 +1300 Subject: [Bioperl-l] coloring of HSPs in blast panel In-Reply-To: <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> Message-ID:

Hi all,

I'm using a modified version of Lincoln's tutorial (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output) and I'm colouring the HSPs by setting -bgcolor from the score with a sub, to give an image similar to the one from NCBI, but for some reason my colours are coming out wrong (see attached example). They seem to be off by one, but I can't see why. Any ideas?

I can't be certain, but I think it only started doing this after our BLAST upgrade to 2.2.17 a few weeks ago.

Here's the colouring code:
-------------------------------------------------------------------------------
my $track = $panel->add_track(
    -glyph       => 'segments',
    -label       => 1,
    -connector   => 'dashed',
    # pick the HSP fill colour from its score, highest band first
    -bgcolor     => sub {
        my $feature = shift;
        my $score   = $feature->score;
        return 'red'     if $score >= 200;
        return 'fuchsia' if $score >= 80;
        return 'lime'    if $score >= 50;
        return 'blue'    if $score >= 40;
        return 'black';
    },
    -font2color  => 'gray',
    -sort_order  => 'high_score',
    -description => sub {
        my $feature = shift;
        return unless $feature->has_tag('description');
        my ($description) = $feature->each_tag_value('description');
        my $score = $feature->score;
        "$description, score=$score";
    },
);
-------------------------------------------------------------------------------

Thanks,
Russell Smithies

-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.png Type: image/png Size: 18507 bytes Desc: example.png Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071003/72371841/attachment.png From aaron.j.mackey at gsk.com Tue Oct 2 21:40:14 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Tue, 2 Oct 2007 21:40:14 -0400 Subject: [Bioperl-l] frac_* methods In-Reply-To: <3DC00A97-EF7E-41B4-854F-B088715AB901@bioperl.org> Message-ID: Let me second Jason's comment that while BLAST is a great search program, it is not a very good alignment algorithm. In this day and age with so many good pairwise alignment algorithms out there (customized for the context in which the alignment is performed), BLAST-based alignments should frankly be ignored. See: exonerate, pairagon, etc. Oh, and since ssearch35 (the Smith-Waterman algorithm that comes with the FASTA package) is now vector-parallelized on most i386 architectures, it is only about 10 times slower than BLAST for complete database searches (with superior sensitivity/specificity); add PVM or MPI-based CPU parallelization on top of that, and there's almost no reason to even run BLAST anymore ... -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 10/02/2007 06:22:59 PM: > I think my answer before was something to the tune of: > > Use an alignment algorithm that finds a single best alignment like > FASTA or Smith-Waterman (SW) if what you want is a single number that > represents the alignment. BLAST is great for fast searching but > FASTA or SW/SSEARCH are going to be better at creating an alignment. > Consider the -postsw option in WUBLAST as well as it will realign the > HSPs with SW. > > I personally never use the frac alignment summary stats for the Hit > object for this reason unless I know I am going to have a single HSP. 
> > -jason > > On Oct 2, 2007, at 2:41 PM, Thiago Venancio wrote: > > > Hi all, > > > > This topic was discussed before, but I would like to put it on the list > > again; maybe someone has an update. > > > > The methods frac_identical, frac_conserved, frac_aligned_query and > > frac_aligned_hit can also be used in the hit context, after HSP tiling. > > In my view, it is better to use them on individual HSPs only, because > > there are some rare/strange kinds of alignments. However, we frequently > > need to get one measure of the whole alignment. > > > > Do any of the BioPerl masters have an update on this topic? What is the > > best current usage? > > > > Best. > > > > Thiago > > > > -- > > "Innovation distinguishes between a leader and a follower." > > Steve Jobs > > > > ======================== > > Thiago Motta Venancio, MSc > > PhD student in Bioinformatics > > University of Sao Paulo > > ======================== > > -- > Jason Stajich > jason at bioperl.org From cuiw at ncbi.nlm.nih.gov Wed Oct 3 10:50:47 2007 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Wed, 3 Oct 2007 10:50:47 -0400 Subject: [Bioperl-l] frac_* methods In-Reply-To: References: <3DC00A97-EF7E-41B4-854F-B088715AB901@bioperl.org> Message-ID: <18C407FD4FFB424292D769FBD68C198701B18C35@NIHCESMLBX8.nih.gov> I agree that BLAST is not a very good alignment algorithm, but I believe there are plenty of reasons to run BLAST, especially when placing a contig/BAC/PAC on a genome. In those cases, a full implementation of SW requires an impractical n X m matrix.
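To put the n X m point in concrete terms: the Smith-Waterman recurrence fills an (n+1) X (m+1) table, so the running time is O(nm) however it is stored. If only the best local score is wanted, two rows are enough, giving O(m) memory; recovering the alignment itself needs the full table or a linear-space divide-and-conquer method such as Hirschberg's. The sketch below uses a plain linear gap penalty and illustrative scores (match +2, mismatch -1, gap -1); these are assumptions for the example, not the substitution matrices and affine gap penalties that ssearch35 or BLAST actually use, and the name `sw_score` is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Best local alignment score (Smith-Waterman, linear gap penalty),
# keeping only two rows of the DP table: O(n*m) time, O(m) memory.
# Returns the score only; a traceback would need the full table.
sub sw_score {
    my ($s, $t, %opt) = @_;
    my $match    = $opt{match}    // 2;
    my $mismatch = $opt{mismatch} // -1;
    my $gap      = $opt{gap}      // -1;

    my @a    = split //, $s;
    my @b    = split //, $t;
    my @prev = (0) x (@b + 1);    # row i-1 of the table
    my $best = 0;

    for my $i (1 .. @a) {
        my @curr = (0);           # column 0 is always 0 in local alignment
        for my $j (1 .. @b) {
            my $diag = $prev[$j - 1] + ($a[$i - 1] eq $b[$j - 1] ? $match : $mismatch);
            my $up   = $prev[$j] + $gap;        # $a[$i-1] aligned to a gap
            my $left = $curr[$j - 1] + $gap;    # $b[$j-1] aligned to a gap
            $curr[$j] = max(0, $diag, $up, $left);    # 0 = start a new local alignment
            $best = $curr[$j] if $curr[$j] > $best;
        }
        @prev = @curr;
    }
    return $best;
}
```

For example, `sw_score('ACGT', 'AGT')` returns 5 under these defaults (A/A, C against a gap, G/G, T/T), and `sw_score('TTACGTTT', 'GGACGTGG')` returns 8 from the shared ACGT core.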
Currently we are developing an algorithm which will run global alignment after BLAST. Hopefully a Perl wrapper will become available next year. Wenwu Cui, PhD -----Original Message----- From: aaron.j.mackey at gsk.com [mailto:aaron.j.mackey at gsk.com] Sent: Tuesday, October 02, 2007 9:40 PM To: Jason Stajich Cc: bioperl-l list; Thiago Venancio Subject: Re: [Bioperl-l] frac_* methods Let me second Jason's comment that while BLAST is a great search program, it is not a very good alignment algorithm. In this day and age with so many good pairwise alignment algorithms out there (customized for the context in which the alignment is performed), BLAST-based alignments should frankly be ignored. See: exonerate, pairagon, etc. Oh, and since ssearch35 (the Smith-Waterman algorithm that comes with the FASTA package) is now vector-parallelized on most i386 architectures, it is only about 10 times slower than BLAST for complete database searches (with superior sensitivity/specificity); add PVM or MPI-based CPU parallelization on top of that, and there's almost no reason to even run BLAST anymore ... -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 10/02/2007 06:22:59 PM: > I think my answer before was something to the tune of: > > Use an alignment algorithm that finds a single best alignment like > FASTA or Smith-Waterman (SW) if what you want is a single number that > represents the alignment. BLAST is great for fast searching but > FASTA or SW/SSEARCH are going to be better at creating an alignment. > Consider the -postsw option in WUBLAST as well as it will realign the > HSPs with SW. > > I personally never use the frac alignment summary stats for the Hit > object for this reason unless I know I am going to have a single HSP. > > -jason > > On Oct 2, 2007, at 2:41 PM, Thiago Venancio wrote: > > > Hi all, > > > > This topic was discussed before, but I would like to put it on the > > list > > again, maybe someone has an update. 
> > > > The methods frac_identical, frac_conserved, frac_aligned_query and > > frac_aligned_hit can also be used in the hit context, after HSP tiling. > > In my view, it is better to use them on individual HSPs only, because > > there are some rare/strange kinds of alignments. However, we frequently > > need to get one measure of the whole alignment. > > > > Do any of the BioPerl masters have an update on this topic? What is the > > best current usage? > > > > Best. > > > > Thiago > > > > -- > > "Innovation distinguishes between a leader and a follower." > > Steve Jobs > > > > ======================== > > Thiago Motta Venancio, MSc > > PhD student in Bioinformatics > > University of Sao Paulo > > ======================== > > -- > Jason Stajich > jason at bioperl.org From aaron.j.mackey at gsk.com Wed Oct 3 11:53:12 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Wed, 3 Oct 2007 11:53:12 -0400 Subject: [Bioperl-l] frac_* methods In-Reply-To: <18C407FD4FFB424292D769FBD68C198701B18C35@NIHCESMLBX8.nih.gov> Message-ID: I think Wenwu makes a nice distinction here between alignment and placement. BLAST is great at finding things and (thus) placing things. String matching has a long and rich history in computer science, and we tend to confuse the terms "alignment" with "matching".
The "align a BAC/PAC to a genome" problem is one of string matching (with allowance for errors due to sequencing artifacts and possible SNPs); if there were no errors, we wouldn't use BLAST at all (and, in fact, I personally think programs such as MUMmer, or the various genome assembly tiling algorithms, are better for this particular problem). The problem of pairwise alignment can also be called matching, but the distinction (at least to me) is that the "errors" are true evolutionary mutations, and are expected to occur naturally (i.e. they are not an artifact of the experiment that in an optimal world would not occur). BLAST is good at finding matches whose "errors" fit scoring-matrix-based evolutionary models, but it isn't very good at teasing out the actual evolutionary events that led to those "errors" (this is not really a criticism of BLAST: its job is not to generate evolutionarily accurate and complete alignments, but to identify evolutionarily conserved regions having statistical significance). Please don't get me wrong: I think BLAST is an invaluable tool that fully deserves its top-most place in the bioinformatics hall of fame. But I also don't believe that bioinformatics begins and ends with running a BLAST search and poring over the report details. -Aaron "Cui, Wenwu (NIH/NLM/NCBI) [C]" wrote on 10/03/2007 10:50:47 AM: > I agree that BLAST is not a very good alignment algorithm, but I believe > there are plenty of reasons to run BLAST, especially when placing a > contig/BAC/PAC on a genome. In those cases, a full implementation of SW > requires an impractical n X m matrix. > > Currently we are developing an algorithm which will run global alignment > after BLAST. Hopefully a Perl wrapper will become available next year.
> > > Wenwu Cui, PhD > > -----Original Message----- > From: aaron.j.mackey at gsk.com [mailto:aaron.j.mackey at gsk.com] > Sent: Tuesday, October 02, 2007 9:40 PM > To: Jason Stajich > Cc: bioperl-l list; Thiago Venancio > Subject: Re: [Bioperl-l] frac_* methods > > Let me second Jason's comment that while BLAST is a great search program, > it is not a very good alignment algorithm. In this day and age with so > many good pairwise alignment algorithms out there (customized for the > context in which the alignment is performed), BLAST-based alignments > should frankly be ignored. See: exonerate, pairagon, etc. > > Oh, and since ssearch35 (the Smith-Waterman algorithm that comes with the > FASTA package) is now vector-parallelized on most i386 architectures, it > is only about 10 times slower than BLAST for complete database searches > (with superior sensitivity/specificity); add PVM or MPI-based CPU > parallelization on top of that, and there's almost no reason to even run > BLAST anymore ... > > -Aaron > > bioperl-l-bounces at lists.open-bio.org wrote on 10/02/2007 06:22:59 PM: > > > I think my answer before was something to the tune of: > > > > Use an alignment algorithm that finds a single best alignment like > > FASTA or Smith-Waterman (SW) if what you want is a single number that > > represents the alignment. BLAST is great for fast searching but > > FASTA or SW/SSEARCH are going to be better at creating an alignment. > > Consider the -postsw option in WUBLAST as well as it will realign the > > HSPs with SW. > > > > I personally never use the frac alignment summary stats for the Hit > > object for this reason unless I know I am going to have a single HSP. > > > > -jason > > > > On Oct 2, 2007, at 2:41 PM, Thiago Venancio wrote: > > > > > Hi all, > > > > > > This topic was discussed before, but I would like to put it on the list > > > again, maybe someone has an update.
> > > > > > The methods frac_identical, frac_conserved, frac_aligned_query and > > > frac_aligned_hit can also be used in the hit context, after HSP tiling. > > > In my view, it is better to use them on individual HSPs only, because > > > there are some rare/strange kinds of alignments. However, we frequently > > > need to get one measure of the whole alignment. > > > > > > Do any of the BioPerl masters have an update on this topic? What is the > > > best current usage? > > > > > > Best. > > > > > > Thiago > > > > > > -- > > > "Innovation distinguishes between a leader and a follower." > > > Steve Jobs > > > > > > ======================== > > > Thiago Motta Venancio, MSc > > > PhD student in Bioinformatics > > > University of Sao Paulo > > > ======================== > > > > -- > > Jason Stajich > > jason at bioperl.org From vebaev at gmail.com Wed Oct 3 12:44:35 2007 From: vebaev at gmail.com (Vesselin Baev) Date: Wed, 3 Oct 2007 19:44:35 +0300 Subject: [Bioperl-l] CG content plot of sequence In-Reply-To: References: Message-ID: Hi, What methods should I use to draw a CG plot of a sequence (with Bio::Graphics)? Thanks -- ------------------------------------------------ University of Plovdiv Faculty of Biology Dept. Molecular Biology Bioinformatics Group Tzar Assen 24 Plovdiv 4000, BULGARIA 032/ 261 (534) 089/ 57-444-67 Skype: vesselin_baev vebaev at gmail.com From cuiw at ncbi.nlm.nih.gov Wed Oct 3 13:37:51 2007 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Wed, 3 Oct 2007 13:37:51 -0400 Subject: [Bioperl-l] frac_* methods In-Reply-To: References: <18C407FD4FFB424292D769FBD68C198701B18C35@NIHCESMLBX8.nih.gov> Message-ID: <18C407FD4FFB424292D769FBD68C198701B18C36@NIHCESMLBX8.nih.gov> I agree with what you said. One of the reasons we are introducing 'BLAST-guided global alignment (NW)' is that a significant number of clones are of low quality, partially sequenced, erroneously assembled, or from a non-reference strain. 
Wenwu Cui, PhD -----Original Message----- From: aaron.j.mackey at gsk.com [mailto:aaron.j.mackey at gsk.com] Sent: Wednesday, October 03, 2007 11:53 AM To: Cui, Wenwu (NIH/NLM/NCBI) [C] Cc: bioperl-l list; Jason Stajich; Thiago Venancio Subject: RE: [Bioperl-l] frac_* methods I think Wenwu makes a nice distinction here between alignment and placement. BLAST is great at finding things and (thus) placing things. String matching has a long and rich history in computer science, and we tend to confuse the terms "alignment" with "matching". The "align a BAC/PAC to a genome" problem is one of string matching (with allowance for errors due to sequencing artifacts and possible SNPs); if there were no errors, we wouldn't use BLAST at all (and, in fact, I personally think programs such as MUMmer, or the various genome assembly tiling algorithms, are better for this particular problem).
The problem of pairwise alignment can also be called matching, but the distinction (at least to me) is that the "errors" are true evolutionary mutations, and are expected to occur naturally (i.e. are not an artifact of the experiment that in an optimal world would not occur). BLAST is good at finding matches whose "errors" fit scoring-matrix-based evolutionary models, but it isn't very good at teasing out the actual evolutionary events that lead to those "errors" (this is not really a criticism of BLAST - it's job is not to generate evolutionarily-accurate, and -complete alignments, but to identify evolutionarily-conserved regions having statistical significance) Please don't get me wrong, I think BLAST is an invaluable tool that fully deserves its top-most place in the bioinformatics hall of fame. But I also don't believe that bioinformatics begins and ends with running a BLAST search and poring over the report details. -Aaron "Cui, Wenwu (NIH/NLM/NCBI) [C]" wrote on 10/03/2007 10:50:47 AM: > I agree that BLAST is not a very good alignment algorithm but believe > there are plenty of reasons to run BLAST, especially when placing a > contig /BAC/PAC to a genome. In those cases, fully implementation of SW > requires an unpractical matrix of n X m. > > Currently we are developing an algorithm which will run global alignment > after BLAST. Hopefully a Perl wrapper will become available next year. > > > Wenwu Cui, PhD > > -----Original Message----- > From: aaron.j.mackey at gsk.com [mailto:aaron.j.mackey at gsk.com] > Sent: Tuesday, October 02, 2007 9:40 PM > To: Jason Stajich > Cc: bioperl-l list; Thiago Venancio > Subject: Re: [Bioperl-l] frac_* methods > > Let me second Jason's comment that while BLAST is a great search > program, > it is not a very good alignment algorithm. 
In this day and age with so > many good pairwise alignment algorithms out there (customized for the > context in which the alignment is performed), BLAST-based alignments > should frankly be ignored. See: exonerate, pairagon, etc. > > Oh, and since ssearch35 (the Smith-Waterman algorithm that comes with > the > FASTA package) is now vector-parallelized on most i386 architectures, it > > is only about 10 times slower than BLAST for complete database searches > (with superior sensitivity/specificity); add PVM or MPI-based CPU > parallelization on top of that, and there's almost no reason to even run > > BLAST anymore ... > > -Aaron > > bioperl-l-bounces at lists.open-bio.org wrote on 10/02/2007 06:22:59 PM: > > > I think my answer before was something to the tune of: > > > > Use an alignment algorithm that finds a single best alignment like > > FASTA or Smith-Waterman (SW) if what you want is a single number that > > represents the alignment. BLAST is great for fast searching but > > FASTA or SW/SSEARCH are going to be better at creating an alignment. > > Consider the -postsw option in WUBLAST as well as it will realign the > > HSPs with SW. > > > > I personally never use the frac alignment summary stats for the Hit > > object for this reason unless I know I am going to have a single HSP. > > > > -jason > > > > On Oct 2, 2007, at 2:41 PM, Thiago Venancio wrote: > > > > > Hi all, > > > > > > This topic was discussed before, but I would like to put it on the > > > list > > > again, maybe someone has an update. > > > > > > The methods frac_identical, frac_conserved, frac_aligned_query and > > > frac_aligned_hit can also be used in the hit context, after HSP > > > tilling. In > > > my point of view, it is better to use it just in HSPs individually, > > > because > > > there are some rare/strange kinds of alignments. However, we > > > frequently need > > > to get one measure of the whole alignment. > > > > > > Any of the BioPerl masters has an update on this topic ? 
What is > > > the best > > > current usage ? > > > > > > Best. > > > > > > Thiago > > > > > > -- > > > "Innovation distinguishes between a leader and a follower." > > > Steve Jobs > > > > > > ======================== > > > Thiago Motta Venancio, MSc > > > PhD student in Bioinformatics > > > University of Sao Paulo > > > ======================== > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > jason at bioperl.org > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Oct 3 14:19:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 3 Oct 2007 13:19:43 -0500 Subject: [Bioperl-l] CG content plot of sequence In-Reply-To: References: Message-ID: <3427C150-D571-4C7B-99FC-F6FE77C9A344@uiuc.edu> You should look at Bio::Graphics::Glyph::dna. From the POD: --------------------------- This glyph draws DNA sequences. At high magnifications, this glyph will draw the actual base pairs of the sequence (both strands). At low magnifications, the glyph will plot the GC content. By default, the GC calculation will use non-overlapping bins, but this can be changed by specifying the gc_window option, in which case, a sliding window calculation will be used. For this glyph to work, the feature must return a DNA sequence string in response to the dna() method. 
For example, you can use a Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq like this:

  my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 );
  my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 800 );
  $feature->attach_seq($dna);
  $panel->add_track( $feature, -glyph => 'dna' );

A Bio::Graphics::Feature object may also be used.

---------------------------

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From vebaev at gmail.com Wed Oct 3 14:31:49 2007
From: vebaev at gmail.com (Vesselin Baev)
Date: Wed, 3 Oct 2007 21:31:49 +0300
Subject: [Bioperl-l] CG content plot of sequence
In-Reply-To: <3427C150-D571-4C7B-99FC-F6FE77C9A344@uiuc.edu>
References: <3427C150-D571-4C7B-99FC-F6FE77C9A344@uiuc.edu>

Thanks,
I will use Bio::Graphics::Glyph::dna for the classical CG% (is this for C+G % or for CpG?).

And if I want to draw a similar plot, but for example for the % of dinucleotide (NpN) occurrences in a sliding window, what should I use?

Thanks!
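A minimal core-Perl sketch of the two window calculations asked about here: GC% per window, and the percentage of a chosen dinucleotide (for example CG, i.e. CpG) per window. Windows advance by a chosen step, so a step equal to the window size gives non-overlapping bins and a smaller step gives a sliding window, in the spirit of the glyph's gc_window option. This is not the code Bio::Graphics::Glyph::dna uses, and the subroutine names (window_starts, gc_profile, dinuc_profile) are hypothetical; the sketch only computes the values and does not draw anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 0-based start positions of windows of $size bases advancing by $step:
# $step == $size gives non-overlapping bins, $step < $size a sliding window.
sub window_starts {
    my ($len, $size, $step) = @_;
    my @starts;
    for (my $s = 0; $s + $size <= $len; $s += $step) { push @starts, $s }
    return @starts;
}

# GC% per window as [1-based window start, percent] pairs;
# anything other than G/C (including N) counts as non-GC.
sub gc_profile {
    my ($seq, $size, $step) = @_;
    return map {
        my $window = substr($seq, $_, $size);
        my $gc = ($window =~ tr/GCgc//);    # tr/// with no replacement only counts
        [ $_ + 1, 100 * $gc / $size ]
    } window_starts(length $seq, $size, $step);
}

# Percentage of the $size - 1 overlapping dinucleotide positions in each
# window that equal $dinuc (e.g. 'CG' for CpG). $size must be at least 2.
sub dinuc_profile {
    my ($seq, $dinuc, $size, $step) = @_;
    ($seq, $dinuc) = (uc $seq, uc $dinuc);
    return map {
        my $start = $_;
        my $count = grep { substr($seq, $start + $_, 2) eq $dinuc } 0 .. $size - 2;
        [ $start + 1, 100 * $count / ($size - 1) ]
    } window_starts(length $seq, $size, $step);
}

# Example: CpG percentage in 10-base windows moving 5 bases at a time.
my $seq = 'ACGTTCGACGCGATATCGCG';
printf "%d\t%.1f\n", @$_ for dinuc_profile($seq, 'CG', 10, 5);
```

Any other dinucleotide (NpN) can be profiled by changing the second argument, and the resulting position/value pairs can be handed to whatever plotting code is used.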
From dave at davemessina.com Wed Oct 3 14:22:23 2007
From: dave at davemessina.com (Dave Messina)
Date: Wed, 3 Oct 2007 20:22:23 +0200
Subject: [Bioperl-l] CG content plot of sequence
Message-ID: <37574C6D-98BA-47A5-875E-9255377133B8@sbc.su.se>

Hi Vesselin,

I believe what you want to use is Bio::Graphics::Panel with the Bio::Graphics::Glyph::dna glyph. See

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Glyph/dna.html

and

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html

I think the example code will help you to do what you want.

Dave

From cjfields at uiuc.edu Wed Oct 3 15:10:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 Oct 2007 14:10:46 -0500
Subject: [Bioperl-l] CG content plot of sequence
References: <3427C150-D571-4C7B-99FC-F6FE77C9A344@uiuc.edu>
Message-ID: <7B62B534-2A8C-4EAE-B956-F0FADB726195@uiuc.edu>
It would be GC content, not CpG. Not sure what you would use for dinucleotide content; you could look at the Bio::Graphics::Glyph::dna code and either subclass it for your needs (probably the best option) or add an extra parameter and 'rewire' the appropriate methods to do what you want.

chris

From dmessina at sbc.su.se Wed Oct 3 14:55:10 2007
From: dmessina at sbc.su.se (dmessina at sbc.su.se)
Date: Wed, 3 Oct 2007 20:55:10 +0200 (CEST)
Subject: [Bioperl-l] CG content plot of sequence
Message-ID: <61118.217.213.158.117.1191437710.squirrel@mail.sbc.su.se>

Hi Vesselin,

I believe what you want to use is Bio::Graphics::Panel with the Bio::Graphics::Glyph::dna glyph. See

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Glyph/dna.html

and

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html

I think the example code will help you to do what you want.

Dave

From lzhtom at hotmail.com Wed Oct 3 18:30:43 2007
From: lzhtom at hotmail.com (zhihuali)
Date: Wed, 3 Oct 2007 22:30:43 +0000
Subject: [Bioperl-l] Loading Blast Report in a minimal way

Hi netters,

I'm using SearchIO to parse my BLAST reports. They are extremely large, and, not surprisingly, parsing is extremely slow and sometimes the system crashes due to memory problems. As I can handle small reports quickly, it seems like a problem related to the way SearchIO works: it slurps the whole report into memory and builds millions of objects.

I've checked old posts; some people used FastHitEventBuilder to build hit objects without any HSP objects, and some people suggested using the tabular output of BLAST. But in my case I need to go to each of the HSPs of each hit, parse the alignment, and gather the information needed if that HSP fits certain criteria, and then move on to the next HSP, jump over to the next hit, or exit the processing, according to the information I have already got.
An ideal way would be to read one HSP at a time from the report into memory. Is there some way to modify SearchIO (or build another search event handler) to do this?

Thanks a lot!

Zhihua Li

From budd at embl-heidelberg.de Thu Oct 4 09:43:57 2007
From: budd at embl-heidelberg.de (Aidan Budd)
Date: Thu, 4 Oct 2007 15:43:57 +0200 (CEST)
Subject: [Bioperl-l] Adding info to Features to view in SwissProt

Hi bioperlers,

I've been trying to add info to a feature in a RichSeq object so that, when the Seq is written in SwissProt format, I can put information in the final field of the feature:

FT   DOMAIN      208    392       Helicase ATP-binding.

i.e. where it says "Helicase ATP-binding."

I can control what goes in the primary field and set the location, but I haven't been able to work out how to add info to go in this final field.

Thanks,

Aidan

--
----------------------------------------------------------------------
Aidan Budd, PhD                               tel:+49 (0)6221 387 8530
EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
Meyerhofstr. 1, 69117 Heidelberg, Germany

URL: http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html

From cjfields at uiuc.edu Thu Oct 4 10:32:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 Oct 2007 09:32:43 -0500
Subject: [Bioperl-l] Adding info to Features to view in SwissProt
Message-ID: <84A6CB47-6890-476B-A375-6FCF37655330@uiuc.edu>

Try adding it as a tag value with the name 'description', 'note', or 'product' (the first two are probably the best to use for most purposes). There is a quick explanation here:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences

You can also do something like:

  $sf->add_tag_value('description', 'Helicase ATP-binding');

See Bio::SeqFeature::Generic POD for more.
chris

From cain.cshl at gmail.com Thu Oct 4 11:08:52 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 04 Oct 2007 11:08:52 -0400
Subject: [Bioperl-l] [Gmod-gbrowse] Fwd: DB::SeqFeature::Store error
Message-ID: <1191510532.2787.16.camel@localhost.localdomain>

Hi Chris,

I think adding the type=MYISAM is the right thing to do; please go ahead and commit it.

Scott

On Mon, 2007-10-01 at 10:14 -0500, Chris Fields wrote:
> Just thought I would forward this on to the GBrowse list as well in
> case anyone has run into the same problem. The issue pops up when
> using bioperl from CVS and appears to be related to a fix Lincoln
> added recently in Bio::DB::SeqFeature::Store::DBI::mysql using
> FULLTEXT, which only works for MyISAM currently.
>
> Making the suggested changes (adding TYPE=MYISAM) to the CREATE TABLE
> queries does work when InnoDB is set to the default. Should I go
> ahead and commit?
>
> chris
>
> Begin forwarded message:
>
> > I'm getting the following error on my local MySQL (v 5.0.41) with
> > bp_seqfeature_load:
> >
> > -------------------- EXCEPTION --------------------
> > MSG: The used table type doesn't support FULLTEXT indexes
> > STACK Bio::DB::SeqFeature::Store::DBI::mysql::_init_database /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm:414
> > STACK Bio::DB::SeqFeature::Store::init_database /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store.pm:382
> > STACK Bio::DB::SeqFeature::Store::DBI::mysql::init /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm:218
> > STACK Bio::DB::SeqFeature::Store::new /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store.pm:345
> > STACK toplevel /usr/local/bin/bp_seqfeature_load.pl:57
> > -------------------------------------------
> >
> > The default setting for storage is InnoDB; switching to MyISAM fixes
> > the issue. Should we specify TYPE = MyISAM with the various CREATE
> > TABLE queries in Bio::DB::SeqFeature::Store::DBI::mysql to be on the
> > safe side?
> >
> > chris

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu Thu Oct 4 11:14:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 Oct 2007 10:14:37 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] Fwd: DB::SeqFeature::Store error
In-Reply-To: <1191510532.2787.16.camel@localhost.localdomain>
References: <1191510532.2787.16.camel@localhost.localdomain>
Message-ID: <9DEFB5C1-6F79-465B-8FB4-68305C53E427@uiuc.edu>

Done. If we run into issues and need to roll back, let me know.

chris

From cjfields at uiuc.edu Thu Oct 4 15:30:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 Oct 2007 14:30:48 -0500
Subject: [Bioperl-l] blastxml oddity
Message-ID: <23DE7F77-3FB4-4E00-86CA-43B55A6A7311@uiuc.edu>

Just noticed an oddity in BLAST XML output from the NCBI server; I'm cc'ing this to NCBI so maybe they can explain. BTW, the following doesn't occur via URLAPI.

When running a standard BLAST query using the NCBI web page, if I request XML output after the run, I get the entire query sequence masked out and no midline. This occurs with all default settings except output type (set to XML). Can anyone replicate this?

Here's a sample:

  <Hsp>
    <Hsp_num>1</Hsp_num>
    <Hsp_bit-score>320.472</Hsp_bit-score>
    <Hsp_score>820</Hsp_score>
    <Hsp_evalue>2.17708e-86</Hsp_evalue>
    <Hsp_query-from>1</Hsp_query-from>
    <Hsp_query-to>181</Hsp_query-to>
    <Hsp_hit-from>1</Hsp_hit-from>
    <Hsp_hit-to>181</Hsp_hit-to>
    <Hsp_query-frame>0</Hsp_query-frame>
    <Hsp_hit-frame>0</Hsp_hit-frame>
    <Hsp_identity>0</Hsp_identity>
    <Hsp_positive>0</Hsp_positive>
    <Hsp_gaps>0</Hsp_gaps>
    <Hsp_align-len>181</Hsp_align-len>
    <Hsp_qseq>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</Hsp_qseq>
    <Hsp_hseq>MNQKAVILDEQAIRRALTRIAHEMIERNKGMNNCILVGIKTRGIYLAKRLAERIEQIEGNPVTVGEIDITLYRDDLSKKTSNDEPLVKGADIPVDITDQKVILVDDVLYTGRTVRAGMDALVDVGRPSSIQLAVLVDRGHRELPIRADYIGKNIPTSKSEKVMVQLDEVDQNDLVAIYENE</Hsp_hseq>
    <Hsp_midline></Hsp_midline>
  </Hsp>

chris

From cjfields at uiuc.edu Thu Oct 4 16:36:25 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 Oct 2007 15:36:25 -0500
Subject: [Bioperl-l] blastxml oddity
In-Reply-To: <23DE7F77-3FB4-4E00-86CA-43B55A6A7311@uiuc.edu>
References: <23DE7F77-3FB4-4E00-86CA-43B55A6A7311@uiuc.edu>
Message-ID: <86D7D694-1ACF-4D91-9FEC-0FEA21C1B689@uiuc.edu>

NCBI's BLAST team is working on this; it occurs only if you use a GI/accession instead of a full sequence, and only for XML.
chris

From e-just at northwestern.edu Fri Oct 5 15:35:25 2007
From: e-just at northwestern.edu (Eric Just)
Date: Fri, 5 Oct 2007 14:35:25 -0500
Subject: [Bioperl-l] bp_search2gff.pl

Hello,

I have been playing with the bp_search2gff.pl script (on HEAD of bioperl-live). There are a couple of issues I was wondering about.

One is the ID that gets generated for a match feature when the --match option is set. The ID is set to the ID of the query sequence. This can be problematic if you are representing the query sequence and the BLAST hit in the same GFF file.
When using the resulting GFF file for loading into Chado, it also creates a problem if you have more than one hit for a given query sequence, for example if you ran two different analyses that each had a hit for a given query. Would it be possible to have an option to create a unique ID for match features? One suggestion could be to create an ID based on the ID of the query + the ID of the hit + the source.

As long as two different analyses were loaded as different sources, this would ensure unique IDs for the match features.

Also, is there a reason for writing the Target string as

Target=Sequence:SOME_ID

as opposed to

Target=SOME_ID

The latter seems a little more in line with the GFF3 spec and plays a little nicer with the GMOD tools.

Thanks for looking into this.

Eric

From cjfields at uiuc.edu Fri Oct 5 15:51:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 Oct 2007 14:51:20 -0500
Subject: [Bioperl-l] bp_search2gff.pl
Message-ID: <54632965-D21E-4B0B-B2D6-254D9CB88F3B@uiuc.edu>

We might want to file this as a bug so we can track it.

The core devs have been mulling over the state of GFF/GFF3 in BioPerl; proper handling of any SearchIO data is certainly included in that. I believe a road forward is to be planned soon (after Genome Informatics).

chris

From e-just at northwestern.edu Fri Oct 5 16:06:48 2007
From: e-just at northwestern.edu (Eric Just)
Date: Fri, 5 Oct 2007 15:06:48 -0500
Subject: [Bioperl-l] bp_search2gff.pl
In-Reply-To: <54632965-D21E-4B0B-B2D6-254D9CB88F3B@uiuc.edu>
References: <54632965-D21E-4B0B-B2D6-254D9CB88F3B@uiuc.edu>

Thanks Chris,

I'll enter it as a bug. I'd also be glad to help on the solution if there is an executive decision made at some point.
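A minimal sketch of the scheme proposed above: a match ID built from the query ID, the hit ID and the source, plus a Target attribute written as a plain ID followed by the hit coordinates (GFF3 defines the Target value as "target_id start end [strand]", with no "Sequence:" prefix). The subroutine names and the '.' separator are hypothetical choices, not part of bp_search2gff.pl or BioPerl; the sketch assumes the match is located on the query sequence with the hit as its Target, and that the query ID needs no escaping in column 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode characters that GFF3 reserves in column 9 (control
# characters such as tab/newline, %, ;, =, &, ,) plus spaces, which
# separate the fields of the Target value.
sub gff3_escape {
    my ($text) = @_;
    $text =~ s/([\x00-\x1f%;=&, ])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper: query ID + hit ID + source, so that two analyses
# (loaded as different sources) hitting the same query get different IDs.
sub match_id {
    my ($query, $hit, $source) = @_;
    return join '.', map { gff3_escape($_) } $query, $hit, $source;
}

# Hypothetical helper: one tab-separated GFF3 'match' line on the query,
# with the hit written as "Target=ID start end".
sub match_line {
    my (%f) = @_;
    my $id     = match_id(@f{qw(query hit source)});
    my $target = join ' ', gff3_escape($f{hit}), $f{hstart}, $f{hend};
    return join "\t",
        $f{query}, $f{source}, 'match', $f{start}, $f{end},
        $f{score}, $f{strand}, '.',      # phase column is unused for match
        "ID=$id;Target=$target";
}

# Illustrative values only, not taken from a real BLAST run.
print match_line(
    query  => 'contig1', source => 'blastx', hit   => 'hitA',
    start  => 100,       end    => 640,      score => '320.5',
    strand => '+',       hstart => 1,        hend  => 181,
), "\n";
```

Joining with '.' can still collide if the IDs themselves contain dots; appending a running counter would be one way to make the IDs strictly unique.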
From bix at sendu.me.uk Sun Oct 7 08:40:44 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 07 Oct 2007 13:40:44 +0100
Subject: [Bioperl-l] Bio::FeatureIO::gff bug?
In-Reply-To: <5FC8F92C-42DD-4DAF-8008-0F8C545065B5@gmx.net> References: <46F784EB.9050507@sendu.me.uk> <7C7D7FB3-86B9-43CF-B506-E66FA8264CFC@uiuc.edu> <46F7B9A1.9080206@sendu.me.uk> <234C109D-85FE-4161-8CBA-8E24BE34C5B5@gmx.net> <46F8DC34.6020908@sendu.me.uk> <8283EF43-2AF0-4B3B-8A00-4DE615186EC7@gmx.net> <5298B700-EFDE-45E4-A8F3-674FA673A0C7@uiuc.edu> <41A20518-63BC-4D01-8FFC-01C903ADD423@gmx.net> <530A0322-A3BC-471D-AE91-17AD8F0EB237@uiuc.edu> <5FC8F92C-42DD-4DAF-8008-0F8C545065B5@gmx.net> Message-ID: <4708D3CC.60909@sendu.me.uk> Hilmar Lapp wrote: > On Sep 28, 2007, at 5:34 PM, Chris Fields wrote: > >> The section writing the gff header info in _initialize() checks the >> file specifically for '>' prior to output; I think Sendu planned on >> changing that to use mode() instead. > > What if we pass in a file handle? mode() is supposed to work in that instance as well: it sees if the file handle is writable. From er at xs4all.nl Sun Oct 7 10:13:45 2007 From: er at xs4all.nl (Erik) Date: Sun, 7 Oct 2007 16:13:45 +0200 (CEST) Subject: [Bioperl-l] bioperl.org In-Reply-To: <4708D3CC.60909@sendu.me.uk> References: <46F784EB.9050507@sendu.me.uk> <7C7D7FB3-86B9-43CF-B506-E66FA8264CFC@uiuc.edu> <46F7B9A1.9080206@sendu.me.uk> <234C109D-85FE-4161-8CBA-8E24BE34C5B5@gmx.net> <46F8DC34.6020908@sendu.me.uk> <8283EF43-2AF0-4B3B-8A00-4DE615186EC7@gmx.net> <5298B700-EFDE-45E4-A8F3-674FA673A0C7@uiuc.edu> <41A20518-63BC-4D01-8FFC-01C903ADD423@gmx.net> <530A0322-A3BC-471D-AE91-17AD8F0EB237@uiuc.edu> <5FC8F92C-42DD-4DAF-8008-0F8C545065B5@gmx.net> <4708D3CC.60909@sendu.me.uk> Message-ID: <12957.156.83.1.146.1191766425.squirrel@webmail.xs4all.nl> http://bioperl.org/ http://www.bioperl.org/wiki/Main_Page seems to be having problems - I only receive an empty screen... 
regards, Erik From cjfields at uiuc.edu Sun Oct 7 17:16:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 7 Oct 2007 16:16:02 -0500 Subject: [Bioperl-l] bioperl.org In-Reply-To: <12957.156.83.1.146.1191766425.squirrel@webmail.xs4all.nl> References: <46F784EB.9050507@sendu.me.uk> <7C7D7FB3-86B9-43CF-B506-E66FA8264CFC@uiuc.edu> <46F7B9A1.9080206@sendu.me.uk> <234C109D-85FE-4161-8CBA-8E24BE34C5B5@gmx.net> <46F8DC34.6020908@sendu.me.uk> <8283EF43-2AF0-4B3B-8A00-4DE615186EC7@gmx.net> <5298B700-EFDE-45E4-A8F3-674FA673A0C7@uiuc.edu> <41A20518-63BC-4D01-8FFC-01C903ADD423@gmx.net> <530A0322-A3BC-471D-AE91-17AD8F0EB237@uiuc.edu> <5FC8F92C-42DD-4DAF-8008-0F8C545065B5@gmx.net> <4708D3CC.60909@sendu.me.uk> <12957.156.83.1.146.1191766425.squirrel@webmail.xs4all.nl> Message-ID: Should be back up. Jason was running some wiki updates. chris On Oct 7, 2007, at 9:13 AM, Erik wrote: > > http://bioperl.org/ > http://www.bioperl.org/wiki/Main_Page > > seems to be having problems - I only receive an empty > screen... > > regards, > Erik From alan at tll.org.sg Mon Oct 8 01:02:45 2007 From: alan at tll.org.sg (alan) Date: Mon, 8 Oct 2007 13:02:45 +0800 Subject: [Bioperl-l] exonerate In-Reply-To: <73B4E193-69D0-409D-9F89-20FB677F45C9@uiuc.edu> References: <034FB11C-B4E9-4E4E-B213-D4AC6A397B1B@tll.org.sg> <29C4D729-6715-4C19-9872-3B1AF90EAFA3@tll.org.sg> <73B4E193-69D0-409D-9F89-20FB677F45C9@uiuc.edu> Message-ID: Hi Chris, On 02 Oct 2007, at 11:03 PM, Chris Fields wrote: > One option is to try running $run->cleanup() after you finish > parsing, which gets rid of the tempfiles on each run. > I tried this option in the past but it did not change anything. I found a temporary solution: I am using tcsh and the ulimit for open files (descriptors) was set to 256. I changed this limit to 2000 and I got my code to run to completion. This implies that I am circumventing the real problem of open files.
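For background on the descriptor limit: in Perl, a temporary file held in a lexically scoped File::Temp object is closed and deleted once the object goes out of scope, so a per-iteration loop never holds more than one such handle open. The sketch below is a generic illustration of that pattern under that assumption; it is not the code inside Exonerate.pm, and the thread does not establish how that module manages its own temporary files or whether `$run->cleanup()` releases them.

```perl
use strict;
use warnings;
use File::Temp ();

my @paths;
for my $i (1 .. 1000) {
    # File::Temp->new returns an object that is also a filehandle. With this
    # object interface UNLINK defaults to true, so the file is deleted when
    # $tmp goes out of scope at the end of each iteration.
    my $tmp = File::Temp->new(SUFFIX => '.fa');
    print {$tmp} ">frag$i\nACGTACGT\n";
    close $tmp;    # flush and release the descriptor before anything reads the file
    push @paths, $tmp->filename;
    # ... an external aligner would be run on $tmp->filename here ...
}

# Count how many of the temporary files are still on disk.
my $left = grep { -e $_ } @paths;
print "temp files still on disk: $left\n";
```

If the handles were instead kept alive (for example, stored in an array or a long-lived object), they would accumulate until a per-process limit such as the 256 descriptors mentioned above is reached.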
I will continue to look at the file closing step in bioperl but the ulimit or limit option allows me to get a quick work around for now. for bash shell (ulimit -n [number]) for tcsh shell (limit descriptors [number]) alan > chris > > On Sep 30, 2007, at 8:53 PM, alan wrote: > >> Hi, >> >> >>>> I am calling exonerate.pm within my script while attempting to >>>> align cDNA to multiple genomic fragments. After processing about >>>> 120+ genomic fragments my code crashes with the following error: >>>> >>>> ** ERROR **: Could not open [/tmp/tlInatbOED] : Too many open files >>>> aborting... >>>> MSG: Exonerate call (/usr/local/bin/exonerate /tmp/8X9jQuHUGF / >>>> tmp/tlInatbOED > /tmp/EolF5qCNLZ/cIf0HfIRf5) crashed: 34304 >>>> STACK Bio::Tools::Run::Alignment::Exonerate::_run /nfs1/alan/ >>>> cvs_src/bioperl-run/Bio/Tools/Run/Alignment/Exonerate.pm:214 >>>> STACK Bio::Tools::Run::Alignment::Exonerate::run /nfs1/alan/ >>>> cvs_src/bioperl-run/Bio/Tools/Run/Alignment/Exonerate.pm:174 >>>> >>>> The code in Exonerate.pm closes the tmpfile at the end of the >>>> routine yet I get the error message about "too many open files". >>>> Any suggestions on how I should be closing these files? >>>> >>>> >>>> Extract from my code that runs exonerate is listed below. >>>> >>>> foreach my $f(@files) { >>>> next unless (-f "$dir/$f"); >>>> my $q_in = Bio::SeqIO->new(-file=>$query, -format=>"Fasta"); >>>> my $query_obj = $q_in->next_seq(); >>>> my $target_in = Bio::SeqIO->new(-file=>"$dir/$f", - >>>> format=>"Fasta"); >>>> my $target_obj = $target_in->next_seq(); >>>> my $run = Bio::Tools::Run::Alignment::Exonerate->new(); >>>> my $exonerate_io = $run->run($query_obj, $target_obj); >>>> >>>> [code for parsing the data.......] 
>>>> >>>> $exonerate_io->close; #tried this line out of desperation but it >>>> did not help :-) >>>> } >>>> >>>> thanks >>>> alan > > From bix at sendu.me.uk Mon Oct 8 05:54:15 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 08 Oct 2007 10:54:15 +0100 Subject: [Bioperl-l] Loading Blast Report in a minimal way In-Reply-To: References: Message-ID: <4709FE47.3040103@sendu.me.uk> zhihuali wrote: > Hi netters, > > I'm using SearchIO to parse my blast reports. They are extremely > huge, and not surprisingly, it's extremely slow and sometimes the > system crashed due to memmory problem. As I can handle small reports > quickly, it seems like a problem related to the way SearchIO works: > it slurps the whole report into the memory and builds millions of > objects. > > I've checked old posts and some people used FastHitEventBuilder to > build hit objects without any hsp objects. And some people suggested > using tabular output of blast. But in my case I need to go to each of > the hsps of each hit, parse the alignment, and gather the > information needed if that hsp fits certain criteria, and then move > on to the next hsp/or jump over to the next hit/ or exit the > processing, according to the information I have already got. An ideal > way would be to read one hsp at a time from the report to the memory. > Is there some way to modify SearchIO (or build another Search Event) > to do this? Use Bio::SearchIO::blast_pull (ie. use Bio::SearchIO; my $in = Bio::SearchIO->new(-format => 'blast_pull', -file => 't/data/new_blastn.txt'); ) It doesn't yet support all kinds of Blast report, however. Let me know how you get on. Cheers, Sendu. From bix at sendu.me.uk Wed Oct 10 10:09:24 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 10 Oct 2007 15:09:24 +0100 Subject: [Bioperl-l] Searching by source in Bio::DB::SeqFeature::Store? 
Message-ID: <470CDD14.4000005@sendu.me.uk> Once I store a feature like so: $db = Bio::DB::SeqFeature::Store->new( # mysql ); $db->new_feature(-primary_tag => 'X', -seq_id => 'Y', #... -source => 'Z'); How do I search for it again? I could have sworn that in the past the source somehow became one of the feature's attributes so that I could do: @feats = $db->features(-attributes => {source => 'Z'}); (Or does/did some other feature-related module store source as an attribute, and in the past I stored those features in my db?) The obvious isn't implemented: @feats = $db->features(-source => 'Z'); Looking at the code I see I can do type and source at the same time: @feats = $db->features(-type => 'X:Z'); which works, but how do I just search for all Z, regardless of type? From lstein at cshl.edu Wed Oct 10 10:20:33 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 10 Oct 2007 10:20:33 -0400 Subject: [Bioperl-l] Searching by source in Bio::DB::SeqFeature::Store? In-Reply-To: <470CDD14.4000005@sendu.me.uk> References: <470CDD14.4000005@sendu.me.uk> Message-ID: <6dce9a0b0710100720p4fc5d0c1w7b1c4bfc786dd5e3@mail.gmail.com> Hi Sendu, This may be a glaring omission on my part. Try searching for: @feats = $db->get_features_by_type(":$source"); Remove the colon in front of $source. This might not work, but give it a try. Lincoln On 10/10/07, Sendu Bala wrote: > > Once I store a feature like so: > > $db = Bio::DB::SeqFeature::Store->new( # mysql ); > $db->new_feature(-primary_tag => 'X', > -seq_id => 'Y', > #... > -source => 'Z'); > > How do I search for it again? I could have sworn that in the past the > source somehow became one of the feature's attributes so that I could do: > > @feats = $db->features(-attributes => {source => 'Z'}); > > (Or does/did some other feature-related module store source as an > attribute, and in the past I stored those features in my db?) 
> > The obvious isn't implemented: > > @feats = $db->features(-source => 'Z'); > > Looking at the code I see I can do type and source at the same time: > > @feats = $db->features(-type => 'X:Z'); > > which works, but how do I just search for all Z, regardless of type? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bix at sendu.me.uk Wed Oct 10 11:10:45 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 10 Oct 2007 16:10:45 +0100 Subject: [Bioperl-l] Searching by source in Bio::DB::SeqFeature::Store? In-Reply-To: <6dce9a0b0710100720p4fc5d0c1w7b1c4bfc786dd5e3@mail.gmail.com> References: <470CDD14.4000005@sendu.me.uk> <6dce9a0b0710100720p4fc5d0c1w7b1c4bfc786dd5e3@mail.gmail.com> Message-ID: <470CEB75.7050804@sendu.me.uk> Lincoln Stein wrote: > Hi Sendu, > > This may be a glaring omission on my part. Try searching for: > > @feats = $db->get_features_by_type(":$source"); > > Remove the colon in front of $source. It doesn't, unfortunately, colon or not. get_features_by_type is implemented using _features(-type => ), just like features(). In any case, I'm searching for multiple things at the same time so need to use the features() method. The following patch seems to do the job in my hands, allowing: @feats = $db->features(-source => 'Z'); to work as expected. Also, we now have: @feats = $db->features(-type => 'X', -source => 'Z'); as a nicer (ie. matching the syntax the user used to create the feature in the first place) alternative to: @feats = $db->features(-type => 'X:Z'); Of course, this patch is specific to the mysql implementation.
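The -type/-source combination described above comes down to building `primary_tag:source` strings before the query runs. Below is a pure-Perl sketch of that merging step, written to mirror the logic of the patch in this message; the helper name `merge_types_and_sources` is hypothetical and not part of Bio::DB::SeqFeature::Store.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the proposed patch: combine -type and -source
# arguments into 'primary_tag:source' strings before the database query.
sub merge_types_and_sources {
    my ($types, $sources) = @_;
    return $types unless defined $sources;    # nothing to merge

    my @sources = ref $sources eq 'ARRAY' ? @$sources : ($sources);

    # Source only: ':Z' has an empty primary tag, which the patch then
    # matches with "tl.tag LIKE '%:Z'".
    return [ map { ":$_" } @sources ] unless defined $types;

    my @types = ref $types eq 'ARRAY' ? @$types : ($types);
    my @final;
    for my $type (@types) {
        if ($type =~ /:/) {
            push @final, $type;    # type already names a source; keep as-is
        }
        else {
            push @final, map { "$type:$_" } @sources;
        }
    }
    return \@final;
}

my $tags = merge_types_and_sources('X', 'Z');
print "@$tags\n";    # X:Z
```

With only sources given, for example `merge_types_and_sources(undef, ['Z', 'W'])`, the result is `[':Z', ':W']`, which is the source-only case the patch handles with a LIKE match.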
You may want to check it over to see if it is sane, see if there is a cleaner way to do it, or see if there's a more general way to apply it to all implementations.

RCS file: /home/repository/bioperl/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm,v
retrieving revision 1.33
diff -r1.33 mysql.pm
724c724,725
<           $iterator
---
>           $iterator,
>           $sources
731a733
>           ['SOURCE','SOURCES']
760c762,785
<
---
>
>   if (defined($sources)) {
>     my @sources = ref($sources) eq 'ARRAY' ? @{$sources} : ($sources);
>     if (defined($types)) {
>       my @types = ref($types) eq 'ARRAY' ? @{$types} : ($types);
>       my @final_types;
>       foreach my $type (@types) {
>         # *** not sure what to do if user supplies both -source
>         # and -type where the type includes a source!
>         if ($type =~ /:/) {
>           push(@final_types, $type);
>         }
>         else {
>           foreach my $source (@sources) {
>             push(@final_types, $type.':'.$source);
>           }
>         }
>       }
>       $types = \@final_types;
>     }
>     else {
>       $types = [map { ':'.$_ } @sources];
>     }
>   }
939,940c964,971
<     push @matches,"tl.tag=?";
<     push @args,"$primary_tag:$source_tag";
---
>     if (length($primary_tag)) {
>       push @matches,"tl.tag=?";
>       push @args,"$primary_tag:$source_tag";
>     }
>     else {
>       push @matches,"tl.tag LIKE ?";
>       push @args,"%:$source_tag";
>     }

From bix at sendu.me.uk Thu Oct 11 06:40:24 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 11 Oct 2007 11:40:24 +0100 Subject: [Bioperl-l] Bio::SeqFeature::Annotated API change Message-ID: <470DFD98.4030603@sendu.me.uk> Following Chris's changes to SF::Annotated et al., lots of existing user code breaks. Also, at least some Bioperl code breaks, notably Bio::DB::SeqFeature::Store, which in mysql mode calls _get_location_and_bin() which calls $feature->seq_id which ends up storing something like 'Bio::Annotation::SimpleValue=HASH(0x1f435d0)' in the database, instead of an actual sequence id (which completely breaks searching by seq_id). I propose its API be changed to be more consistent with Bio::SeqFeatureI, eg.
instead of:

seq_id()
 Usage   : $obj->seq_id($newval)
 Function: holds a string corresponding to the unique
           seq_id of the sequence underlying the feature
           (e.g. database accession or primary key).
 Returns : a Bio::Annotation::SimpleValue object representing the seq_id.
 Args    : on set, some string or a Bio::Annotation::SimpleValue object.

we have:

seq_id()
 Usage   : $obj->seq_id($newval)
 Function: holds a string corresponding to the unique
           seq_id of the sequence underlying the feature
           (e.g. database accession or primary key).
 Returns : string representing the seq_id.
 Args    : on set, some string or a Bio::Annotation::SimpleValue object.

This would apply to seq_id(), name(), type(), source(), phase() and frame(). Internally the implementation could store the string value in a SimpleValue object. However, I'm obviously missing something, because I have no idea what the justification for returning SimpleValue objects was in the first place (what other module needs them?), nor even what the point of SimpleValue objects is in the first place. From ULNJUJERYDIX at spammotel.com Thu Oct 11 07:32:33 2007 From: ULNJUJERYDIX at spammotel.com (Kevin Lam) Date: Thu, 11 Oct 2007 19:32:33 +0800 Subject: [Bioperl-l] **Fwd: Re: divide and blast blastunsplit blast subsequence In-Reply-To: References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com> Message-ID: <5b6410e0710110432mb4a652emc7bb6be1fb378770@mail.gmail.com> wow thanks for the replies! I have gotten the answer from the blast help Subsequence range can be specified by the -L parameter. Please refer to this web document for more information: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_node60.html Other end-user oriented standalone blast documents are at: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/ but will try this method as well to see which works better. cheers! kevin There is a script that comes with the bioperl core distribution, > bp_split_seq.pl, which does this.
Here's the CVS location: > > http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/seq/?cvsroot=bioperl > > chris > From cjfields at uiuc.edu Thu Oct 11 09:49:59 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 11 Oct 2007 08:49:59 -0500 Subject: [Bioperl-l] **Fwd: Re: divide and blast blastunsplit blast subsequence In-Reply-To: <5b6410e0710110432mb4a652emc7bb6be1fb378770@mail.gmail.com> References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com> <5b6410e0710110432mb4a652emc7bb6be1fb378770@mail.gmail.com> Message-ID: Yep, started using this myself for mapping old predicted CDS onto new assemblies using BLASTX. Works very well, particularly since it retains the seq position on hits, where using bp_split_seq would require mapping to the correct coordinates. chris On Oct 11, 2007, at 6:32 AM, Kevin Lam wrote: > wow thanks for the replies! > I have gotten the answer from the blast help > > Subsequence range can be specified by the -L parameter. Please refer > to this web document for more information: > > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_node60.html > > > > Other end-user oriented standalone blast documents are at: > > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/ > > > but will try this method as well to see which works better. > > cheers! > kevin > > There is a script that comes with the bioperl core distribution, > >> bp_split_seq.pl, which does this. Here's the CVS location: >> >> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/seq/?cvsroot=bioperl >> >> chris >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Oct 11 10:19:51 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 11 Oct 2007 09:19:51 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated API change In-Reply-To: <470DFD98.4030603@sendu.me.uk> References: <470DFD98.4030603@sendu.me.uk> Message-ID: On Oct 11, 2007, at 5:40 AM, Sendu Bala wrote: > Following Chris's changes to SF::Annotated et al., lots of existing > user > code breaks. Also, at least some Bioperl code breaks, notably > Bio::DB::SeqFeature::Store, which in mysql mode calls > _get_location_and_bin() which calls $feature->seq_id which ends up > storing something like 'Bio::Annotation::SimpleValue=HASH > (0x1f435d0)' in > the database, instead of an actual sequence id (which completely > breaks > searching by seq_id). > > I propose its API be changed to be more consistent with > Bio::SeqFeatureI, eg. instead of: > > seq_id() > Usage : $obj->seq_id($newval) > Function: holds a string corresponding to the unique > seq_id of the sequence underlying the feature > (e.g. database accession or primary key). > Returns : a Bio::Annotation::SimpleValue object representing the > seq_id. > Args : on set, some string or a Bio::Annotation::SimpleValue > object. > > we have: > > seq_id() > Usage : $obj->seq_id($newval) > Function: holds a string corresponding to the unique > seq_id of the sequence underlying the feature > (e.g. database accession or primary key). > Returns : string representing the seq_id. > Args : on set, some string or a Bio::Annotation::SimpleValue > object. > > This would apply to seq_id(), name(), type(), source(), phase() and > frame(). Internally the implementation could store the string value > in a > SimpleValue object. Agreed. It would be easy to change over but we need to also make sure FeatureIO fixes are in place. 
In reality all FeatureIO methods should be changed over to recognize any SeqFeatureI or (if we retain it) the stricter TypedSeqFeatureI. Using only Bio::SF::Annotated limits other more lightweight implementations. I simply haven't had time to work on it yet due to $job; if you want to make the necessary changes you are more than welcome; the few tests I found I moved into SeqFeatAnnotated.t, which likely expects the wrong behavior. > However, I'm obviously missing something, because I have no idea what > the justification for returning SimpleValue objects was in the first > place (what other module needs them?), nor even what the point of > SimpleValue objects is in the first place. I believe it was to ensure any data stored or retrieved was strongly typed (i.e. scalars in SimpleValue, dbxrefs in DBLink, comments in Comment, etc.). Since B::SF::Generic is also AnnotatableI, it can store a mix of scalars in methods as well as Bio::Annotation data; this class attempts to lump them all together as Bio::Annotation in a Collection in a strongly typed, uniform way. Hilmar's Bio::SF::AnnotationAdaptor frankly does a better job of describing the reasoning behind this and is more flexible; I use that now for typing via feature(), though I just realized it should be changed to be a singleton instance per class (oops!). B::SF::A violated the SeqFeatureI interface from the get-go by returning objects. To trick its way around the issue it used overloading so that calling it in some contexts (print, comparison) returned a string or value; removing the overloads unmasked that behavior. To me an object returned (regardless of overloading) is still an object and not a scalar, and still violates the interface methods where scalars are expected.
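The point that an overloaded object is still an object can be seen in plain Perl. In this sketch `Demo::Value` and `Demo::PlainValue` are hypothetical classes, not BioPerl code: the first overloads stringification with `fallback => 1`, the second does not, which mirrors what callers see once such overloads are removed.

```perl
use strict;
use warnings;

# Hypothetical wrapper that overloads stringification.
package Demo::Value;
use overload '""' => sub { $_[0]->{value} }, fallback => 1;
sub new { my ($class, $v) = @_; return bless { value => $v }, $class }

# The same wrapper without any overloading.
package Demo::PlainValue;
sub new { my ($class, $v) = @_; return bless { value => $v }, $class }

package main;

my $with    = Demo::Value->new('chr1');
my $without = Demo::PlainValue->new('chr1');

print "$with\n";                                               # chr1
print "equal\n" if $with eq 'chr1';                            # equal
print ref($with) ? "still a reference\n" : "plain scalar\n";   # still a reference
print "$without\n";                                            # Demo::PlainValue=HASH(0x...)
```

The last line produces the same kind of string that, per the report earlier in this thread, ended up stored as a seq_id in Bio::DB::SeqFeature::Store.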
I can't fault the authors involved completely since the idea was to radically change the way SFs/Annotation worked together, but the implementation was never completed so I rolled it back and limited typing to B::SF::A until something else can be worked out. Personally I think it's too 'heavy' and other options should be explored, such as abstracting out the type checking into a separate utility class which FeatureIO can use on any SeqFeatureI (TypeMapper does something like this for the primary_tag()). chris From jaudall at gmail.com Thu Oct 11 20:51:50 2007 From: jaudall at gmail.com (Joshua Udall) Date: Thu, 11 Oct 2007 18:51:50 -0600 Subject: [Bioperl-l] Hsp_hit-from Message-ID: <52cea20c0710111751n31a6c96ai309ff3b3714358cc@mail.gmail.com> Bioperl - I'm parsing an XML blast output using SearchIO and the usual fields seem to work fine. print $result->query_name . "\t" . $hit->name . "\n"; However, I'm curious as to the 5' position distribution of my hits. So I'd like to access this data field in the xml output file. I searched both the mailing lists and the Search docs including *Bio::Search::Hit::HitI, *but I can't seem to find an example of someone accessing this data field. Perhaps, I missed it, but do any of you have any suggestions as to how I might get at the for each hit? Thanks. Josh From cjfields at uiuc.edu Thu Oct 11 22:14:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 11 Oct 2007 21:14:21 -0500 Subject: [Bioperl-l] Hsp_hit-from In-Reply-To: <52cea20c0710111751n31a6c96ai309ff3b3714358cc@mail.gmail.com> References: <52cea20c0710111751n31a6c96ai309ff3b3714358cc@mail.gmail.com> Message-ID: <162E9AF9-92FC-41A1-ABEB-9AF7113CC6FD@uiuc.edu> On Oct 11, 2007, at 7:51 PM, Joshua Udall wrote: > Bioperl - > > I'm parsing an XML blast output using SearchIO and the usual fields > seem to > work fine. > > print $result->query_name . "\t" . $hit->name . "\n"; > > However, I'm curious as to the 5' position distribution of my > hits. 
So I'd > like to access this data field in the xml output > file. I > searched both the mailing lists and the Search docs including > *Bio::Search::Hit::HitI, > *but I can't seem to find an example of someone accessing this data > field. > Perhaps, I missed it, but do any of you have any suggestions as to > how I > might get at the for each hit? Thanks. > > Josh That tag (as well as related tags) are mapped to start(), end(), and strand() in the HSP objects. To access the relevant data pass 'query' or 'hit' to the method, so for the query use: $hsp->start ('query') or $hsp->end('query'). Note: in BioPerl all location starts are less than the end (strand determines the orientation for DNA). chris From hlapp at gmx.net Thu Oct 11 23:25:22 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 11 Oct 2007 23:25:22 -0400 Subject: [Bioperl-l] Bio::SeqFeature::Annotated API change In-Reply-To: <470DFD98.4030603@sendu.me.uk> References: <470DFD98.4030603@sendu.me.uk> Message-ID: On Oct 11, 2007, at 6:40 AM, Sendu Bala wrote: > seq_id() > Usage : $obj->seq_id($newval) > Function: holds a string corresponding to the unique > seq_id of the sequence underlying the feature > (e.g. database accession or primary key). > Returns : a Bio::Annotation::SimpleValue object representing the > seq_id. type() and source() should be ontology-typed in a TypedSeqFeatureI (and therefore be instances of Bio::Annotation::OntologyTerm, not Bio::Annotation::SimpleValue) to be more GFF3-compliant, but I don't understand what the benefit of typing seq_id() any stronger than a string would be. 
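The seq_id() contract proposed in this thread (accept a string or a SimpleValue on set, always return a plain string, optionally keep a wrapper internally) can be sketched without BioPerl. `Demo::SimpleValue` and `Demo::Feature` are hypothetical stand-ins; only the `value()` method name is borrowed from Bio::Annotation::SimpleValue.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Annotation::SimpleValue.
package Demo::SimpleValue;
sub new   { my ($class, %args) = @_; return bless { value => $args{-value} }, $class }
sub value { return $_[0]->{value} }

# Hypothetical feature class following the proposed seq_id() contract.
package Demo::Feature;
use Scalar::Util qw(blessed);

sub new { my $class = shift; return bless {}, $class }

sub seq_id {
    my ($self, $new) = @_;
    if (defined $new) {
        # Accept a plain string or any object with a value() method,
        # and store the value internally in a wrapper object.
        $self->{seq_id} = (blessed($new) && $new->can('value'))
            ? $new
            : Demo::SimpleValue->new(-value => $new);
    }
    # Always return a plain string (or undef), never the wrapper itself.
    return defined $self->{seq_id} ? $self->{seq_id}->value : undef;
}

package main;

my $feat = Demo::Feature->new;
$feat->seq_id('chrX');
print $feat->seq_id, "\n";    # chrX
$feat->seq_id(Demo::SimpleValue->new(-value => 'chr2'));
print $feat->seq_id, "\n";    # chr2
```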
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jaudall at gmail.com Thu Oct 11 23:34:18 2007 From: jaudall at gmail.com (Joshua Udall) Date: Thu, 11 Oct 2007 20:34:18 -0700 Subject: [Bioperl-l] Hsp_hit-from In-Reply-To: <162E9AF9-92FC-41A1-ABEB-9AF7113CC6FD@uiuc.edu> References: <52cea20c0710111751n31a6c96ai309ff3b3714358cc@mail.gmail.com> <162E9AF9-92FC-41A1-ABEB-9AF7113CC6FD@uiuc.edu> Message-ID: <52cea20c0710112034k60f14241k8884a07d26939e86@mail.gmail.com> Thanks! For some reason I was stuck on the idea that it would be different with an xml file. I'm glad it's not ... On 10/11/07, Chris Fields wrote: > > > On Oct 11, 2007, at 7:51 PM, Joshua Udall wrote: > > > Bioperl - > > > > I'm parsing an XML blast output using SearchIO and the usual fields > > seem to > > work fine. > > > > print $result->query_name . "\t" . $hit->name . "\n"; > > > > However, I'm curious as to the 5' position distribution of my > > hits. So I'd > > like to access this data field in the xml output > > file. I > > searched both the mailing lists and the Search docs including > > *Bio::Search::Hit::HitI, > > *but I can't seem to find an example of someone accessing this data > > field. > > Perhaps, I missed it, but do any of you have any suggestions as to > > how I > > might get at the for each hit? Thanks. > > > > Josh > > That tag (as well as related tags) are mapped to start(), end(), and > strand() in the HSP objects. To access the relevant data pass > 'query' or 'hit' to the method, so for the query use: $hsp->start > ('query') or $hsp->end('query'). > > Note: in BioPerl all location starts are less than the end (strand > determines the orientation for DNA). 
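To make the coordinate convention concrete: for a nucleotide hit on the minus strand, BLAST XML typically reports an `Hsp_hit-from` larger than `Hsp_hit-to`. The hypothetical helper below (plain Perl, not SearchIO's own code) shows the normalisation to start <= end plus a strand, which is the convention described for `$hsp->start('hit')`, `$hsp->end('hit')` and `$hsp->strand('hit')`.

```perl
use strict;
use warnings;

# Hypothetical helper: convert raw from/to values (e.g. Hsp_hit-from and
# Hsp_hit-to from BLAST XML) into (start, end, strand) with start <= end.
sub normalise_coords {
    my ($from, $to) = @_;
    return $from <= $to ? ($from, $to, 1) : ($to, $from, -1);
}

my ($start, $end, $strand) = normalise_coords(5200, 4801);
print "$start $end $strand\n";    # 4801 5200 -1
```

When the strand comes back as -1, the raw `Hsp_hit-from` value is the returned end, which matters when looking at where hits begin.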
> > chris > From pellet at cervi-lyon.inserm.fr Fri Oct 12 04:48:06 2007 From: pellet at cervi-lyon.inserm.fr (Johann PELLET) Date: Fri, 12 Oct 2007 10:48:06 +0200 Subject: [Bioperl-l] Get nucleic CDS sequence from Genbank files using spliced_seq Message-ID: <470F34C6.9050800@cervi-lyon.inserm.fr> Hi, I have two questions: First, I have problems for example with this genbank entry: NC_008210. Indeed, for one CDS, the location is : join(161990..162784,complement(88222..88806),complement(86666..87448)) When I parse a genbank file with this entry. my $seq_in = Bio::SeqIO->new( -format => 'genbank', -file => $input_file); while( my $seq = $seq_in->next_seq() ) { my @features = $seq->get_SeqFeatures(); for ( my $i =0; $i < scalar @features; $i++ ){ my $feat = @features[$i]; if ( $feat->primary_tag eq 'CDS' ){ my $seq_CDS_obj=$cds->spliced_seq( -nosort => 0); my $seq_CDS=$seq_CDS_obj->seq; I have this error: Can't call method "isa" without a package or object reference at .... Secondly, When location is like: join(complement(AY421753.1:1..6),complement(3813..5699)) I know that we must use spliced_seq with the argument db, but it's not working. my $seq_CDS_obj=$cds->spliced_seq( -db => "genbank"); How can we valid a Bio::DB::RandomAccessI? Thanks -- Johann Pellet phone (work): +33(0)4 37282352 E-mail: pellet at cervi-lyon.inserm.fr Centre d'Etudes et de Recherche en Virologie et Immunologie INSERM U503 21, Avenue Tony Garnier 69365 Lyon cedex 07 France From bix at sendu.me.uk Fri Oct 12 09:04:25 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 12 Oct 2007 14:04:25 +0100 Subject: [Bioperl-l] Bio::SeqFeature::Annotated API change In-Reply-To: References: <470DFD98.4030603@sendu.me.uk> Message-ID: <470F70D9.6030108@sendu.me.uk> Chris Fields wrote: > B::SF::A violated the SeqFeatureI interface from the get-go by returning > objects. On that note, the other major change I'd propose is to make B::SF::A inherit from B::SeqFeatureI. I really need this for RangeI methods. 
Or, heck, since it's such a mess, do we just want to drop Annotated entirely and recommend using AnnotationAdaptor with Generic? What does Annotated bring to the table? (Since it's one of the modules added in 1.5 I think it's fair to drop it from the 1.6 release: stuff that doesn't work can and should be moved out of the path of stable branches.) From hlapp at gmx.net Fri Oct 12 10:46:19 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 12 Oct 2007 10:46:19 -0400 Subject: [Bioperl-l] Bio::SeqFeature::Annotated API change In-Reply-To: <470F70D9.6030108@sendu.me.uk> References: <470DFD98.4030603@sendu.me.uk> <470F70D9.6030108@sendu.me.uk> Message-ID: <81888A43-B6D5-44ED-B445-56520A142AC2@gmx.net> On Oct 12, 2007, at 9:04 AM, Sendu Bala wrote: > Or, heck, since it's such a mess, do we just want to drop Annotated > entirely and recommend using AnnotationAdaptor with Generic? What does > Annotated bring to the table? My recollection may be wrong since it's been quite a while, but I believe one of the main motivations was to reflect the ontology-typing of feature type and other annotation that GFF3 requires. The motivation of AnnotationAdaptor was primarily to provide a view onto a SeqFeatureI that makes tag/value annotation transparently appear as B:AnnotatableI and B:AnnotationCollectionI compliant, mixed in with the annotation collection that B:S:Generic may hold already. Although sometimes confused, these are not the same motivations. AnnotationAdaptor provides a view - you use it when that's the view you need on a SeqFeatureI object you have in hand. For example, bioperl-db uses it so it doesn't have to bother about how to de/serialize tag/value pairs, when it knows how to de/serialize annotation collections already from sequence objects. AnnotationAdaptor doesn't care whether the primary_tag, or any other tag, is from an ontology.
> > (Since its one of the modules added in 1.5 I think it's fair to > drop it >> from the 1.6 release: stuff that doesn't work can and should be moved > out of the path of stable branches.) I agree with that unless someone steps up and makes it work. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Fri Oct 12 10:49:25 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 12 Oct 2007 09:49:25 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated API change In-Reply-To: <470F70D9.6030108@sendu.me.uk> References: <470DFD98.4030603@sendu.me.uk> <470F70D9.6030108@sendu.me.uk> Message-ID: On Oct 12, 2007, at 8:04 AM, Sendu Bala wrote: > Chris Fields wrote: >> B::SF::A violated the SeqFeatureI interface from the get-go by >> returning >> objects. > > On that note, the other major change I'd propose is to make B::SF::A > inherit from B::SeqFeatureI. I really need this for RangeI methods. It should already be SeqFeatureI; B::SF::A is-a TypedSeqFeatureI, which itself is-a SeqFeatureI. We might want to run the tests on that but a quick inheritance tree check on my ends seems to confirm that. If the Range methods don't work there may be an issue within B::SF::A (which wouldn't surprise me). > Or, heck, since it's such a mess, do we just want to drop Annotated > entirely and recommend using AnnotationAdaptor with Generic? What does > Annotated bring to the table? I agree about getting rid of B::SF::A, but if we do we will need a reasonable replacement for SFs in FeatureIO, as it relies directly on B::SF::A. As for AnnotationAdaptor, it runs a simple type check on the data within a SF but I don't think it checks the primary_tag against the current SO or other ontologies for GFF3. 
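Optional checking of a primary tag against the Sequence Ontology, kept in a separate utility class in the spirit of the validate_sf_type() idea discussed in this thread, might look roughly like this. Everything here is hypothetical: the class, the method, the plain-hash feature, and the four hard-coded SO terms, which stand in for a full ontology.

```perl
use strict;
use warnings;

# Hypothetical utility class that optionally validates a feature's primary tag
# against a tiny, hard-coded subset of Sequence Ontology terms. A real
# implementation would load the full ontology rather than this table.
package Demo::TypeValidator;

my %SO = (
    gene => 'SO:0000704',
    mRNA => 'SO:0000234',
    exon => 'SO:0000147',
    CDS  => 'SO:0000316',
);

sub new { my $class = shift; return bless {}, $class }

# On success, records the SO accession on the feature and returns it;
# for unknown types returns undef (scalar context) and leaves the feature alone.
sub validate_sf_type {
    my ($self, $feature) = @_;
    my $tag = $feature->{primary_tag};
    return unless defined $tag && exists $SO{$tag};
    $feature->{ontology_term} = $SO{$tag};
    return $SO{$tag};
}

package main;

my $util = Demo::TypeValidator->new;
my $feat = { primary_tag => 'CDS' };    # a plain hash standing in for a feature
my $acc  = $util->validate_sf_type($feat);
print defined $acc ? "$acc\n" : "invalid type\n";    # SO:0000316
```

Because validation is a separate call, features that never pass through the validator keep working unchanged, which is the "optional" behaviour discussed here.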
My inclination is to hold back FeatureIO from 1.6 and retool it to use any SeqFeatureI, then find a way to optionally type a SeqFeatureI. This is so we don't have to completely retool every Bio::Tools* class. For instance:

# a SF::Generic with a few type-related methods added to SeqFeatureI
$sf->type();         # undef, not checked
$sf->ontology_term;  # undef, not checked

# call proper methods to type check
# or use a utility class dedicated to this function
$sf->validate_sf_type();
# or...
$util->validate_sf_type($sf);

$sf->type();         # returns string
$sf->ontology_term;  # returns Bio::Ontology::TermI

My feeling is that FeatureIO needs the ability to validate SFs and maybe have it turned on by default, but it should be optional. The TermI would be a Bio::Annotation::OntologyTerm if we insist on using AnnotationAdaptor for richly typed SFs, which may not be necessary (Hilmar?). As not all SeqFeatureI are also AnnotatableI that should probably be optional anyway. > (Since its one of the modules added in 1.5 I think it's fair to > drop it > from the 1.6 release: stuff that doesn't work can and should be moved > out of the path of stable branches.) Agreed. If anyone wants to use experimental stuff they can always use bioperl-live. chris From cjfields at uiuc.edu Fri Oct 12 11:02:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 12 Oct 2007 10:02:03 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated API change In-Reply-To: <81888A43-B6D5-44ED-B445-56520A142AC2@gmx.net> References: <470DFD98.4030603@sendu.me.uk> <470F70D9.6030108@sendu.me.uk> <81888A43-B6D5-44ED-B445-56520A142AC2@gmx.net> Message-ID: On Oct 12, 2007, at 9:46 AM, Hilmar Lapp wrote: > > On Oct 12, 2007, at 9:04 AM, Sendu Bala wrote: > >> Or, heck, since it's such a mess, do we just want to drop Annotated >> entirely and recommend using AnnotationAdaptor with Generic? What >> does >> Annotated bring to the table?
>
> My recollection may be wrong since it's been quite a while, but I
> believe one of the main motivations was to reflect the ontology-typing
> of feature type and other annotation that GFF3 requires.
>
> The motivation of AnnotationAdaptor was primarily to provide a view
> onto a SeqFeatureI that makes tag/value annotation transparently
> appear as B:AnnotatableI and B:AnnotationCollectionI compliant,
> mixed in with the annotation collection that B:S:Generic may hold
> already.
>
> Although sometimes confused, these are not the same motivations.
> AnnotationAdaptor provides a view - you use it when that's the view
> you need on a SeqFeatureI object you have in hand. For example,
> bioperl-db uses it so it doesn't have to bother about how to
> de/serialize tag/value pairs, when it knows how to de/serialize
> annotation collections already from sequence objects.
> AnnotationAdaptor doesn't care whether the primary_tag, or any
> other tag, is from an ontology.

Ah, forgot it returns a Collection. It doesn't modify the SF, I assume?

>> (Since it's one of the modules added in 1.5 I think it's fair to
>> drop it from the 1.6 release: stuff that doesn't work can and should
>> be moved out of the path of stable branches.)
>
> I agree with that unless someone steps up and makes it work.
>
> -hilmar

Making it work may be more trouble than it's worth. Sincerest apologies to the authors of the module, but we should probably deprecate it in favor of something more flexible when we can. It's obvious anything reliant on it would also have to be dropped.

The GMOD meeting is a few weeks away, where I assume some things will be hammered out re: GFF3 and BioPerl. The idea of FeatureIO is worth saving; maybe retooling it would be in our best interests, but it needs a decent roadmap to move forward.
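The optional typing Chris proposes earlier in this thread (`$util->validate_sf_type($sf)`) could look roughly like the sketch below. Everything in it is hypothetical: `My::FeatureTypeValidator` is not a BioPerl class, and its short list of allowed terms stands in for a real Sequence Ontology lookup. It calls only `primary_tag()`, which every Bio::SeqFeatureI provides, so it would accept a Bio::SeqFeature::Generic just as well as an Annotated feature.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical validator (not part of BioPerl) showing optional type checking.
package My::FeatureTypeValidator;

sub new {
    my ($class, %args) = @_;
    # allowed type names; a stand-in for querying the Sequence Ontology
    my %terms = map { $_ => 1 } @{ $args{terms} || [] };
    my $enabled = exists $args{enabled} ? $args{enabled} : 1;   # on by default
    return bless { terms => \%terms, enabled => $enabled }, $class;
}

# Return the feature's type if it is allowed (or checking is switched off),
# otherwise undef. Only primary_tag() is called, so any SeqFeatureI will do.
sub validate_sf_type {
    my ($self, $sf) = @_;
    my $type = $sf->primary_tag;
    return $type unless $self->{enabled};
    return (defined $type && $self->{terms}{$type}) ? $type : undef;
}

package main;
```

With BioPerl installed, `My::FeatureTypeValidator->new(terms => ['gene', 'CDS'])->validate_sf_type(Bio::SeqFeature::Generic->new(-primary_tag => 'CDS', -start => 1, -end => 300))` would return `'CDS'`. Keeping the check in a separate object rather than in the feature class is what would let FeatureIO stay agnostic about which SeqFeatureI it receives.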
chris

From bix at sendu.me.uk Fri Oct 12 11:06:30 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 12 Oct 2007 16:06:30 +0100
Subject: [Bioperl-l] Bio::SeqFeature::Annotated API change
In-Reply-To:
References: <470DFD98.4030603@sendu.me.uk> <470F70D9.6030108@sendu.me.uk>
Message-ID: <470F8D76.8080101@sendu.me.uk>

Chris Fields wrote:
>
> On Oct 12, 2007, at 8:04 AM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>> B::SF::A violated the SeqFeatureI interface from the get-go by
>>> returning objects.
>>
>> On that note, the other major change I'd propose is to make
>> B::SF::A inherit from B::SeqFeatureI. I really need this for RangeI
>> methods.
>
> It should already be SeqFeatureI; B::SF::A is-a TypedSeqFeatureI,
> which itself is-a SeqFeatureI. We might want to run the tests on
> that, but a quick inheritance tree check on my end seems to confirm
> that. If the Range methods don't work there may be an issue within
> B::SF::A (which wouldn't surprise me).

Sorry, I was looking at the docs on the website and for whatever reason it's failing to display all the inherited modules. If RangeI methods don't work I'll post back, but for now assume they do (I'm not in a position to test right now).

> My inclination is [snip]

I still don't really understand what the deal is with all these modules, so I'll leave it in your hands to make further recommendations and take action.

From my limited understanding what you said seems reasonable, except for excluding FeatureIO completely. At least, I'd really like to see it in 1.6; it would be a shame to miss it just because of Annotated. Worst case, isn't Generic 'good enough' as a replacement?

> The GMOD meeting is a few weeks away, where I assume some things will
> be hammered out re: GFF3 and BioPerl. The idea of FeatureIO is worth
> saving; maybe retooling it would be in our best interests, but it
> needs a decent roadmap to move forward.

Agreed. I guess we'll wait for the results of that meeting?
Is anyone planning to discuss this issue?

From cjfields at uiuc.edu Fri Oct 12 12:15:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 12 Oct 2007 11:15:51 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated API change
In-Reply-To: <470F8D76.8080101@sendu.me.uk>
References: <470DFD98.4030603@sendu.me.uk> <470F70D9.6030108@sendu.me.uk> <470F8D76.8080101@sendu.me.uk>
Message-ID:

On Oct 12, 2007, at 10:06 AM, Sendu Bala wrote:

...
>
> I still don't really understand what the deal is with all these modules,
> so I'll leave it in your hands to make further recommendations and
> take action.
>
> From my limited understanding what you said seems reasonable, except for
> excluding FeatureIO completely. At least, I'd really like to see it in
> 1.6; it would be a shame to miss it just because of Annotated. Worst
> case, isn't Generic 'good enough' as a replacement?

It probably could work, but type checking, ontology_term(), and SF data validation wouldn't be implemented immediately, as that would require additional code. I'll try looking into it this weekend to see what needs to be done and work from there.

>> The GMOD meeting is a few weeks away, where I assume some things will
>> be hammered out re: GFF3 and BioPerl. The idea of FeatureIO is worth
>> saving; maybe retooling it would be in our best interests, but it
>> needs a decent roadmap to move forward.
>
> Agreed. I guess we'll wait for the results of that meeting? Is
> anyone planning to discuss this issue?

Jason is going to be there and intends to bring it up as it affects GBrowse/GMOD/Chado.

chris

From jason at bioperl.org Fri Oct 12 12:18:15 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 12 Oct 2007 09:18:15 -0700
Subject: [Bioperl-l] Get nucleic CDS sequence from Genbank files using spliced_seq
In-Reply-To: <470F34C6.9050800@cervi-lyon.inserm.fr>
References: <470F34C6.9050800@cervi-lyon.inserm.fr>
Message-ID: <90E849A1-C5AB-40B3-9AB1-2DC79EBF5C4C@bioperl.org>

1. Always 'use strict;' in your code - you'll see the bug more clearly: you should use $feat, not $cds, in the if statement.

2. You should pass in an actual Bio::DB::GenBank object to the -db argument.

    my $db = Bio::DB::GenBank->new;
    my $cds = $feat->spliced_seq(-db => $db);

-jason

On Oct 12, 2007, at 1:48 AM, Johann PELLET wrote:

> Hi,
> I have two questions:
> First, I have problems, for example, with this GenBank entry: NC_008210.
> Indeed, for one CDS, the location is:
> join(161990..162784,complement(88222..88806),complement(86666..87448))
>
> When I parse a GenBank file with this entry:
>
>   my $seq_in = Bio::SeqIO->new( -format => 'genbank',
>                                 -file => $input_file);
>
>   while( my $seq = $seq_in->next_seq() ) {
>     my @features = $seq->get_SeqFeatures();
>     for ( my $i =0; $i < scalar @features; $i++ ){
>       my $feat = @features[$i];
>       if ( $feat->primary_tag eq 'CDS' ){
>
>         my $seq_CDS_obj=$cds->spliced_seq( -nosort => 0);
>         my $seq_CDS=$seq_CDS_obj->seq;
>
> I get this error:
> Can't call method "isa" without a package or object reference at ....
>
> Secondly, when the location is like
> join(complement(AY421753.1:1..6),complement(3813..5699)), I know that we
> must use spliced_seq with the db argument, but it's not working:
>   my $seq_CDS_obj=$cds->spliced_seq( -db => "genbank");
> How can we provide a valid Bio::DB::RandomAccessI?
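Putting Jason's two points together, a corrected version of Johann's loop might look like the sketch below. It assumes a BioPerl release whose `spliced_seq()` accepts the named `-db` argument used in Jason's reply; `cds_from_seq` is a hypothetical helper name, and fetching exons that lie on other entries through Bio::DB::GenBank requires network access.

```perl
#!/usr/bin/perl
use strict;     # would have reported the undeclared $cds in the original loop
use warnings;
use Bio::SeqIO;

# Return the spliced CDS sequence string for every CDS feature of $seq.
# $db is optional; it is only needed for exons located on other entries.
sub cds_from_seq {
    my ($seq, $db) = @_;
    my @cds;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';
        # call spliced_seq on the feature itself ($feat), not on $cds
        my $spliced = $feat->spliced_seq(defined $db ? (-db => $db) : ());
        push @cds, $spliced->seq;
    }
    return @cds;
}

if (@ARGV) {
    require Bio::DB::GenBank;
    # a real Bio::DB::RandomAccessI object instead of the string "genbank"
    my $db = Bio::DB::GenBank->new;
    my $seq_in = Bio::SeqIO->new(-format => 'genbank', -file => $ARGV[0]);
    while (my $seq = $seq_in->next_seq) {
        print $seq->display_id, "\t$_\n" for cds_from_seq($seq, $db);
    }
}
```

Iterating with `for my $feat (...)` also avoids the `@features[$i]` slice in the original, which Perl warns about under `use warnings`.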
>
> Thanks
>
> --
> Johann Pellet
> phone (work): +33(0)4 37282352
> E-mail: pellet at cervi-lyon.inserm.fr
> Centre d'Etudes et de Recherche en Virologie et Immunologie
> INSERM U503
> 21, Avenue Tony Garnier
> 69365 Lyon cedex 07 France
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org

From hlapp at gmx.net Fri Oct 12 13:22:00 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 12 Oct 2007 13:22:00 -0400
Subject: [Bioperl-l] Bio::SeqFeature::Annotated API change
In-Reply-To:
References: <470DFD98.4030603@sendu.me.uk> <470F70D9.6030108@sendu.me.uk> <81888A43-B6D5-44ED-B445-56520A142AC2@gmx.net>
Message-ID: <9762F1F9-68C8-42E2-8488-6656C76D36B8@gmx.net>

On Oct 12, 2007, at 11:02 AM, Chris Fields wrote:

>> Although sometimes confused, these are not the same motivations.
>> AnnotationAdaptor provides a view - you use it when that's the
>> view you need on a SeqFeatureI object you have in hand. For
>> example, bioperl-db uses it so it doesn't have to bother about how
>> to de/serialize tag/value pairs, when it knows how to de/serialize
>> annotation collections already from sequence objects.
>> AnnotationAdaptor doesn't care whether the primary_tag, or any
>> other tag, is from an ontology.
>
> Ah, forgot it returns a Collection. It doesn't modify the SF I
> assume?

No, not if you only use getters. It does support adding annotation too, though.
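To illustrate Hilmar's point that the adaptor is a view, the sketch below reads a feature's tag/value pairs through the AnnotationCollectionI methods and leaves the feature itself alone. It assumes that `Bio::SeqFeature::AnnotationAdaptor->new` takes a `-feature` argument and that tag values come back as annotation objects with a `value()` method (assumed here to be Bio::Annotation::SimpleValue); check the module's POD for your BioPerl version. `annotation_view` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::AnnotationAdaptor;

# Build a plain hash of tag => [values] by going through the adaptor's
# AnnotationCollectionI view. Only getters are called, so the feature's
# own tag/value data is not modified.
sub annotation_view {
    my ($feat) = @_;
    my $anncoll = Bio::SeqFeature::AnnotationAdaptor->new(-feature => $feat);
    my %view;
    for my $key ($anncoll->get_all_annotation_keys) {
        # each tag value is assumed to come back as a SimpleValue-like object
        $view{$key} = [ map { $_->value } $anncoll->get_Annotations($key) ];
    }
    return \%view;
}
```

This is the read-only use Hilmar describes; calling the adaptor's setters such as `add_Annotation` would, by contrast, change the underlying feature.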
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From smwilson at hpc.unm.edu Fri Oct 12 13:45:41 2007
From: smwilson at hpc.unm.edu (Susan Wilson)
Date: Fri, 12 Oct 2007 11:45:41 -0600
Subject: [Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file
Message-ID: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu>

Hi,

I downloaded the latest ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz and ran gene2xml on it to generate Homo_sapiens.xml, which is 5821420628 bytes. I cannot parse this file with Bio::ASN1::EntrezGene, even on a machine with 256GB of memory. I get a simple "Out of memory" output even with the following code:

    #!/usr/bin/perl
    use strict;
    use Bio::ASN1::EntrezGene;
    my $parser = Bio::ASN1::EntrezGene->new('file' => "Homo_sapiens.xml");
    while(my $result = $parser->next_seq)
    {
    }

Thanks.
Susan

From arareko at campus.iztacala.unam.mx Fri Oct 12 14:10:44 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 12 Oct 2007 14:10:44 -0400
Subject: [Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file
In-Reply-To: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu>
References: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu>
Message-ID: <470FB8A4.2050601@campus.iztacala.unam.mx>

Hi Susan,

Bio::ASN1::EntrezGene is not part of the BioPerl distribution, even though it shares the same namespace and is used by BP. Maybe you'd want to contact Mingyi Liu, who is the author of the module. For more info take a look here:

http://search.cpan.org/~mingyiliu/Bio-ASN1-EntrezGene/

Regards,
Mauricio.
Susan Wilson wrote:
> Hi,
>
> I downloaded the latest ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
> and ran gene2xml on it to generate Homo_sapiens.xml which is
> 5821420628 bytes. I cannot parse this file with Bio::ASN1::EntrezGene,
> even on a machine with 256GB of memory. I get a simple "Out of memory"
> output even with the following code:
>
> #!/usr/bin/perl
> use strict;
> use Bio::ASN1::EntrezGene;
> my $parser = Bio::ASN1::EntrezGene->new('file' => "Homo_sapiens.xml");
> while(my $result = $parser->next_seq)
> {
> }
>
> Thanks.
> Susan
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From Kevin.M.Brown at asu.edu Fri Oct 12 14:19:48 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 12 Oct 2007 11:19:48 -0700
Subject: [Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file
In-Reply-To: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu>
References: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B403D39379@EX02.asurite.ad.asu.edu>

> I downloaded the latest ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
> and ran gene2xml on it to generate Homo_sapiens.xml which is
> 5821420628 bytes. I cannot parse this file with Bio::ASN1::EntrezGene,
> even on a machine with 256GB of memory.
> I get a simple "Out of memory" output even with the following code:
>
> #!/usr/bin/perl
> use strict;
> use Bio::ASN1::EntrezGene;
> my $parser = Bio::ASN1::EntrezGene->new('file' => "Homo_sapiens.xml");
> while(my $result = $parser->next_seq)
> {
> }

I think most systems have a per process memory limit (either hardcoded in the OS or configured depending on the OS) and IIRC most of the IO handlers for BioPerl load entire file contents into memory to process them.
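Some of the IO parsers have been changed recently (a new one added for blast) so that they only pull into memory as much as they need to process the next result, rather than the whole file in one shebang.

From stefan.kirov at bms.com Fri Oct 12 14:34:49 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 12 Oct 2007 14:34:49 -0400
Subject: [Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file
In-Reply-To: <1A4207F8295607498283FE9E93B775B403D39379@EX02.asurite.ad.asu.edu>
References: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu> <1A4207F8295607498283FE9E93B775B403D39379@EX02.asurite.ad.asu.edu>
Message-ID: <470FBE49.7010703@bms.com>

Kevin Brown wrote:
>> I downloaded the latest ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
>> and ran gene2xml on it to generate Homo_sapiens.xml which is
>> 5821420628 bytes. I cannot parse this file with Bio::ASN1::EntrezGene,
>> even on a machine with 256GB of memory. I get a simple "Out of memory"
>> output even with the following code:
>>
>> #!/usr/bin/perl
>> use strict;
>> use Bio::ASN1::EntrezGene;
>> my $parser = Bio::ASN1::EntrezGene->new('file' => "Homo_sapiens.xml");
>> while(my $result = $parser->next_seq)
>> {
>> }
>>
>
> I think most systems have a per process memory limit (either hardcoded
> in the OS or configured depending on the OS) and IIRC most of the IO
> handlers for BioPerl load entire file contents into memory to process
> them. Some of the IO parsers have been changed recently (a new one
> added for blast) so that they only pull into memory as much as they need
> to process the next result rather than the whole file in one shebang.
>
The file is approx. 6GB, so on a 256GB machine this is not going to create any problem. I think this might be a deep, poorly controlled recursion problem.
Stefan

From mingyi.liu at gpc-biotech.com Fri Oct 12 15:06:25 2007
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Fri, 12 Oct 2007 15:06:25 -0400
Subject: [Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file
In-Reply-To: <470FC1E0.4070708@gpc-biotech.com>
References: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu> <470FC1E0.4070708@gpc-biotech.com>
Message-ID: <470FC5B1.3090606@gpc-biotech.com>

BTW, here's the syntax from one of my messages last year about how to convert the compressed binary ASN format NCBI provides to the text ASN format my module (or Stefan's SeqIO::entrezgene) expects (the -x switch does the trick, overriding the default option to produce XML output):

    my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | ");
    # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI

The same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). BTW, text ASN is both smaller and faster to parse than XML format.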
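Combining Mingyi's pipe syntax with the 64-bit question raised elsewhere in this thread, a defensive driver script might look like the sketch below. The gene2xml switches are copied from Mingyi's message; `perl_is_64bit` and `count_entrezgene_records` are hypothetical helper names, and using the core `Config` module's `ptrsize` (pointer size in bytes) as the 64-bit test is this sketch's own choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# A 32-bit perl (4-byte pointers) cannot address multi-gigabyte strings,
# one way to hit "Out of memory" even on a machine with 256 GB of RAM.
sub perl_is_64bit {
    return ($Config{ptrsize} || 0) >= 8;
}

# Parse text ASN.1 produced on the fly by gene2xml (-x), record by record,
# instead of a converted XML file.
sub count_entrezgene_records {
    my ($ags_file) = @_;
    require Bio::ASN1::EntrezGene;
    # the trailing "|" makes the 'file' argument a pipe from gene2xml
    my $parser = Bio::ASN1::EntrezGene->new(
        'file' => "gene2xml -i $ags_file -c -x -b | ");
    my $n = 0;
    while (my $record = $parser->next_seq) {
        $n++;    # process $record here rather than storing every record
    }
    return $n;
}

if (@ARGV) {
    warn "warning: this perl is not 64-bit (ptrsize=$Config{ptrsize})\n"
        unless perl_is_64bit();
    print count_entrezgene_records($ARGV[0]), " records\n";
}
```

It would be run as `perl count_genes.pl Homo_sapiens.ags.gz` (hypothetical script name). The same pointer-size check is available from the shell with `perl -V:ptrsize`, which prints `ptrsize='8';` on a 64-bit perl.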
Best,
Mingyi

From j_martin at lbl.gov Fri Oct 12 14:58:41 2007
From: j_martin at lbl.gov (Joel Martin)
Date: Fri, 12 Oct 2007 11:58:41 -0700
Subject: [Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file
In-Reply-To: <470FBE49.7010703@bms.com>
References: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu> <1A4207F8295607498283FE9E93B775B403D39379@EX02.asurite.ad.asu.edu> <470FBE49.7010703@bms.com>
Message-ID: <20071012185841.GE13838@eniac.jgi-psf.org>

Hello,

Just a suggestion: is /usr/bin/perl a 64-bit perl? Even on our Sun machines with 72+ GB memory, for some reason they're distributed with a 32-bit perl, which can handle large files but would probably have out-of-memory errors if trying to read one into memory.

    % file /usr/bin/perl
    /usr/bin/perl: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped

Joel

On Fri, Oct 12, 2007 at 02:34:49PM -0400, Stefan Kirov wrote:
> Kevin Brown wrote:
> >> I downloaded the latest ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
> >> and ran gene2xml on it to generate Homo_sapiens.xml which is
> >> 5821420628 bytes. I cannot parse this file with Bio::ASN1::EntrezGene,
> >> even on a machine with 256GB of memory. I get a simple "Out of memory"
> >> output even with the following code:
> >>
> >> #!/usr/bin/perl
> >> use strict;
> >> use Bio::ASN1::EntrezGene;
> >> my $parser = Bio::ASN1::EntrezGene->new('file' => "Homo_sapiens.xml");
> >> while(my $result = $parser->next_seq)
> >> {
> >> }
> >>
> >
> > I think most systems have a per process memory limit (either hardcoded
> > in the OS or configured depending on the OS) and IIRC most of the IO
> > handlers for BioPerl load entire file contents into memory to process
> > them.
> > Some of the IO parsers have been changed recently (a new one
> > added for blast) so that they only pull into memory as much as they need
> > to process the next result rather than the whole file in one shebang.
> >
> The file is approx. 6GB, so on a 256GB machine this is not going to
> create any problem. I think this might be a deep, poorly controlled
> recursion problem.
> Stefan

From stefan.kirov at bms.com Fri Oct 12 14:20:38 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 12 Oct 2007 14:20:38 -0400
Subject: [Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file
In-Reply-To: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu>
References: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu>
Message-ID: <470FBAF6.1050509@bms.com>

Susan Wilson wrote:
> Hi,
>
> I downloaded the latest ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
> and ran gene2xml on it to generate Homo_sapiens.xml which is
> 5821420628 bytes. I cannot parse this file with Bio::ASN1::EntrezGene,
> even on a machine with 256GB of memory. I get a simple "Out of memory"
> output even with the following code:
>
> #!/usr/bin/perl
> use strict;
> use Bio::ASN1::EntrezGene;
> my $parser = Bio::ASN1::EntrezGene->new('file' => "Homo_sapiens.xml");
> while(my $result = $parser->next_seq)
> {
> }
>
> Thanks.
> Susan
>

Susan,
Are you running the latest version of Bio::ASN1::EntrezGene?
You may have a better chance of getting a fast (and useful) answer if you contact Mingyi Liu (see on CPAN) directly; the module is not part of Bioperl. Just to mention: I have also seen similar problems, and there seem to be particular problematic records. I think NCBI made some changes/additions to their format (it might have to do with the number/structure of contigs). I will have to run my pipeline again soon, and if I run into the same problem I will probably create a bug report for Mingyi. I hope you do it before me; it is a boring and long process.
Stefan

From mingyi.liu at gpc-biotech.com Fri Oct 12 14:50:08 2007
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Fri, 12 Oct 2007 14:50:08 -0400
Subject: [Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file
In-Reply-To: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu>
References: <7F4D241A-A3EF-44BA-9DAE-CE51E7449F5C@hpc.unm.edu>
Message-ID: <470FC1E0.4070708@gpc-biotech.com>

Hi, Susan,

Mauricio is right. When there's a problem with Bio::ASN1::EntrezGene, it's better to contact me directly. I actually deleted a few messages of this discussion before one caught my eye. Nowadays I'm working in some other areas and not tracking the bioperl mailing list closely, so a direct email to me would usually work out better.

As for the problem you mentioned, there could be two reasons:

1. It seems that you converted the file to an XML file instead of an ASN file. My parser is designed for ASN files, so please use gene2xml to convert the downloaded file to an ASN file instead of an XML file. It is likely that the wrong syntax of the file caused my parser to attempt to read the entire file as a string (because it couldn't find the start/end).

However, there's another minor possibility (which you might have taken care of already):

2. Perl 5.8 added 64-bit support, but I don't know if you have perl 5.8 64-bit installed on your system to support the 256 GB system memory you have?
If not, your >5 GB file is over the 4 GB 32-bit Perl limit.

Let me know if my suggestions work out for you.

Best,
Mingyi

Susan Wilson wrote:
> Hi,
>
> I downloaded the latest ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
> and ran gene2xml on it to generate Homo_sapiens.xml which is
> 5821420628 bytes. I cannot parse this file with Bio::ASN1::EntrezGene,
> even on a machine with 256GB of memory. I get a simple "Out of memory"
> output even with the following code:
>
> #!/usr/bin/perl
> use strict;
> use Bio::ASN1::EntrezGene;
> my $parser = Bio::ASN1::EntrezGene->new('file' => "Homo_sapiens.xml");
> while(my $result = $parser->next_seq)
> {
> }
>
> Thanks.
> Susan
>

From gyang at plantbio.uga.edu Fri Oct 12 16:52:06 2007
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri, 12 Oct 2007 16:52:06 -0400
Subject: [Bioperl-l] hide XML display on web
In-Reply-To: 41A08079-6EEC-4B62-8104-C41E70C03083@uiuc.edu
Message-ID: <20071012205206.9105fd09@dogwood.plantbio.uga.edu>

Hi, All,

I have a cgi script that uses remoteblast (xml output). Every time, before and after the blast is finished, a lot of information is displayed on the client's screen. It's a nuisance, and I am wondering if anyone knows how to get rid of it. See below for examples:

Thanks
Guojun Yang

_______________________example_____________________
NCBI Blast: