From outaleb at web.de Mon Oct 1 07:46:08 2007
From: outaleb at web.de (issam outaleb)
Date: Mon, 01 Oct 2007 13:46:08 +0200
Subject: [Bioperl-l] help about Fasta file???
Message-ID: <2005673060@web.de>

Hello all,

I have a small problem. I ran some experiments with this program and got results in the form of IPI hits (IPI accession numbers). How can I correlate these IPI accession numbers with the FASTA file (the FASTA database)? The program has to look up each IPI accession number in the database and copy the entry, including its description and sequence, to a new file.

#!/usr/bin/perl
#use warnings;
#use strict;
use CGI qw(:all);

open (IN, "C:/Documents and Settings/XXX/Desktop/Search_file") or die "Error opening file";
open (FASTA_db, "C:/Documents and Settings/XXX/Desktop/FASTA1.fasta") or die "Could not open FASTA file!!";
open (OUT, ">C:/Documents and Settings/XXX/Desktop/reslut.txt") or die "Error creating the new file";
#print "\nFiles opened for copying\n";

while (<IN>) {
    $i = $_;
    chomp $i;
    # match the string from the HTML file; this gives results such as IOP123234 (just the IPIs)
    if (/Hit\d">([^<\/A> ]*)/) {
        #print OUT $1."\n";   # print these IPIs to this file
        # My idea was to push these IPIs onto the array, then look them up in the
        # FASTA_db file and copy each match, with its description and sequence,
        # to a new file, so as to generate a new FASTA file that includes just
        # my IPI results. How???
        $j = $1;
        push(@array, $j);
    }
}

while (defined($var = <FASTA_db>)) {
    $var =~ /(>IPI:)([^| .]*)([^>]*)/;
}

close (IN);
close (FASTA_db);
close (OUT);
print "\nFiles closed, copy operation .

From shameer at ncbs.res.in Mon Oct 1 12:57:15 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 1 Oct 2007 22:27:15 +0530 (IST)
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
Message-ID: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in>

Dear All,

Is it possible to draw a phylogeny tree file in PNG format using Bioperl? My input files are in phylip treefile format. Are there any modules / code in the Bio::Graphics / Phylogeny sections?

Input file:

((((((_L_537_539:3.70000,_H_535_536:3.70000):4.97461,(((((((_E_499_500:2.75000,_E_250_251:2.75000):0.55805,_L_252_254:3.30805):1.38576,_H_494_497:4.69381):1.51514,_H_255_263:6.20895):0.83877,(_L_246_249:4.30000,_H_244_245:4.30000):2.74772):0.92645,_H_520_534:7.97418):0.15279,(_H_502_512:6.95000,_H_273_282:6.95000):1.17697):0.54765):1.10967,_L_264_272:9.78428):0.53441,_L_283_291:10.31869):3.59264,((((_H_479_493:8.25300,((_H_448_462:7.25000,_L_409_411:7.25000):0.50000,_L_445_447:7.75000):0.50300):1.08808,(((((_E_381_382:2.65000,_E_377_378:2.65000):0.26434,_L_379_380:2.91434):1.77630,_L_373_376:4.69063):1.70029,_L_436_444:6.39093):0.93320,_L_383_391:7.32413):2.01696):0.94916,(((((_L_465_469:3.90000,_L_366_368:3.90000):1.52226,_H_463_464:5.42226):0.94093,(_E_427_435:5.15000,_E_369_372:5.15000):1.21319):1.64489,_L_336_343:8.00808):0.88402,(((_H_355_365:6.20000,_L_349_354:6.20000):0.91541,(_L_327_328:3.40000,_L_318_326:3.40000):3.71541):0.59082,(((((_E_470_474:3.85000,_E_344_348:3.85000):0.89054,_L_475_478:4.74054):1.20107,(_E_329_335:3.85000,_E_315_317:3.85000):2.09161):0.71112,_L_513_519:6.65273):0.67204,((_L_296_304:5.00000,_H_292_295:5.00000):0.71200,_L_305_314:5.71200):1.61276):0.38147):1.18587):1.39814):0.37326,((((_L_397_398:3.55000,_E_394_396:3.55000):1.36938,_L_392_393:4.91938):1.43993,_L_422_426:6.35931):1.00790,(_E_412_421:5.95000,_E_399_408:5.95000):1.41721):3.29629):3.24784):4.31679,((((((_L_206_210:3.80000,_H_203_205:3.80000):1.75687,_L__49__50:5.55687):1.29579,_H_188_202:6.85266):1.08193,_H_229_243:7.93459):0.18730,(_L_222_228:5.55000,_H_211_221:5.55000):2.57189):3.16298,((((_H_159_171:7.00000,_L_156_158:7.00000):0.07448,_L_120_122:7.07448):1.59389,((((_L__90__91:2.65000,_E__88__89:2.65000):0.18425,_E__92__93:2.83425):1.94636,_L__84__87:4.78061):1.74719,_L_147_155:6.52780):2.14057):2.44189,((((((((_E_179_182:3.50000,_E__80__83:3.50000):2.15544,(_L_172_178:3.95000,_L__77__79:3.95000):1.70544):0.42200,_E_138_146:6.07744):0.46209,_E__51__58:6.53954):0.74619,_L_183_187:7.28573):0.73805,(_E_123_131:5.70000,_E_110_119:5.70000):2.32378):0.58197,((((_L_108_109:4.30000,_E_104_107:4.30000):1.34305,_H__66__76:5.64305):0.81535,_L_132_137:6.45840):1.22044,(_L__94_103:6.55000,_L__59__65:6.55000):1.12884):0.92691):1.25371,((_L__38__39:3.40000,_L__29__37:3.40000):3.64775,(((((_H___3___6:3.30000,_L___1___2:3.30000):0.79885,_L__21__25:4.09885):0.80972,_E__26__28:4.90856):0.88488,_E__40__48:5.79344):0.60814,(_L__12__20:5.25000,_H___8__11:5.25000):1.15158):0.64616):2.81172):1.25080):0.17461):6.94325);

--
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in

From luciap at sas.upenn.edu Mon Oct 1 14:03:00 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Mon, 01 Oct 2007 14:03:00 -0400
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
In-Reply-To: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in>
References: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in>
Message-ID: <1191261780.47013654cb81b@webmail.sas.upenn.edu>

I think you'll have better luck using one of the already available programs to do that; you'll get better-looking trees. If you just have one tree to draw, I recommend you use:
http://itol.embl.de/

Lucia

Quoting Shameer Khadar:

> Dear All,
>
> Is it possible to draw a phylogeny tree file in PNG format using Bioperl ?
> My input file are in phylip treefile format. Any Modules / codes in
> Bio::Graphics / Phylogeny sections ?

Lucia Peixoto
Department of Biology, SAS
University of Pennsylvania

From shameer at ncbs.res.in Mon Oct 1 14:39:05 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 2 Oct 2007 00:09:05 +0530 (IST)
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
In-Reply-To: <1191263834.47013e5a6af93@webmail.sas.upenn.edu>
References: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in>
 <1191261780.47013654cb81b@webmail.sas.upenn.edu>
 <48581.192.168.1.1.1191262243.squirrel@mail.ncbs.res.in>
 <1191263834.47013e5a6af93@webmail.sas.upenn.edu>
Message-ID: <58283.192.168.1.1.1191263945.squirrel@mail.ncbs.res.in>

Dear Lucia,

Thanks for the mail. Now I've got it. I hadn't used the TreeIO / Tree::Draw methods. Somehow I missed this excellent HOWTO: http://www.bioperl.org/wiki/HOWTO:Trees. Thanks for that code as well. I tried it and it worked very nicely. I still have to work on beautifying the tree, and I am just going to do that.

Thanks & Cheers,
Shameer

> OK
>
> you can use the implementations in Bio::TreeIO
>
> you can basically read the tree in newick format and out as an svg graph
> something like this:
>
> my $in = new Bio::TreeIO(-file => 'input',
>                          -format => 'newick');
> my $out = new Bio::TreeIO(-file => '>mytree.svg',
>                           -format => 'svggraph');
> while( my $tree = $in->next_tree ) {
>     $out->write_tree($tree);
> }
>
> you can also use Bio::Tree::Draw
>
> hope that helps
>
> Lucia
>
>
> Quoting Shameer Khadar :
>
>> Hi,
>>
>> Thanks for your mail. I have to create these trees as a part of a
>> webserver.
>> i need to generate them dynamically using users input
>> sequence.
>> I think ITOL is not the stuff best suited for my purpose.

--
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in

From luciap at sas.upenn.edu Mon Oct 1 14:48:51 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Mon, 01 Oct 2007 14:48:51 -0400
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
In-Reply-To: <58283.192.168.1.1.1191263945.squirrel@mail.ncbs.res.in>
References: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in>
 <1191261780.47013654cb81b@webmail.sas.upenn.edu>
 <48581.192.168.1.1.1191262243.squirrel@mail.ncbs.res.in>
 <1191263834.47013e5a6af93@webmail.sas.upenn.edu>
 <58283.192.168.1.1.1191263945.squirrel@mail.ncbs.res.in>
Message-ID: <1191264531.47014113f2b54@webmail.sas.upenn.edu>

Yes, that's the issue with those commands: the trees are not pretty at all. That's why, for a one-off tree, I'd rather use ITOL. Another thing to try is the tree drawer of the Mesquite package. Glad I could help.

Lucia

Quoting Shameer Khadar:

> Dear Lucia,
>
> Thanks for the mail. Now I got it. I didnt used this TreeIO / Tree::Draw
> methods. Some how missed this excellent HOWTO :
> http://www.bioperl.org/wiki/HOWTO:Trees. Thanks for that code as well. I
> tried that and it worked very nicely. I have to work around to beautify
> the tree and I am just going to do that.
>
> Thanks & Cheers,
> Shameer

Lucia Peixoto
Department of Biology, SAS
University of Pennsylvania

From jason at bioperl.org Mon Oct 1 15:32:38 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 1 Oct 2007 12:32:38 -0700
Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ?
In-Reply-To: <1191264531.47014113f2b54@webmail.sas.upenn.edu>
References: <53150.192.168.1.1.1191257835.squirrel@mail.ncbs.res.in>
 <1191261780.47013654cb81b@webmail.sas.upenn.edu>
 <48581.192.168.1.1.1191262243.squirrel@mail.ncbs.res.in>
 <1191263834.47013e5a6af93@webmail.sas.upenn.edu>
 <58283.192.168.1.1.1191263945.squirrel@mail.ncbs.res.in>
 <1191264531.47014113f2b54@webmail.sas.upenn.edu>
Message-ID:

I'd definitely recommend Bio::Tree::Draw::Cladogram over svggraph for prettier trees - you get PostScript out, but you can render this to PNG or JPG with Unix tools.

If there is a better stand-alone tree drawing engine, we're happy to try to integrate it into bioperl - the modules here are native Perl only, and you can use the bioperl-run modules that wrap DrawTree and DrawGram from PHYLIP to get other PS rendering output. Mesquite, TreeView or other tools are usually much better, but they are not always an option if you want to auto-render these images for a website, etc.
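
As a rough illustration of the Bio::Tree::Draw::Cladogram route described above, a minimal sketch might look like the following. The three-taxon tree, the file names example.nwk and cladogram.eps, and the ImageMagick command in the closing comment are assumptions made for illustration only; none of them come from this thread. The module also appears to rely on PostScript::TextBlock from CPAN, so that would need to be installed alongside BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;

# Hypothetical three-taxon Newick tree, written to disk so the sketch
# is self-contained. Replace with your own tree file.
my $newick_file = 'example.nwk';
open(my $fh, '>', $newick_file) or die "Cannot write $newick_file: $!";
print $fh "((A:1.0,B:1.0):2.0,C:3.0);\n";
close($fh);

# Read the first tree from the Newick file.
my $treeio = Bio::TreeIO->new(-file => $newick_file, -format => 'newick');
my $tree = $treeio->next_tree or die "No tree found in $newick_file";

# Draw the tree and write it out as Encapsulated PostScript.
my $drawing = Bio::Tree::Draw::Cladogram->new(-tree => $tree);
$drawing->print(-file => 'cladogram.eps');

# Conversion to PNG is left to an external tool. Assuming ImageMagick
# (with Ghostscript) is installed, something like this should work:
#   convert cladogram.eps cladogram.png
```

For a web server, the conversion step could be run from the same script after the EPS file has been written, though the choice of converter is a deployment detail rather than anything BioPerl prescribes.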

-jason

On Oct 1, 2007, at 11:48 AM, Lucia Peixoto wrote:

> Yes,
> that's the issue about those commands, trees are not pretty at all
> that's why for a one tree only kind of thing I rather use ITOL
> other thing to try is the tree drawer of the Mesquite package
> glad I could help
>
> Lucia

--
Jason Stajich
jason at bioperl.org

From outaleb at web.de Mon Oct 1 22:37:26 2007
From: outaleb at web.de (outaleb Issame)
Date: Tue, 02 Oct 2007 04:37:26 +0200
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
Message-ID: <4701AEE6.6070506@web.de>

Hi everybody,

I have some accession numbers in a file:

IPI67675
IPI98976.
...

How can I look in the FASTA file (the FASTA database) to see whether there is a match and, if so, copy the entire entry into a new FASTA file? I tried with BioPerl, but because I'm a newbie I don't get it.

Thanks, all

From ULNJUJERYDIX at spammotel.com Tue Oct 2 02:21:31 2007
From: ULNJUJERYDIX at spammotel.com (Kevin Lam)
Date: Tue, 2 Oct 2007 14:21:31 +0800
Subject: [Bioperl-l] divide and blast blastunsplit blast subsequence
Message-ID: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>

Hi!
I am trying to annotate a 200 kb sequence by running blastx to find the locations of protein sequences. I need to split the sequence up so that I get the best hits for each region (the top BLAST hits will mask the smaller proteins if I do it as a whole sequence).

If I were to do it manually, I could set the subsequence in the web GUI for NCBI's BLAST. That way, the BLAST hit coordinates are based on the whole 200 kb. But I can't find this option in blast, or a straightforward way to do it in BioPerl.

I found similar solutions, like
http://www.bio.davidson.edu/projects/DAB/DAB.html
divide and blast (but I need to specify coordinates).

There is also this from the bioperl archives:
http://bioinformatics.org/pipermail/bioclusters/2002-August/000375.html

But isn't there an easier way, so that I can specify BLAST subsequence 200-900 of a FASTA file and have it return the blastx hits in coordinates relative to the whole 200 kb?

From n.haigh at sheffield.ac.uk Tue Oct 2 03:56:57 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 02 Oct 2007 08:56:57 +0100
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <4701AEE6.6070506@web.de>
References: <4701AEE6.6070506@web.de>
Message-ID: <4701F9C9.4050808@sheffield.ac.uk>

outaleb Issame wrote:
> Hi every body,
> i have some AccNum in a file-> IPI67675
>                                IPI98976.
> ...
>
> what i want is how can i look in the fasta file (db fasta) if there is
> some match
> if yes then copy the entire entry into a new fasta file.
> i tried with bioperl but cause i m noob:-(( i don t get it.
> thx all

Can you state clearly what exactly is in the "AccNum" file? Some sample text from the actual file would be good. Does the FASTA file contain the sequences in raw FASTA format, or has it been processed using something like formatdb from the BLAST software?

A few more details will help people understand and, in turn, help you with a swift solution.

Cheers
Nath

From outaleb at web.de Tue Oct 2 05:22:17 2007
From: outaleb at web.de (outaleb Issame)
Date: Tue, 02 Oct 2007 11:22:17 +0200
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <4701F9C9.4050808@sheffield.ac.uk>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk>
Message-ID: <47020DC9.8040401@web.de>

Hi,

By "this file" I mean the following: I picked these accession numbers out of the IPI-Human database. They come from a FASTA file and are now listed one under another, like a table, in a separate file. What I want is to check whether they are really present in the FASTA file (the IPI-Human FASTA file). If so, the entire entry for that number (the ">..." line and the sequence as well) should be copied into a new FASTA file, so that at the end I get a new FASTA file with just these IPI accession numbers.

Thanks, and I hope that was clear.

Nathan S. Haigh wrote:

>Can you state clearly, what is in the "AccNum" file exactly, some sample
>text from the actual file would be good. Is the FASTA file containing
>the sequences in raw FASTA format or has it been processed using
>somthing like formatdb from the BLAST software?
>
>A few more details will help people understand and in turn help you with
>a swift solution.
>
>Cheers
>Nath

From n.haigh at sheffield.ac.uk Tue Oct 2 05:56:49 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 02 Oct 2007 10:56:49 +0100
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <47020DC9.8040401@web.de>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk>
 <47020DC9.8040401@web.de>
Message-ID: <470215E1.4080901@sheffield.ac.uk>

outaleb Issame wrote:
> hi,
> with this file i mean, i picked out this Accession Number from
> IPI-Human Dbase,they come from a fasta file,
> so they re under eachother like a i a table in separate file now.
> what i want is how how can i check it in the fasta File (so in the
> IPI-Human FAsta File), i they re really there;
> if yes please copy the entire entry of this Number (>....the sequence
> also)in new fasta file.so that i get at the end a new
> FASTA file with jus this IPI Accession Number.
> thx and hope was clearly.

Ok, first of all, I'd read the contents of your accession-number file into a hash, something like the following (this could be written in a shorter form, but since you're a newbie I'll leave it in a longer form so you can follow it more easily).

-- start script --
use strict;
use warnings;
use Bio::SeqIO;

# change the following three lines to point to the relevant paths
# of your list of accessions file, your fasta file and your output
# fasta file
my $acc_file       = "/path/to/your/file";
my $fasta_file_in  = "/path/to/your/fasta/file";
my $fasta_file_out = "/path/to/your/fasta/output/file";

# Use a hash to keep a record of accessions we want to find
my %hash_of_req_acc;

# read all the required accessions from the file into the hash as keys
open (ACC_FILE, $acc_file) or die "Couldn't open file: $!\n";
while (<ACC_FILE>) {
    my $line = $_;
    chomp $line;
    # use the chomped line (not $_) as the key, otherwise the trailing
    # newline would stop every lookup from matching
    $hash_of_req_acc{$line} = 1;
}
close ACC_FILE;

my $seqio_object_in = Bio::SeqIO->new(
    -file   => $fasta_file_in,
    -format => 'fasta'
);
# the leading ">" opens the output file for writing
my $seqio_object_out = Bio::SeqIO->new(
    -file   => ">$fasta_file_out",
    -format => 'fasta'
);

# loop through all the sequences in the fasta file
while (my $seq_object = $seqio_object_in->next_seq) {
    # get the sequence accession for easy matching
    # (with plain FASTA input this may not be set; the ID from the
    # header line is available via $seq_object->display_id)
    my $seq_acc = $seq_object->accession_number;

    # write the sequence object to the output fasta file if we have a
    # matching accession
    $seqio_object_out->write_seq($seq_object)
        if exists $hash_of_req_acc{$seq_acc};
}
-- end script --

I haven't tested this, but it should at least get you started. Also, the
fasta description line in the output file may not be exactly as it was
in the input fasta file - if this really matters, you may need to get
back to us. Also, if the input fasta file is huge (many thousands of
sequences) it may be wise to create an index of the fasta file in order
to speed up retrieval.

You may find this page helpful:
http://www.bioperl.org/wiki/HOWTO:SeqIO

Anyway, hope this helps to get you started.
Nath

From outaleb at web.de Tue Oct 2 06:50:32 2007
From: outaleb at web.de (outaleb Issame)
Date: Tue, 02 Oct 2007 12:50:32 +0200
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <470215E1.4080901@sheffield.ac.uk>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk>
Message-ID: <47022278.7010700@web.de>

Thanks for the help, but I got an empty output file.
I think the problem is with matching the accession number. My FASTA file
looks like:

>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU...
>IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU..
>IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
MMMMM..

and my AccNum file looks like:

IPI00177321
IPI00453473

I hope this helps you understand.

Nathan S. Haigh wrote:

>Ok, first of all, I'd read the contents of your Accession numbers into a
>hash, something like the following (this could be written in a shorter
>form, but since you're a newbie I'll leave it in a longer form so you
>can follow easier).
>Nath > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at uiuc.edu Tue Oct 2 09:00:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 2 Oct 2007 08:00:57 -0500 Subject: [Bioperl-l] divide and blast blastunsplit blast subsequence In-Reply-To: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com> References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com> Message-ID: There is a script that comes with the bioperl core distribution, bp_split_seq.pl, which does this. Here's the CVS location: http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ scripts/seq/?cvsroot=bioperl chris On Oct 2, 2007, at 1:21 AM, Kevin Lam wrote: > Hi! > I am trying to annotate a 200kb sequence by doing blastx to find > the protein > seq location > I need to split the sequence up so that I get the best hits for > each region > (the top blast hits will mask the smaller proteins if i do it as a > whole > sequence) > if i were to do it manually i can set the subsequence in the web > gui for > ncbi's blast. > this way, the blast hits coords are based on the whole 200kb. > > but I can't find this option in blast or a straightforward way to > do it in > bioperl. > > I found similar solutions like > http://www.bio.davidson.edu/projects/DAB/DAB.html > divide and blast (but I need to specify coords) > > there also this from the bioperl archives > http://bioinformatics.org/pipermail/bioclusters/2002-August/ > 000375.html > > but isn't there an easier way like i can specify blast subsequence > 200-900 > of fasta file and it will return the blastx hits in coords in terms > of the > whole 200kb? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From rrc22 at cam.ac.uk Tue Oct 2 09:38:45 2007 From: rrc22 at cam.ac.uk (Roy Chaudhuri) Date: Tue, 02 Oct 2007 14:38:45 +0100 Subject: [Bioperl-l] divide and blast blastunsplit blast subsequence In-Reply-To: References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com> Message-ID: <470249E5.6050206@cam.ac.uk> > but isn't there an easier way like i can specify blast subsequence 200-900 > of fasta file and it will return the blastx hits in coords in terms of the > whole 200kb? Once you have split up your sequence (as Chris suggested), and run your BLAST, then you can add the hits to each subsequence as features. The subsequences can then be re-assembled using the cat method from Bio::SeqUtils, which will adjust the coordinates of the features appropriately. Roy. -- Dr. Roy Chaudhuri Department of Veterinary Medicine University of Cambridge, U.K. From cjfields at uiuc.edu Tue Oct 2 11:03:29 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 2 Oct 2007 10:03:29 -0500 Subject: [Bioperl-l] exonerate In-Reply-To: <29C4D729-6715-4C19-9872-3B1AF90EAFA3@tll.org.sg> References: <034FB11C-B4E9-4E4E-B213-D4AC6A397B1B@tll.org.sg> <29C4D729-6715-4C19-9872-3B1AF90EAFA3@tll.org.sg> Message-ID: <73B4E193-69D0-409D-9F89-20FB677F45C9@uiuc.edu> One option is to try running $run->cleanup() after you finish parsing, which gets rid of the tempfiles on each run. chris On Sep 30, 2007, at 8:53 PM, alan wrote: > Hi, > > >>> I am calling exonerate.pm within my script while attempting to >>> align cDNA to multiple genomic fragments. After processing about >>> 120+ genomic fragments my code crashes with the following error: >>> >>> ** ERROR **: Could not open [/tmp/tlInatbOED] : Too many open files >>> aborting... 
>>> MSG: Exonerate call (/usr/local/bin/exonerate /tmp/8X9jQuHUGF / >>> tmp/tlInatbOED > /tmp/EolF5qCNLZ/cIf0HfIRf5) crashed: 34304 >>> STACK Bio::Tools::Run::Alignment::Exonerate::_run /nfs1/alan/ >>> cvs_src/bioperl-run/Bio/Tools/Run/Alignment/Exonerate.pm:214 >>> STACK Bio::Tools::Run::Alignment::Exonerate::run /nfs1/alan/ >>> cvs_src/bioperl-run/Bio/Tools/Run/Alignment/Exonerate.pm:174 >>> >>> The code in Exonerate.pm closes the tmpfile at the end of the >>> routine yet I get the error message about "too many open files". >>> Any suggestions on how I should be closing these files? >>> >>> >>> Extract from my code that runs exonerate is listed below. >>> >>> foreach my $f(@files) { >>> next unless (-f "$dir/$f"); >>> my $q_in = Bio::SeqIO->new(-file=>$query, -format=>"Fasta"); >>> my $query_obj = $q_in->next_seq(); >>> my $target_in = Bio::SeqIO->new(-file=>"$dir/$f", - >>> format=>"Fasta"); >>> my $target_obj = $target_in->next_seq(); >>> my $run = Bio::Tools::Run::Alignment::Exonerate->new(); >>> my $exonerate_io = $run->run($query_obj, $target_obj); >>> >>> [code for parsing the data.......] >>> >>> $exonerate_io->close; #tried this line out of desperation but it >>> did not help :-) >>> } >>> >>> thanks >>> alan From outaleb at web.de Tue Oct 2 10:51:05 2007 From: outaleb at web.de (outaleb Issame) Date: Tue, 02 Oct 2007 16:51:05 +0200 Subject: [Bioperl-l] need help ??parse AcNum from fasta? In-Reply-To: <47022278.7010700@web.de> References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> Message-ID: <47025AD9.1090105@web.de> hi again, i think i can resolve this problem with the method : id_parser(); how can i do that? any suggestion .or experience?? 
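
A possible way to attack the matching problem, sketched in plain Perl rather than through id_parser(): the accession list holds bare IDs such as IPI00177321, while the header lines carry them as "IPI:IPI00177321.1|...", so the ID has to be cut out of the header before the lookup. This is only a minimal sketch under assumptions. The helper names ipi_accession and filter_fasta are made up for illustration, the header layout is taken from the three sample lines posted in this thread, and the sequences are placeholders. A sub like ipi_accession is also the kind of callback that Bio::Index::Fasta's id_parser() expects, but that route is not shown or tested here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: cut the bare IPI accession out of a header line
# such as ">IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 ...".
# The version suffix (".1") is dropped so the result matches the list.
sub ipi_accession {
    my ($header) = @_;
    return $header =~ /^>?IPI:(IPI\d+)/ ? $1 : undef;
}

# Hypothetical helper: copy every FASTA record whose accession is a key
# of %$wanted from the input filehandle to the output filehandle.
sub filter_fasta {
    my ($in_fh, $out_fh, $wanted) = @_;
    my $keep = 0;
    while (my $line = <$in_fh>) {
        if ($line =~ /^>/) {
            # a new record starts: decide once whether to keep it
            my $acc = ipi_accession($line);
            $keep = defined $acc && exists $wanted->{$acc};
        }
        print {$out_fh} $line if $keep;
    }
    return;
}

# Demonstration on in-memory data (placeholder sequences).
my %wanted = map { $_ => 1 } qw(IPI00177321 IPI00453473);

my $fasta = <<'END';
>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU
>IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU
>IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
MMMMM
END

my $result = '';
open my $in,  '<', \$fasta  or die "Cannot read in-memory FASTA: $!";
open my $out, '>', \$result or die "Cannot write in-memory output: $!";
filter_fasta($in, $out, \%wanted);
close $in;
close $out;

# Prints the two requested records; IPI00027547 is left out.
print $result;
```

For real files, the two in-memory handles would be replaced by handles opened on the accession list, the FASTA database and the output file. Because the lines are copied unchanged, the header in the output stays exactly as it was in the input.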
ehx again outaleb Issame wrote: >thx for the help, but i got a empty output file, >i think its problem with matching the acc number, my fasta file look like: > >*>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein >DDHHHU... > >IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein >DDHHHU.. > >IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein >MMMMM..* > >and my i Accnum File look like: >*IPI00177321 >IPI00453473 > >*i hopt it helps to understand.* >*. > > >Nathan S. Haigh wrote: > > > >>outaleb Issame wrote: >> >> >> >> >>>hi, >>>with this file i mean, i picked out this Accession Number from >>>IPI-Human Dbase,they come from a fasta file, >>>so they re under eachother like a i a table in separate file now. >>>what i want is how how can i check it in the fasta File (so in the >>>IPI-Human FAsta File), i they re really there; >>>if yes please copy the entire entry of this Number (>....the sequence >>>also)in new fasta file.so that i get at the end a new >>>FASTA file with jus this IPI Accession Number. >>>thx and hope was clearly. >>> >>> >>> >>> >>Ok, first of all, I'd read the contents of your Accession numbers into a >>hash, something like the following (this could be written in a shorter >>form, but since you're a newbie I'll leave it in a longer form so you >>can follow easier). 
>>Nath >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From rvos at interchange.ubc.ca Tue Oct 2 13:00:36 2007 From: rvos at interchange.ubc.ca (rvos at interchange.ubc.ca) Date: Tue, 02 Oct 2007 10:00:36 -0700 Subject: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ? Message-ID: <405453b66e.3b66e40545@interchange.ubc.ca> An alternative is to explore the Bio::Phylo treedrawer: http://search.cpan.org/~rvosa/Bio-Phylo-0.17_RC6/lib/Bio/Phylo/Treedrawer.pm This is a separate install (in the interest of full disclosure: I'm the author). Rutger ----- Original Message ----- From: Jason Stajich Date: Monday, October 1, 2007 12:32 pm Subject: Re: [Bioperl-l] How to draw Phylogeny Tree using Bioperl ? > I'd definitely recommend Bio::Tree::Draw::Cladogram over svggraph > for prettier trees - you get postscript out but you can render this > > to png or jpg with unix tools. If there is a better stand alone > tree > drawing engine we're happy to try and integrate it into bioperl - > the > modules here are native Perl only and you can use the bioperl-run > modules that wrap DrawTree and DrawGram from EMBOSS to get other PS > > rendering output. > > Mesquite, TreeView or other tools are usually much better but not > always an option if you want to auto-render these images for a > website, etc. 
> > -jason > > > On Oct 1, 2007, at 11:48 AM, Lucia Peixoto wrote: > > > Yes, > > that's the issue about those commands, trees are not pretty at all > > that's why for a one tree only kind of thing I rather use ITOL > > other thing to try is the tree drawer of the Mesquite package > > glad I could help > > > > Lucia > > > > Quoting Shameer Khadar : > > > >> Dear Lucia, > >> > >> Thanks for the mail. Now I got it. I didnt used this TreeIO / > >> Tree::Draw > >> methods. Some how missed this excellent HOWTO : > >> http://www.bioperl.org/wiki/HOWTO:Trees. Thanks for that code as > > >> well. I > >> tried that and it worked very nicely. I have to work around to > >> beautify > >> the tree and I am just going to do that. > >> > >> Thanks & Cheers, > >> Shameer > >> > >>> OK > >>> > >>> you can use the implementations in Bio::TreeIO > >>> > >>> you can basically read the tree in newick format and out as an > >>> svg graph > >>> something like this: > >>> > >>> my $in = new Bio::TreeIO(-file => 'input', > >>> -format => 'newick'); > >>> my $out = new Bio::TreeIO(-file => '>mytree.svg', > >>> -format => 'svggraph'); > >>> while( my $tree = $in->next_tree ) { > >>> $out->write_tree($tree); > >>> } > >>> > >>> you can also use Bio::Tree::Draw > >>> > >>> hope that helps > >>> > >>> Lucia > >>> > >>> > >>> Quoting Shameer Khadar : > >>> > >>>> Hi, > >>>> > >>>> Thanks for your mail. I have to create these trees as a part > of a > >>>> webserver. i need to generate them dynamically using users input > >>>> sequence. > >>>> I think ITOL is not the stuff best suited for my purpose. > >>>> > >>>>> I think you'll have better luck using some of already available > >>>> programs > >>>>> to do > >>>>> that, you'll get better looking trees. 
If you just have one tree to draw I
> >>>>> recommend you use:
> >>>>> http://itol.embl.de/
> >>>>>
> >>>>> Lucia
> >>>>>
> >>>>> Quoting Shameer Khadar :
> >>>>>
> >>>>>> Dear All,
> >>>>>>
> >>>>>> Is it possible to draw a phylogeny tree file in PNG format using Bioperl ?
> >>>>>> My input file are in phylip treefile format. Any Modules / codes in
> >>>>>> Bio::Graphics / Phylogeny sections ?
> >>>>>>
> >>>>>> Input file : [...]
> >>>>
> >>>> --
> >>>> Shameer Khadar
> >>>> Prof. R.
Sowdhamini's Lab (# 25) The Computational Biology Group > >>>> National Centre for Biological Sciences (TIFR) > >>>> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India > >>>> T - 91-080-23666001 EXT - 6251 > >>>> W - http://www.ncbs.res.in > >>>> > >>> > >>> > >>> Lucia Peixoto > >>> Department of Biology,SAS > >>> University of Pennsylvania > >>> > >> > >> > >> -- > >> Shameer Khadar > >> Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group > >> National Centre for Biological Sciences (TIFR) > >> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India > >> T - 91-080-23666001 EXT - 6251 > >> W - http://www.ncbs.res.in > >> > > > > > > Lucia Peixoto > > Department of Biology,SAS > > University of Pennsylvania > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Russell.Smithies at agresearch.co.nz Tue Oct 2 17:34:20 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 3 Oct 2007 10:34:20 +1300 Subject: [Bioperl-l] need help ??parse AcNum from fasta? In-Reply-To: <47025AD9.1090105@web.de> References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk><47022278.7010700@web.de> <47025AD9.1090105@web.de> Message-ID: I know this is the Bioperl list but how about just doing it with grep? grep -P '^>.*XM_001666470[\s^>]*' sequences.fasta > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open- > bio.org] On Behalf Of outaleb Issame > Sent: Wednesday, 3 October 2007 3:51 a.m. 
> To: outaleb Issame > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] need help ??parse AcNum from fasta? > > hi again, > i think i can resolve this problem with the method : id_parser(); > how can i do that? > any suggestion .or experience?? > ehx again > > > > outaleb Issame wrote: > > >thx for the help, but i got a empty output file, > >i think its problem with matching the acc number, my fasta file look like: > > > >*>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 > protein > >DDHHHU... > > >IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 > protein > >DDHHHU.. > > >IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 > protein > >MMMMM..* > > > >and my i Accnum File look like: > >*IPI00177321 > >IPI00453473 > > > >*i hopt it helps to understand.* > >*. > > > > > >Nathan S. Haigh wrote: > > > > > > > >>outaleb Issame wrote: > >> > >> > >> > >> > >>>hi, > >>>with this file i mean, i picked out this Accession Number from > >>>IPI-Human Dbase,they come from a fasta file, > >>>so they re under eachother like a i a table in separate file now. > >>>what i want is how how can i check it in the fasta File (so in the > >>>IPI-Human FAsta File), i they re really there; > >>>if yes please copy the entire entry of this Number (>....the sequence > >>>also)in new fasta file.so that i get at the end a new > >>>FASTA file with jus this IPI Accession Number. > >>>thx and hope was clearly. > >>> > >>> > >>> > >>> > >>Ok, first of all, I'd read the contents of your Accession numbers into a > >>hash, something like the following (this could be written in a shorter > >>form, but since you're a newbie I'll leave it in a longer form so you > >>can follow easier). 
> >>Nath

From thiago.venancio at gmail.com Tue Oct 2 17:41:06 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Tue, 2 Oct 2007 18:41:06 -0300
Subject: [Bioperl-l] frac_* methods
Message-ID: <44255ea80710021441v6ee2c5e0x57e91c66c0e07c3c@mail.gmail.com>

Hi all,

This topic was discussed before, but I would like to put it on the list
again, maybe someone has an update.

The methods frac_identical, frac_conserved, frac_aligned_query and
frac_aligned_hit can also be used in the hit context, after HSP tiling.
In my view, it is better to use them only on individual HSPs, because
there are some rare or strange kinds of alignments. However, we frequently
need a single measure for the whole alignment.

Does any of the BioPerl masters have an update on this topic ?
What is the best current usage ? Best. Thiago -- "Innovation distinguishes between a leader and a follower." Steve Jobs ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== From outaleb at web.de Tue Oct 2 17:47:07 2007 From: outaleb at web.de (outaleb Issame) Date: Tue, 02 Oct 2007 23:47:07 +0200 Subject: [Bioperl-l] need help ??parse AcNum from fasta? In-Reply-To: References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk><47022278.7010700@web.de> <47025AD9.1090105@web.de> Message-ID: <4702BC5B.7040407@web.de> thx for this, but i want just create new fasta file with my accNumbers which i search in the FASTA file(localdbase). so --> just search this Numbers in the FASTA file, if yes then copy the Header and Sequence to other new fasta file . i m sitting in this 2 days now; i dont think it s difficult but howww????? i get crazy guys. common some expert in this area?? Smithies, Russell wrote: >I know this is the Bioperl list but how about just doing it with grep? > > grep -P '^>.*XM_001666470[\s^>]*' sequences.fasta > > > > > >>-----Original Message----- >>From: bioperl-l-bounces at lists.open-bio.org >> >> >[mailto:bioperl-l-bounces at lists.open- > > >>bio.org] On Behalf Of outaleb Issame >>Sent: Wednesday, 3 October 2007 3:51 a.m. >>To: outaleb Issame >>Cc: bioperl-l at lists.open-bio.org >>Subject: Re: [Bioperl-l] need help ??parse AcNum from fasta? >> >>hi again, >>i think i can resolve this problem with the method : id_parser(); >>how can i do that? >>any suggestion .or experience?? >>ehx again >> >> >> >>outaleb Issame wrote: >> >> >> >>>thx for the help, but i got a empty output file, >>>i think its problem with matching the acc number, my fasta file look >>> >>> >like: > > >>>*>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 >>> >>> >>protein >> >> >>>DDHHHU... 
>>> >>> >>>>IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 >>>> >>>> >>protein >> >> >>>DDHHHU.. >>> >>> >>>>IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 >>>> >>>> >>protein >> >> >>>MMMMM..* >>> >>>and my i Accnum File look like: >>>*IPI00177321 >>>IPI00453473 >>> >>>*i hopt it helps to understand.* >>>*. >>> >>> >>>Nathan S. Haigh wrote: >>> >>> >>> >>> >>> >>>>outaleb Issame wrote: >>>> >>>> >>>> >>>> >>>> >>>> >>>>>hi, >>>>>with this file i mean, i picked out this Accession Number from >>>>>IPI-Human Dbase,they come from a fasta file, >>>>>so they re under eachother like a i a table in separate file now. >>>>>what i want is how how can i check it in the fasta File (so in the >>>>>IPI-Human FAsta File), i they re really there; >>>>>if yes please copy the entire entry of this Number (>....the >>>>> >>>>> >sequence > > >>>>>also)in new fasta file.so that i get at the end a new >>>>>FASTA file with jus this IPI Accession Number. >>>>>thx and hope was clearly. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>Ok, first of all, I'd read the contents of your Accession numbers >>>> >>>> >into a > > >>>>hash, something like the following (this could be written in a >>>> >>>> >shorter > > >>>>form, but since you're a newbie I'll leave it in a longer form so >>>> >>>> >you > > >>>>can follow easier). 
>>>>Nath
>>>>
>>>>_______________________________________________
>>>>Bioperl-l mailing list
>>>>Bioperl-l at lists.open-bio.org
>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>=======================================================================
>Attention: The information contained in this message and/or attachments
>from AgResearch Limited is intended only for the persons or entities
>to which it is addressed and may contain confidential and/or privileged
>material. Any review, retransmission, dissemination or other use of, or
>taking of any action in reliance upon, this information by persons or
>entities other than the intended recipients is prohibited by AgResearch
>Limited. If you have received this message in error, please notify the
>sender immediately.
>=======================================================================

From jason at bioperl.org Tue Oct 2 18:22:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 2 Oct 2007 15:22:59 -0700
Subject: [Bioperl-l] frac_* methods
In-Reply-To: <44255ea80710021441v6ee2c5e0x57e91c66c0e07c3c@mail.gmail.com>
References: <44255ea80710021441v6ee2c5e0x57e91c66c0e07c3c@mail.gmail.com>
Message-ID: <3DC00A97-EF7E-41B4-854F-B088715AB901@bioperl.org>

I think my answer before was something to the tune of:

Use an alignment algorithm that finds a single best alignment like
FASTA or Smith-Waterman (SW) if what you want is a single number that
represents the alignment. BLAST is great for fast searching but
FASTA or SW/SSEARCH are going to be better at creating an alignment.
Consider the -postsw option in WUBLAST as well, as it will realign the
HSPs with SW.

I personally never use the frac alignment summary stats for the Hit
object for this reason unless I know I am going to have a single HSP.

-jason

On Oct 2, 2007, at 2:41 PM, Thiago Venancio wrote:

> Hi all,
>
> This topic was discussed before, but I would like to put it on the list
> again, maybe someone has an update.
>
> The methods frac_identical, frac_conserved, frac_aligned_query and
> frac_aligned_hit can also be used in the hit context, after HSP tiling.
> In my view, it is better to use them only on individual HSPs, because
> there are some rare/strange kinds of alignments. However, we frequently
> need to get one measure of the whole alignment.
>
> Do any of the BioPerl masters have an update on this topic? What is the
> best current usage?
>
> Best.
>
> Thiago
>
> --
> "Innovation distinguishes between a leader and a follower."
> Steve Jobs
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org

From cjfields at uiuc.edu Tue Oct 2 18:32:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 Oct 2007 17:32:30 -0500
Subject: [Bioperl-l] frac_* methods
In-Reply-To: <44255ea80710021441v6ee2c5e0x57e91c66c0e07c3c@mail.gmail.com>
References: <44255ea80710021441v6ee2c5e0x57e91c66c0e07c3c@mail.gmail.com>
Message-ID:

I think their use is based on what you are trying to accomplish. For
instance I am currently running a lot of small BLASTN queries
(limiting by normalized bit score), so I tend to look at the HSP data
more. However, in other circumstances I might want the overall
frac_identical for all HSPs ($hit->frac_identical). YMMV.
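To make the HSP-level versus hit-level distinction concrete, here is a small plain-Perl sketch. The two HSP records are made-up numbers, and the pooled figure is a naive total-identical over total-length ratio; it is not BioPerl's HSP tiling, which also has to deal with overlapping HSPs, so treat it only as an illustration of why one summary number can hide very different HSPs.

```perl
use strict;
use warnings;

# Hypothetical HSP summaries for one hit: identical positions and
# alignment length (assumed values, not from any real BLAST report).
my @hsps = (
    { identical => 95, length => 100 },
    { identical => 30, length => 60  },
);

# Per-HSP fraction identical.
my @per_hsp = map { $_->{identical} / $_->{length} } @hsps;

# Naive pooled figure for the whole hit: total identical / total length.
# Overlaps between HSPs are ignored here.
my ($id_sum, $len_sum) = (0, 0);
for my $hsp (@hsps) {
    $id_sum  += $hsp->{identical};
    $len_sum += $hsp->{length};
}
my $pooled = $id_sum / $len_sum;

printf "HSP %d: %.3f\n", $_ + 1, $per_hsp[$_] for 0 .. $#per_hsp;
printf "pooled: %.3f\n", $pooled;
```

With these numbers the pooled value sits between a strong HSP and a weak one, which is the situation where looking at the HSPs individually is more informative than a single hit-level statistic.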
chris

On Oct 2, 2007, at 4:41 PM, Thiago Venancio wrote:

> Hi all,
>
> This topic was discussed before, but I would like to put it on the list
> again, maybe someone has an update.
>
> The methods frac_identical, frac_conserved, frac_aligned_query and
> frac_aligned_hit can also be used in the hit context, after HSP tiling.
> In my view, it is better to use them only on individual HSPs, because
> there are some rare/strange kinds of alignments. However, we frequently
> need to get one measure of the whole alignment.
>
> Do any of the BioPerl masters have an update on this topic? What is the
> best current usage?
>
> Best.
>
> Thiago

From razi.khaja at gmail.com Tue Oct 2 19:46:12 2007
From: razi.khaja at gmail.com (Razi Khaja)
Date: Tue, 2 Oct 2007 19:46:12 -0400
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <4702BC5B.7040407@web.de>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de>
Message-ID: <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>

Here is the easiest non-BioPerl solution, using executables provided with
NCBI's BLAST:

(1) Format your multi-FASTA file into a BLAST database (-n sets the
database name, -o T indexes the sequence IDs so they can be looked up):

> /usr/local/ncbi/blast-2.2.16/bin/formatdb -i yourmultifastafile -n yourblastdb -o T

(2) Extract sequences from the newly created BLAST database with a file
containing a list of accession numbers (one on each line):

> /usr/local/ncbi/blast-2.2.16/bin/fastacmd -d yourblastdb -i inputfilewithaccessionnumbers -o outputfile

Your output file should be a multi-FASTA file of the sequences in your
list of accession numbers.

BLAST executables are available from
http://www.ncbi.nlm.nih.gov/blast/download.shtml

Hope that helps.
Razi Khaja

On 10/2/07, outaleb Issame wrote:
> Thanks for this, but I just want to create a new FASTA file with the
> accession numbers that I look up in the FASTA file (local database).
> So: just search for these numbers in the FASTA file and, if they are
> found, copy the header and sequence to another new FASTA file.
> I have been sitting on this for 2 days now; I don't think it is
> difficult, but how? It is driving me crazy.
> Come on, are there any experts in this area?
>
> Smithies, Russell wrote:
>
> >I know this is the Bioperl list but how about just doing it with grep?
> >
> > grep -P '^>.*XM_001666470[\s^>]*' sequences.fasta
> >
> >>-----Original Message-----
> >>From: bioperl-l-bounces at lists.open-bio.org
> >>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of outaleb Issame
> >>Sent: Wednesday, 3 October 2007 3:51 a.m.
> >>To: outaleb Issame
> >>Cc: bioperl-l at lists.open-bio.org
> >>Subject: Re: [Bioperl-l] need help ??parse AcNum from fasta?
> >>
> >>hi again,
> >>I think I can resolve this problem with the method id_parser();
> >>how can I do that?
> >>Any suggestions or experience?
> >>thx again
> >>
> >>outaleb Issame wrote:
> >>
> >>>thx for the help, but I got an empty output file.
> >>>I think the problem is with matching the accession number; my FASTA
> >>>file looks like:
> >>>
> >>>>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
> >>>DDHHHU...
> >>>>IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
> >>>DDHHHU..
> >>>>IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
> >>>MMMMM..
> >>>
> >>>and my accession number file looks like:
> >>>IPI00177321
> >>>IPI00453473
> >>>
> >>>I hope this helps to explain the problem.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org Tue Oct 2 20:50:37 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 2 Oct 2007 17:50:37 -0700
Subject: [Bioperl-l] need help ??parse AcNum from fasta?
In-Reply-To: <47025AD9.1090105@web.de>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de>
Message-ID:

http://bioperl.open-bio.org/wiki/FAQ#How_do_I_use_Bio::Index::Fasta_and_index_on_different_ids.3F

On Oct 2, 2007, at 7:51 AM, outaleb Issame wrote:

> hi again,
> I think I can resolve this problem with the method id_parser();
> how can I do that?
> Any suggestions or experience?
> thx again

--
Jason Stajich
jason at bioperl.org

From Russell.Smithies at agresearch.co.nz Tue Oct 2 21:05:25 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Oct 2007 14:05:25 +1300
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <4701F9C9.4050808@sheffield.ac.uk> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
Message-ID:

Hi all,
I'm using a modified version of Lincoln's tutorial
(http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
and I'm colouring the HSPs by setting the -bgcolor by score with a sub
to give a similar image to that from NCBI but for some reason, my colours
are coming out wrong (see attached example).
They seem to be off by one but I can't see why.
Any ideas?
I can't be certain but I think it's only started doing this since our
BLAST upgrade to 2.2.17 a few weeks ago.

Here's the colouring code:
-------------------------------------------------------------------------------
my $track = $panel->add_track(
    -glyph     => 'segments',
    -label     => 1,
    -connector => 'dashed',
    -bgcolor   => sub {
        my $feature = shift;
        my $score   = $feature->score;
        return 'red'     if $score >= 200;
        return 'fuchsia' if $score >= 80;
        return 'lime'    if $score >= 50;
        return 'blue'    if $score >= 40;
        return 'black';
    },
    -font2color  => 'gray',
    -sort_order  => 'high_score',
    -description => sub {
        my $feature = shift;
        return unless $feature->has_tag('description');
        my ($description) = $feature->each_tag_value('description');
        my $score = $feature->score;
        "$description, score=$score";
    },
);
-------------------------------------------------------------------------------

Thanx,

Russell Smithies

-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.png
Type: image/png
Size: 18507 bytes
Desc: example.png
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071003/72371841/attachment.png

From aaron.j.mackey at gsk.com Tue Oct 2 21:40:14 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 2 Oct 2007 21:40:14 -0400
Subject: [Bioperl-l] frac_* methods
In-Reply-To: <3DC00A97-EF7E-41B4-854F-B088715AB901@bioperl.org>
Message-ID:

Let me second Jason's comment that while BLAST is a great search program,
it is not a very good alignment algorithm. In this day and age with so
many good pairwise alignment algorithms out there (customized for the
context in which the alignment is performed), BLAST-based alignments
should frankly be ignored. See: exonerate, pairagon, etc.

Oh, and since ssearch35 (the Smith-Waterman algorithm that comes with the
FASTA package) is now vector-parallelized on most i386 architectures, it
is only about 10 times slower than BLAST for complete database searches
(with superior sensitivity/specificity); add PVM or MPI-based CPU
parallelization on top of that, and there's almost no reason to even run
BLAST anymore ...

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 10/02/2007 06:22:59 PM:

> I think my answer before was something to the tune of:
>
> Use an alignment algorithm that finds a single best alignment like
> FASTA or Smith-Waterman (SW) if what you want is a single number that
> represents the alignment. BLAST is great for fast searching but
> FASTA or SW/SSEARCH are going to be better at creating an alignment.
> Consider the -postsw option in WUBLAST as well as it will realign the
> HSPs with SW.
>
> I personally never use the frac alignment summary stats for the Hit
> object for this reason unless I know I am going to have a single HSP.
>
> -jason

From cuiw at ncbi.nlm.nih.gov Wed Oct 3 10:50:47 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Wed, 3 Oct 2007 10:50:47 -0400
Subject: [Bioperl-l] frac_* methods
In-Reply-To:
References: <3DC00A97-EF7E-41B4-854F-B088715AB901@bioperl.org>
Message-ID: <18C407FD4FFB424292D769FBD68C198701B18C35@NIHCESMLBX8.nih.gov>

I agree that BLAST is not a very good alignment algorithm, but I believe
there are plenty of reasons to run BLAST, especially when placing a
contig/BAC/PAC on a genome. In those cases, a full implementation of SW
requires an impractical matrix of n x m.
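A rough back-of-the-envelope sketch of that n x m matrix, in plain Perl. The sizes are assumptions chosen only for illustration (a 150 kb clone against a 3 Gb genome, one 4-byte score per cell); none of these numbers come from the thread.

```perl
use strict;
use warnings;

# Assumed, illustrative sizes (not from the thread).
my $n = 150_000;           # clone length in bases (e.g. a BAC)
my $m = 3_000_000_000;     # genome length in bases
my $bytes_per_cell = 4;    # assumption: one 32-bit score per matrix cell

# A full dynamic-programming matrix has one cell per pair of positions.
my $cells = $n * $m;
my $bytes = $cells * $bytes_per_cell;

printf "cells:  %.2e\n", $cells;
printf "memory: %.1f TiB\n", $bytes / 2**40;
```

Under these assumptions the full matrix runs to hundreds of trillions of cells, which is why a fast search step is used first to narrow down the region before any exhaustive alignment is attempted.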
Currently we are developing an algorithm which will run global alignment
after BLAST. Hopefully a Perl wrapper will become available next year.

Wenwu Cui, PhD

-----Original Message-----
From: aaron.j.mackey at gsk.com [mailto:aaron.j.mackey at gsk.com]
Sent: Tuesday, October 02, 2007 9:40 PM
To: Jason Stajich
Cc: bioperl-l list; Thiago Venancio
Subject: Re: [Bioperl-l] frac_* methods

Let me second Jason's comment that while BLAST is a great search program,
it is not a very good alignment algorithm. In this day and age with so
many good pairwise alignment algorithms out there (customized for the
context in which the alignment is performed), BLAST-based alignments
should frankly be ignored. See: exonerate, pairagon, etc.

Oh, and since ssearch35 (the Smith-Waterman algorithm that comes with the
FASTA package) is now vector-parallelized on most i386 architectures, it
is only about 10 times slower than BLAST for complete database searches
(with superior sensitivity/specificity); add PVM or MPI-based CPU
parallelization on top of that, and there's almost no reason to even run
BLAST anymore ...

-Aaron

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From aaron.j.mackey at gsk.com Wed Oct 3 11:53:12 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Wed, 3 Oct 2007 11:53:12 -0400
Subject: [Bioperl-l] frac_* methods
In-Reply-To: <18C407FD4FFB424292D769FBD68C198701B18C35@NIHCESMLBX8.nih.gov>
Message-ID:

I think Wenwu makes a nice distinction here between alignment and
placement. BLAST is great at finding things and (thus) placing things.
String matching has a long and rich history in computer science, and we
tend to confuse the terms "alignment" with "matching".
The "align a BAC/PAC to a genome" problem is one of string matching (with
allowance for errors due to sequencing artifacts and possible SNPs); if
there were no errors, we wouldn't use BLAST at all (and, in fact, I
personally think programs such as MUMMER, or the various genome assembly
tiling algorithms, are better for this particular problem).

The problem of pairwise alignment can also be called matching, but the
distinction (at least to me) is that the "errors" are true evolutionary
mutations, and are expected to occur naturally (i.e. are not an artifact
of the experiment that in an optimal world would not occur). BLAST is
good at finding matches whose "errors" fit scoring-matrix-based
evolutionary models, but it isn't very good at teasing out the actual
evolutionary events that led to those "errors" (this is not really a
criticism of BLAST - its job is not to generate evolutionarily accurate
and complete alignments, but to identify evolutionarily conserved regions
having statistical significance).

Please don't get me wrong, I think BLAST is an invaluable tool that fully
deserves its top-most place in the bioinformatics hall of fame. But I
also don't believe that bioinformatics begins and ends with running a
BLAST search and poring over the report details.

-Aaron

"Cui, Wenwu (NIH/NLM/NCBI) [C]" wrote on 10/03/2007 10:50:47 AM:

> I agree that BLAST is not a very good alignment algorithm, but I believe
> there are plenty of reasons to run BLAST, especially when placing a
> contig/BAC/PAC on a genome. In those cases, a full implementation of SW
> requires an impractical matrix of n x m.
>
> Currently we are developing an algorithm which will run global alignment
> after BLAST. Hopefully a Perl wrapper will become available next year.
>
> Wenwu Cui, PhD

From vebaev at gmail.com Wed Oct 3 12:44:35 2007
From: vebaev at gmail.com (Vesselin Baev)
Date: Wed, 3 Oct 2007 19:44:35 +0300
Subject: [Bioperl-l] CG content plot of sequence
In-Reply-To:
References:
Message-ID:

Hi,

What methods should I use to draw a CG plot of a sequence (with
Bio::Graphics)?

Thanks

--
------------------------------------------------
University of Plovdiv
Faculty of Biology
Dept.
Molecular Biology Bioinformatics Group Tzar Assen 24 Plovdiv 4000, BULGARIA 032/ 261 (534) 089/ 57-444-67 Skype: vesselin_baev vebaev at gmail.com From cuiw at ncbi.nlm.nih.gov Wed Oct 3 13:37:51 2007 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Wed, 3 Oct 2007 13:37:51 -0400 Subject: [Bioperl-l] frac_* methods In-Reply-To: References: <18C407FD4FFB424292D769FBD68C198701B18C35@NIHCESMLBX8.nih.gov> Message-ID: <18C407FD4FFB424292D769FBD68C198701B18C36@NIHCESMLBX8.nih.gov> I agree with what you said. One of the reasons we are introducing 'BLAST-guided-global alignment (NW)' is that a significant number of clones are either of low quality, partially sequenced, erroneously assembled, or come from a non-reference strain. Wenwu Cui, PhD -----Original Message----- From: aaron.j.mackey at gsk.com [mailto:aaron.j.mackey at gsk.com] Sent: Wednesday, October 03, 2007 11:53 AM To: Cui, Wenwu (NIH/NLM/NCBI) [C] Cc: bioperl-l list; Jason Stajich; Thiago Venancio Subject: RE: [Bioperl-l] frac_* methods I think Wenwu makes a nice distinction here between alignment and placement. BLAST is great at finding things and (thus) placing things. String matching has a long and rich history in computer science, and we tend to confuse the terms "alignment" and "matching". The "align a BAC/PAC to a genome" problem is one of string matching (with allowance for errors due to sequencing artifacts and possible SNPs); if there were no errors, we wouldn't use BLAST at all (and, in fact, I personally think programs such as MUMMER, or the various genome assembly tiling algorithms, are better for this particular problem).
The problem of pairwise alignment can also be called matching, but the distinction (at least to me) is that the "errors" are true evolutionary mutations, and are expected to occur naturally (i.e. are not an artifact of the experiment that in an optimal world would not occur). BLAST is good at finding matches whose "errors" fit scoring-matrix-based evolutionary models, but it isn't very good at teasing out the actual evolutionary events that led to those "errors" (this is not really a criticism of BLAST: its job is not to generate evolutionarily accurate and complete alignments, but to identify evolutionarily conserved regions having statistical significance). Please don't get me wrong, I think BLAST is an invaluable tool that fully deserves its top-most place in the bioinformatics hall of fame. But I also don't believe that bioinformatics begins and ends with running a BLAST search and poring over the report details. -Aaron "Cui, Wenwu (NIH/NLM/NCBI) [C]" wrote on 10/03/2007 10:50:47 AM: > I agree that BLAST is not a very good alignment algorithm but believe > there are plenty of reasons to run BLAST, especially when placing a > contig /BAC/PAC to a genome. In those cases, fully implementation of SW > requires an unpractical matrix of n X m. > > Currently we are developing an algorithm which will run global alignment > after BLAST. Hopefully a Perl wrapper will become available next year. > > > Wenwu Cui, PhD > > -----Original Message----- > From: aaron.j.mackey at gsk.com [mailto:aaron.j.mackey at gsk.com] > Sent: Tuesday, October 02, 2007 9:40 PM > To: Jason Stajich > Cc: bioperl-l list; Thiago Venancio > Subject: Re: [Bioperl-l] frac_* methods > > Let me second Jason's comment that while BLAST is a great search > program, > it is not a very good alignment algorithm.
In this day and age with so > many good pairwise alignment algorithms out there (customized for the > context in which the alignment is performed), BLAST-based alignments > should frankly be ignored. See: exonerate, pairagon, etc. > > Oh, and since ssearch35 (the Smith-Waterman algorithm that comes with > the > FASTA package) is now vector-parallelized on most i386 architectures, it > > is only about 10 times slower than BLAST for complete database searches > (with superior sensitivity/specificity); add PVM or MPI-based CPU > parallelization on top of that, and there's almost no reason to even run > > BLAST anymore ... > > -Aaron > > bioperl-l-bounces at lists.open-bio.org wrote on 10/02/2007 06:22:59 PM: > > > I think my answer before was something to the tune of: > > > > Use an alignment algorithm that finds a single best alignment like > > FASTA or Smith-Waterman (SW) if what you want is a single number that > > represents the alignment. BLAST is great for fast searching but > > FASTA or SW/SSEARCH are going to be better at creating an alignment. > > Consider the -postsw option in WUBLAST as well as it will realign the > > HSPs with SW. > > > > I personally never use the frac alignment summary stats for the Hit > > object for this reason unless I know I am going to have a single HSP. > > > > -jason > > > > On Oct 2, 2007, at 2:41 PM, Thiago Venancio wrote: > > > > > Hi all, > > > > > > This topic was discussed before, but I would like to put it on the > > > list > > > again, maybe someone has an update. > > > > > > The methods frac_identical, frac_conserved, frac_aligned_query and > > > frac_aligned_hit can also be used in the hit context, after HSP > > > tilling. In > > > my point of view, it is better to use it just in HSPs individually, > > > because > > > there are some rare/strange kinds of alignments. However, we > > > frequently need > > > to get one measure of the whole alignment. > > > > > > Any of the BioPerl masters has an update on this topic ? 
What is > > > the best > > > current usage ? > > > > > > Best. > > > > > > Thiago > > > > > > -- > > > "Innovation distinguishes between a leader and a follower." > > > Steve Jobs > > > > > > ======================== > > > Thiago Motta Venancio, MSc > > > PhD student in Bioinformatics > > > University of Sao Paulo > > > ======================== > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > jason at bioperl.org > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Oct 3 14:19:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 3 Oct 2007 13:19:43 -0500 Subject: [Bioperl-l] CG content plot of sequence In-Reply-To: References: Message-ID: <3427C150-D571-4C7B-99FC-F6FE77C9A344@uiuc.edu> You should look at Bio::Graphics::Glyph::dna. From the POD: --------------------------- This glyph draws DNA sequences. At high magnifications, this glyph will draw the actual base pairs of the sequence (both strands). At low magnifications, the glyph will plot the GC content. By default, the GC calculation will use non-overlapping bins, but this can be changed by specifying the gc_window option, in which case, a sliding window calculation will be used. For this glyph to work, the feature must return a DNA sequence string in response to the dna() method. 
For example, you can use a Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq like this: my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 ); my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 800 ); $feature->attach_seq($dna); $panel->add_track( $feature, -glyph => 'dna' ); A Bio::Graphics::Feature object may also be used. --------------------------- chris On Oct 3, 2007, at 11:44 AM, Vesselin Baev wrote: > Hi, > What methods should I use to draw a CG plot of a sequence (with > bio::graphics)? > > Thanks > > -- > ------------------------------------------------ > University of Plovdiv > Faculty of Biology > Dept. Molecular Biology > Bioinformatics Group > Tzar Assen 24 > Plovdiv 4000, BULGARIA > 032/ 261 (534) > 089/ 57-444-67 > Skype: vesselin_baev > vebaev at gmail.com > > -- > ------------------------------------------------ > University of Plovdiv > Faculty of Biology > Dept. Molecular Biology > Bioinformatics Group > Tzar Assen 24 > Plovdiv 4000, BULGARIA > 032/ 261 (534) > 089/ 57-444-67 > Skype: vesselin_baev > vebaev at gmail.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From vebaev at gmail.com Wed Oct 3 14:31:49 2007 From: vebaev at gmail.com (Vesselin Baev) Date: Wed, 3 Oct 2007 21:31:49 +0300 Subject: [Bioperl-l] CG content plot of sequence In-Reply-To: <3427C150-D571-4C7B-99FC-F6FE77C9A344@uiuc.edu> References: <3427C150-D571-4C7B-99FC-F6FE77C9A344@uiuc.edu> Message-ID: Thanks, I will use Bio::Graphics::Glyph::dna for the classical CG%. Is this type for C+G % or for CpG? And if I want to draw a similar plot, but for example for the % of dinucleotide (NpN) occurrences in a sliding window, what should I use? Thanks!
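For the dinucleotide case, the per-window values can be worked out in plain Perl before anything is drawn. What follows is only a minimal sketch and not part of Bio::Graphics: the function name dinuc_fraction_windows and its window/step parameters are made up for illustration, and handing the resulting points to a plotting glyph is left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): for each window, return
# [1-based window start, fraction of dinucleotide positions equal to $dinuc].
# A window of length $window contains ($window - 1) overlapping positions.
sub dinuc_fraction_windows {
    my ($seq, $dinuc, $window, $step) = @_;
    $seq   = uc $seq;
    $dinuc = uc $dinuc;
    die "window must be at least 2\n" if $window < 2;
    die "step must be at least 1\n"   if $step < 1;

    my @points;
    for (my $start = 0; $start + $window <= length($seq); $start += $step) {
        my $sub   = substr($seq, $start, $window);
        my $count = 0;
        # The lookahead makes overlapping hits (e.g. "AA" in "AAA") all count.
        $count++ while $sub =~ /(?=\Q$dinuc\E)/g;
        push @points, [ $start + 1, $count / ($window - 1) ];
    }
    return @points;
}

# Example: CpG fraction in windows of 6 bases, moved 3 bases at a time.
my @cpg = dinuc_fraction_windows('ACGCGTTACGAA', 'CG', 6, 3);
printf "%d\t%.3f\n", @$_ for @cpg;
```

Setting the step equal to the window length gives non-overlapping bins, and a step of 1 gives a fully sliding window, which mirrors the two modes the glyph documentation describes for GC content.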
2007/10/3, Chris Fields : > > You should look at Bio::Graphics::Glyph::dna. From the POD: > > --------------------------- > > This glyph draws DNA sequences. At high magnifications, this glyph > will draw the actual base pairs of the sequence (both strands). At > low magnifications, the glyph will plot the GC content. By default, > the GC calculation will use non-overlapping bins, but this can be > changed by specifying the gc_window option, in which case, a > sliding window calculation will be used. > > For this glyph to work, the feature must return a DNA sequence string > in response to the dna() method. For example, you can use a > Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq > like this: > my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 ); > my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end > => 800 ); > $feature->attach_seq($dna); > $panel->add_track( $feature, -glyph => 'dna' ); > > A Bio::Graphics::Feature object may also be used. > > --------------------------- > > chris > > On Oct 3, 2007, at 11:44 AM, Vesselin Baev wrote: > > > Hi, > > What methods should I use to draw a CG plot of a sequence (with > > bio::graphics)? > > > > Thanks > > > > -- > > ------------------------------------------------ > > University of Plovdiv > > Faculty of Biology > > Dept. Molecular Biology > > Bioinformatics Group > > Tzar Assen 24 > > Plovdiv 4000, BULGARIA > > 032/ 261 (534) > > 089/ 57-444-67 > > Skype: vesselin_baev > > vebaev at gmail.com > > > > -- > > ------------------------------------------------ > > University of Plovdiv > > Faculty of Biology > > Dept. 
Molecular Biology > > Bioinformatics Group > > Tzar Assen 24 > > Plovdiv 4000, BULGARIA > > 032/ 261 (534) > > 089/ 57-444-67 > > Skype: vesselin_baev > > vebaev at gmail.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > -- ------------------------------------------------ University of Plovdiv Faculty of Biology Dept. Molecular Biology Bioinformatics Group Tzar Assen 24 Plovdiv 4000, BULGARIA 032/ 261 (534) 089/ 57-444-67 Skype: vesselin_baev vebaev at gmail.com From dave at davemessina.com Wed Oct 3 14:22:23 2007 From: dave at davemessina.com (Dave Messina) Date: Wed, 3 Oct 2007 20:22:23 +0200 Subject: [Bioperl-l] CG content plot of sequence Message-ID: <37574C6D-98BA-47A5-875E-9255377133B8@sbc.su.se> Hi Vesselin, I believe what you want to use is Bio::Graphics::Panel with the Bio::Graphics::Glyph::dna glyph. See http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Glyph/dna.html and http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html I think the example code will help you to do what you want. Dave From cjfields at uiuc.edu Wed Oct 3 15:10:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 3 Oct 2007 14:10:46 -0500 Subject: [Bioperl-l] CG content plot of sequence In-Reply-To: References: <3427C150-D571-4C7B-99FC-F6FE77C9A344@uiuc.edu> Message-ID: <7B62B534-2A8C-4EAE-B956-F0FADB726195@uiuc.edu> On Oct 3, 2007, at 1:31 PM, Vesselin Baev wrote: > Thanks, > I will use Bio::Graphics::Glyph::dna for the classical CG% > (this type is for C+G % or CpG)? > > > and if I want to draw a similar plot but for example for a % of > dinucleotide > (NpN) occurrances in a sliding windiw, what should I use? > > > Thanks!
The glyph plots GC content (G+C %), not CpG. I'm not sure what you would use for dinucleotide content; you could look at the Bio::Graphics::Glyph::dna code and either subclass it for your needs (probably the best option) or add an extra parameter and 'rewire' the appropriate methods to do what you want. chris From dmessina at sbc.su.se Wed Oct 3 14:55:10 2007 From: dmessina at sbc.su.se (dmessina at sbc.su.se) Date: Wed, 3 Oct 2007 20:55:10 +0200 (CEST) Subject: [Bioperl-l] CG content plot of sequence Message-ID: <61118.217.213.158.117.1191437710.squirrel@mail.sbc.su.se> Hi Vesselin, I believe what you want to use is Bio::Graphics::Panel with the Bio::Graphics::Glyph::dna glyph. See http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Glyph/dna.html and http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html I think the example code will help you to do what you want. Dave From lzhtom at hotmail.com Wed Oct 3 18:30:43 2007 From: lzhtom at hotmail.com (zhihuali) Date: Wed, 3 Oct 2007 22:30:43 +0000 Subject: [Bioperl-l] Loading Blast Report in a minimal way Message-ID: Hi netters, I'm using SearchIO to parse my BLAST reports. They are extremely large and, not surprisingly, parsing is extremely slow and the system sometimes crashes because it runs out of memory. As I can handle small reports quickly, this seems to be related to the way SearchIO works: it slurps the whole report into memory and builds millions of objects. I've checked old posts: some people used FastHitEventBuilder to build hit objects without any HSP objects, and some suggested using the tabular output of BLAST. But in my case I need to go to each of the HSPs of each hit, parse the alignment, gather the information needed if that HSP fits certain criteria, and then move on to the next HSP, jump to the next hit, or exit the processing, according to the information I have already got.
An ideal way would be to read one HSP at a time from the report into memory. Is there some way to modify SearchIO (or build another Search Event) to do this? Thanks a lot! Zhihua Li From budd at embl-heidelberg.de Thu Oct 4 09:43:57 2007 From: budd at embl-heidelberg.de (Aidan Budd) Date: Thu, 4 Oct 2007 15:43:57 +0200 (CEST) Subject: [Bioperl-l] Adding info to Features to view in SwissProt Message-ID: Hi bioperlers, I've been trying to add info to a feature in a RichSeq object so that when the Seq is written in swissprot format I can put information in the final field of the feature FT DOMAIN 208 392 Helicase ATP-binding. i.e. where it says "Helicase ATP-binding." I can control what goes in the primary field, and set the location, but haven't been able to work out how to add info to go in this final field. Thanks, Aidan -- ---------------------------------------------------------------------- Aidan Budd, PhD tel:+49 (0)6221 387 8530 EMBL - European Molecular Biology Laboratory fax:+49 (0)6221 387 8517 Meyerhofstr. 1, 69117 Heidelberg, Germany URL: http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html From cjfields at uiuc.edu Thu Oct 4 10:32:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 4 Oct 2007 09:32:43 -0500 Subject: [Bioperl-l] Adding info to Features to view in SwissProt In-Reply-To: References: Message-ID: <84A6CB47-6890-476B-A375-6FCF37655330@uiuc.edu> Try adding it as a tag value with the name 'description', 'note', or 'product' (the first two are probably the best to use for most purposes). There is a quick explanation here: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences You can also do something like: $sf->add_tag_value('description', 'Helicase ATP-binding'); See the Bio::SeqFeature::Generic POD for more.
chris On Oct 4, 2007, at 8:43 AM, Aidan Budd wrote: > Hi bioperlers, > > I've been trying to add info to a feature in a RichSeq object so > that when > the Seq is written in swissprot format I can put information in the > final > field of the feature > > FT DOMAIN 208 392 Helicase ATP-binding. > > i.e. where it says "Helicase ATP-binding." > > I can control what goes in the primary field, and set the location, > but > haven't been able to work out how to add info to go in this final > field. > > Thanks, > > Aidan > > -- > ---------------------------------------------------------------------- > Aidan Budd, PhD tel:+49 (0)6221 387 8530 > EMBL - European Molecular Biology Laboratory fax:+49 (0)6221 387 8517 > Meyerhofstr. 1, 69117 Heidelberg, Germany > > URL: http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cain.cshl at gmail.com Thu Oct 4 11:08:52 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 04 Oct 2007 11:08:52 -0400 Subject: [Bioperl-l] [Gmod-gbrowse] Fwd: DB::SeqFeature::Store error In-Reply-To: References: Message-ID: <1191510532.2787.16.camel@localhost.localdomain> Hi Chris, I think adding the type=MYISAM is the right thing to do; please go ahead and commit it. Scott On Mon, 2007-10-01 at 10:14 -0500, Chris Fields wrote: > Just thought I would forward this on to the GBrowse list as well in > case anyone has run into the same problem. The issue pops up when > using bioperl from CVS and appears to be related to a fix Lincoln > added recently in Bio::DB::SeqFeature::Store::DBI::mysql using > FULLTEXT, which only works for MyISAM currently. 
> > Making the suggested changes (adding TYPE=MYISAM) to the CREATE TABLE > queries does work when InnoDB is set to the default. Should I go > ahead and commit? > > chris > > Begin forwarded message: > > > I'm getting the following error on my local MySQL (v 5.0.41) with > > bp_seqfeature_load: > > > > -------------------- EXCEPTION -------------------- > > MSG: The used table type doesn't support FULLTEXT indexes > > STACK Bio::DB::SeqFeature::Store::DBI::mysql::_init_database /Library/ > > Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm:414 > > STACK Bio::DB::SeqFeature::Store::init_database /Library/Perl/5.8.6/ > > Bio/DB/SeqFeature/Store.pm:382 > > STACK Bio::DB::SeqFeature::Store::DBI::mysql::init /Library/Perl/ > > 5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm:218 > > STACK Bio::DB::SeqFeature::Store::new /Library/Perl/5.8.6/Bio/DB/ > > SeqFeature/Store.pm:345 > > STACK toplevel /usr/local/bin/bp_seqfeature_load.pl:57 > > ------------------------------------------- > > > > The default setting for storage is InnoDB; switching to MyISAM fixes > > the issue. Should we specify TYPE = MyISAM with the various CREATE > > TABLE queries in Bio::DB::SeqFeature::Store::DBI::mysql to be on the > > safe side? > > > > chris > > > > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2005. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cjfields at uiuc.edu Thu Oct 4 11:14:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 4 Oct 2007 10:14:37 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] Fwd: DB::SeqFeature::Store error In-Reply-To: <1191510532.2787.16.camel@localhost.localdomain> References: <1191510532.2787.16.camel@localhost.localdomain> Message-ID: <9DEFB5C1-6F79-465B-8FB4-68305C53E427@uiuc.edu> Done. If we run into issues and need to roll back let me know. chris On Oct 4, 2007, at 10:08 AM, Scott Cain wrote: > Hi Chris, > > I think adding the type=MYISAM is the right thing to do; please go > ahead > and commit it. > > Scott > > > > On Mon, 2007-10-01 at 10:14 -0500, Chris Fields wrote: >> Just thought I would forward this on to the GBrowse list as well in >> case anyone has run into the same problem. The issue pops up when >> using bioperl from CVS and appears to be related to a fix Lincoln >> added recently in Bio::DB::SeqFeature::Store::DBI::mysql using >> FULLTEXT, which only works for MyISAM currently. >> >> Making the suggested changes (adding TYPE=MYISAM) to the CREATE TABLE >> queries does work when InnoDB is set to the default. Should I go >> ahead and commit? 
>> >> chris >> >> Begin forwarded message: >> >>> I'm getting the following error on my local MySQL (v 5.0.41) with >>> bp_seqfeature_load: >>> >>> -------------------- EXCEPTION -------------------- >>> MSG: The used table type doesn't support FULLTEXT indexes >>> STACK Bio::DB::SeqFeature::Store::DBI::mysql::_init_database / >>> Library/ >>> Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm:414 >>> STACK Bio::DB::SeqFeature::Store::init_database /Library/Perl/5.8.6/ >>> Bio/DB/SeqFeature/Store.pm:382 >>> STACK Bio::DB::SeqFeature::Store::DBI::mysql::init /Library/Perl/ >>> 5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm:218 >>> STACK Bio::DB::SeqFeature::Store::new /Library/Perl/5.8.6/Bio/DB/ >>> SeqFeature/Store.pm:345 >>> STACK toplevel /usr/local/bin/bp_seqfeature_load.pl:57 >>> ------------------------------------------- >>> >>> The default setting for storage is InnoDB; switching to MyISAM fixes >>> the issue. Should we specify TYPE = MyISAM with the various CREATE >>> TABLE queries in Bio::DB::SeqFeature::Store::DBI::mysql to be on the >>> safe side? >>> >>> chris >> >> >> >> >> --------------------------------------------------------------------- >> ---- >> This SF.net email is sponsored by: Microsoft >> Defy all challenges. Microsoft(R) Visual Studio 2005. >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ >> _______________________________________________ >> Gmod-gbrowse mailing list >> Gmod-gbrowse at lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > > ---------------------------------------------------------------------- > --- > This SF.net email is sponsored by: Splunk Inc. > Still grepping through log files to find problems? Stop. > Now Search log events and configuration files using AJAX and a > browser. 
> Download your FREE copy of Splunk now >> http://get.splunk.com/ > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Oct 4 15:30:48 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 4 Oct 2007 14:30:48 -0500 Subject: [Bioperl-l] blastxml oddity Message-ID: <23DE7F77-3FB4-4E00-86CA-43B55A6A7311@uiuc.edu> Just noticed an oddity from BLASTXML output from the NCBI server; I'm cc'ing this to NCBI so maybe they can explain. BTW, the following doesn't occur via URLAPI. When running a standard BLAST query using the NCBI web page, if requesting XML output after the run I get the entire query seq masked out and no midline. This occurs with all default settings except output type (set to XML). Can anyone replicate this? Here's a sample: 1 320.472 820 2.17708e-86 1 181 1 181 0 0 0 0 0 181 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX MNQK