From heikki at ebi.ac.uk  Fri Nov 21 11:20:25 2003
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Fri Nov 21 19:36:44 2003
Subject: [Bioperl-announce-l] Bioperl Developer snapshot 1.3.03
Message-ID: <200311211620.25306.heikki@ebi.ac.uk>

Bioperl developer snap shot 1.3.03
---------------------------------

This is the third developer snap shot from the BioPerl CVS head
that will eventually lead to release 1.4.

http://bioperl.org/DIST/current_core_unstable.tar.gz
http://bioperl.org/DIST/bioperl-1.3.03.tar.gz

Changes since 1.3.02
--------------------

A month is far too long a time between snap shots, but I found it
difficult to find time to write an overview of what has happened.
Waiting made it harder, of course, so I'll only be able to skim the
top of the changes made. See the latter part of the message for the
details.

Bio::LocatableSeq now gives reasonable values to start() and end()
without manually setting them if the values can be derived from the
sequence only.

Sequence database parsers now treat virus Bio::Species entries
differently from other taxa. Since virus nomenclature does not follow
the standard genus + species format, calling binomial() on viruses is
not advisable. The output will merge group name and species name,
which is usually not what you want. This might need more work in the
future.

Bio::SimpleAlign has new methods. Help appreciated there too. (see
below)

If you really want, you can now add custom translation tables into
Bio::Tools::CodonTable and create Martian proteins.

Stefan has continued fine-tuning his Bio::Matrix::PSM modules.

A number of fixes have been added to the Bio::Graphics modules. Work
is under way to add SVG support.

Bio::Tools::SeqWords has a new method: count_overlap_words()

Remember: BPlite is getting superseded by SearchIO.

On behalf of the bioperl core team,

        -Heikki


NEW DIRECTORIES and FILES
=========================

* AlignIO now supports the MAF format
* SeqIO knows about KEGG and TIGR formats
* Bio/Tools/Analysis/Protein::ELM

for documentation
* two texts converted into SGML: Flat_Databases.sgml
* new HOWTO: SimpleWebAnalysis.sgml
* bioperl-live/doc/howto/txt - New directory for text-only versions
  of howtos

examples
* sirna/rnai_finder.cgi
* db/bioflat_index.pl

models
* popgen.dia


CHANGES
=======

+ Lots of fixes to tests
  * tests now fail cleanly when run without network access

------------------------- details ---------------------------

Bio::Align::DNAStatistics
  code alignment formatting

Bio::AlignIO::bl2seq
  Johnathan Segal's fixes for bug #1541 - problem with reverse
  complement alignments in bl2seq

Bio::DB::Flat::BinarySearch
  More detail on secondary namespaces

Bio::DB::Flat
  Some -index value has to be passed, it's required

Bio::DB::GFF::Adaptor::biofetch
  changes making genbank2gff.pl use SOFA terms for type names in
  generated GFF3

Bio::DB::GFF::Aggregator
  fixed errors in the high-mag sequence alignments shown by the
  segments glyph

Bio::DB::GFF::Feature
  Reworked the following methods to more closely resemble the
  corresponding Bio::SeqFeatureI methods:
  - all_tags (alias get_all_tags)
  - gff_string
  - get_tag_values
  - aliased sub_SeqFeature to get_SeqFeatures

Bio::DB::GFF::Feature
  silence the uninitialized value error

Bio::DB::Registry
  The HOWTO says that one should be able to use 1 or more
  seqdatabase.ini files. This is right, since the administrator
  could put one in /etc/bioinformatics and I might want my own in
  /home/bosborne/.bioinformatics. The old code was reading 1 *ini
  file and skipping the rest in OBDA_SEARCH_PATH; now it reads all
  the files specified in OBDA_SEARCH_PATH, as well as the standard
  locations.
  ActiveState has no getpwuid() so AS users can use /home/bosborne

Bio::Graphics::FeatureFile
  - adding a symbol to access a feature's primary ID (eg, database
    PK)
  - remove uninit variable warning when calling features() without
    arguments
  - fixed frend web-based feature renderer to accommodate recent
    changes in FeatureFile API

Bio::Graphics::Glyph::diamond
  converted line-based outline to polygon calls

Bio::Graphics::Glyph::Factory
  preliminary support for SVG output using GD::SVG

Bio::Graphics::Glyph
  - fixed errors in the high-mag sequence alignments shown by the
    segments glyph
  - preliminary support for SVG output using GD::SVG
  - polygon-based approach in filled_arrow to support SVG

Bio::Graphics::Glyph::generic
  - generalized some code to support SVG output

Bio::Graphics::Glyph::segments
  - added additional documentation for displaying multiple
    alignments with the segments glyph
  - fixed errors in the high-mag sequence alignments shown by the
    segments glyph
  - added a new "canonical_strand" option to the segments glyph

Bio::Graphics::Glyph::graded_segments
  Fixed Bio::SeqFeature::Generic so that it will accept a score of
  0; modified Bio::Graphics::Glyph::graded_segment so that it draws
  a fg box around each segment by default (can restore default
  behavior with -vary_fg=>1)

Bio::Graphics::Glyph::triangle
  - more range checking on triangle glyph before fillToBorder call
  - try to fix GD buffer overrun in triangle glyph

Bio::Graphics::Glyph::xyplot
  removed function-oriented GD calls for compatibility with SVG
  output

Bio::Graphics::Panel
  preliminary support for SVG output using GD::SVG

Bio::Graphics::Pictogram
  support lowercase

Bio::LocatableSeq
  - start() and end() now return undef if there is no sequence
    string
  - silence a spurious warning arising from unset strand
  - fixed trunc() when strand is -1.
    Also made end() calculate its value based on the length of the
    sequence and start. No need to set end() explicitly any more.
  - Johnathan Segal's fixes for bug #1541 - problem with reverse
    complement alignments in bl2seq

Bio::SimpleAlign
  - adding a parser and tests for UCSC maf (multiple alignment
    format) format.
  - added a method SimpleAlign::splice_by_seq_pos to allow splicing
    of all sequences based on the gap locations of one sequence
    within the alignment. this could in principle be called
    repeatedly to remove all gaps from the MSA.

Bio::Matrix::PSM::InstanceSite PsmHeader
  synopsis and doc fixes

Bio::Matrix::PSM::IO::mast
  doc formatting fixes

Bio::Matrix::PSM::SiteMatrix SiteMatrixI
  get/set method added to access accession_number

Bio::Matrix::PSM::SiteMatrix
  Fixed bug Heikki pointed out with the constructor when no input
  data for the vectors (A,G,C,T) is supplied. This is still a temp
  solution

Bio::Matrix::PSM::IO::mast
  sequence is unknown, but width is known, so we supply it as
  'NNN..'. Accession number should be supplied as -accession_number

Bio::Matrix::PSM::InstanceSite
  Bug fix: start method was overriding LocatableSeq method, and it
  shouldn't, fixed.

Bio::Matrix::PSM::IO::transfac
  Throw exception if a position is not defined

Bio::Matrix::PSM::IO::mast meme transfac
  Capitalization fixed when rearranging in new

Bio::OntologyIO::dagflat
  - fixes to ontology regex to parse a greater subset of DAG-Edit
    files. i have tracked down the files where DAG-Edit IDs are
    validated: GOFlatFileAdapter.java
    the regex still only matches a subset of the allowed characters
    in an identifier. identifiers can be any non-whitespace, non
    ;$,:!\?
    characters > length 1 on either side of a : separator. i've
    opted to match \w+:\w+, hopefully we don't need to go beyond
    this.
  - adding escape of SGML and newlines/tabs. is there a generic
    SGML escape module we want to add as a dependency?

Bio::OntologyIO
  adding escape of SGML and newlines/tabs. is there a generic SGML
  escape module we want to add as a dependency?

Bio::Ontology::Term

Bio::Phenotype::OMIM::OMIMentry OMIMparser
  finer parse the symptoms

Bio::PopGen Statistics
  update LD so that it will return a pair of values, LD and chiSQ.
  Also fix it so that composite_LD will calculate correctly with
  missing data

Bio::PrimarySeqI
  translate() can take in a custom codon table

Bio::RangeI
  Make it so 'disconnected_ranges' sub doesn't cause warnings

Bio::Restriction::Analysis
  Apply fix for bug #1548

Bio::Root::IO
  - cleanup of debugging a little for uniformity
  - In order for rmtree() to work in cygwin

Bio::SearchIO::blastxml
  blastxml expected and on the same line. my version of blastall
  puts them on different lines, which caused the parse to fail
  (from internal refactoring of and tags). this change fixes the
  bug. tests added to SearchIO.t and a test blastxml file added.

Bio::SearchIO::Writer::GbrowseGFF
  Gbrowse now allows tstart and tend tags for alignment features to
  make it more like normal GFF.

Bio::Seq::EncodedSeq
  fixed strandedness issues

Bio::SeqFeature::Generic
  It will accept a score of 0; modified
  Bio::Graphics::Glyph::graded_segment so that it draws a fg box
  around each segment by default (can restore default behavior with
  -vary_fg=>1)

Bio::SeqFeature::Tools::Unflattener
  - reuses exons (eg containment graph not a tree)
  - improved algorithm for matching mRNAs with CDSs

Bio::SeqIO
  alternate ABI extension for newer versions of software (requested
  by Jan Aerts)

Bio::SeqIO::swiss
Bio::SeqIO::genbank
Bio::SeqIO::embl
  resolving bugzilla #1519
  1. fixed sprintf bug sometimes leading to extra space after ID
     tag
  2.
     OS line output for viruses now contains all the information
     after the species name. The complex strain/abbreviation/common
     name list is stored in sub_species(), which was previously not
     in use for viruses. This is a hack but the (first) OS line now
     makes a perfect round trip.

Bio::SeqUtils
  translate_6frames() failed on sequences where bioperl would guess
  that the sequence string is protein. Streamlined coding of the
  method to avoid guessing.

Bio::SimpleAlign
  - offset location of new seq with features by location of
    original seq requested to build from.
  - added rudimentary key/value parsing for maf 'a' lines
  - run clean with -w on
  - cleaned up unit test spurious warnings.
  - bugfix in maf parser for detecting last record in file.
  - added functionality to trim gaps from a MSA for a given
    sequence to SimpleAlign. trimming allowed implementation of
    exporting Seq and SeqFeatures from SimpleAlign. the api here is
    still rough, comments appreciated.
  - added a method SimpleAlign::splice_by_seq_pos to allow splicing
    of all sequences based on the gap locations of one sequence
    within the alignment. this could in principle be called
    repeatedly to remove all gaps from the MSA.

Bio::Species
  commented out internal calls to methods not doing anything

Bio::Taxonomy
  clean up the rank sets

Bio::Tools::BPlite::Iteration
  have to be set to '' instead of undef - perhaps this is not
  entirely the best thing - are we screwing up in the parsing
  instead? use Bio::SearchIO instead I guess

Bio::Tools::BPlite
  bug #1542 - improper detection of end of Query regexp

Bio::Tools::CodonTable
  if you know what you are doing you can add custom codon table

Bio::Tools::GFF
  - needed to move header parsing outside of next_feature, as it
    may be useful to handle sequences before sequence features
    (think database inserts).
  - adding support for parsing GFF ##sequence-region header lines.
    these are transformed into featureless Bio::LocatableSeq
    objects, available via the next_segment method.

Bio::Tools::Phylo::PAML
  silenced a warning reported in bugzilla #1560

Bio::Tools::Run::StandAloneBlast
  Allow SearchIO to be used for all output format types now with
  _READMETHOD set

Bio::Tools::SeqWords
  new method: count_overlap_words(), feature enhancement from
  bugzilla #1554

Bio::Tools::Signalp
  add the SignalP-HMM result.

    # get_tag_values() returns a list, so take its first element
    $feat->score;                              # Signal peptide probability
    ($feat->get_tag_values('peptideProb'))[0]; # signalp peptide probability
    ($feat->get_tag_values('anchorProb'))[0];  # signalp anchor probability

/examples/biblio
  more biblio examples

INSTALL.WIN
  Bug 1451, PPM3 documentation wrong

scripts/Bio-DB-GFF/bp_genbank2gff.PLS
  changes making genbank2gff.pl use SOFA terms for type names in
  generated GFF3

scripts/Bio-DB-GFF/bulk_load_gff.PLS fast_load_gff.PLS pg_bulk_load_gff.PLS
  fixed a minor gff3 bug

scripts/Bio-DB-GFF/bulk_load_gff.PLS
  added support for dsn strings in the form of
  "dbi:mysql:database=xxx;host=xxx"

scripts/Bio-DB-GFF/bulk_load_gff.PLS
  added support for bulk loading from a local gff source to a
  remote db server

scripts/Bio-DB-GFF/fast_load_gff.PLS
  added an option for setting MAX_BIN

scripts/Bio-DB-GFF/bulk_load_gff.PLS pg_bulk_load_gff.PLS
  added option to set MAX_BIN, and updated the postgres loader to
  deal with gff3 (note that the gff3 stuff is completely untested
  though)

scripts/graphics/frend.PLS
  Bio::Graphics::FeatureFile: remove uninit variable warning when
  calling features() without arguments; fixed frend web-based
  feature renderer to accommodate recent changes in FeatureFile API

scripts/popgen/composite_LD.PLS
  - print with new API
  - fix to deal with newer API

scripts/utilities/search2gff.PLS
  output 'match' and 'component' lines for GFF dumping

From steve_chervitz at affymetrix.com  Mon Nov 24 15:01:22 2003
From: steve_chervitz at affymetrix.com (Steve Chervitz)
Date: Mon Nov 24 19:43:42 2003
Subject: [Bioperl-announce-l] Bioperl Developer snapshot 1.3.03
In-Reply-To: <200311211620.25306.heikki@ebi.ac.uk>
References:
 <200311211620.25306.heikki@ebi.ac.uk>
Message-ID:

On Nov 21, 2003, at 8:20 AM, Heikki Lehvaslaiho wrote:

> Bioperl developer snap shot 1.3.03
> ---------------------------------
>
>
> This is the third developer snap shot from the BioPerl CVS head
> that will eventually lead to release 1.4.
>
> http://bioperl.org/DIST/current_core_unstable.tar.gz
> http://bioperl.org/DIST/bioperl-1.3.03.tar.gz

Correction on the second URL:

http://bioperl.org/DIST/bioperl-devel-1.3.03.tar.gz

Nice work Heikki.

Steve