[BioRuby] spidey parser

Wed Nov 8 15:04:00 UTC 2006

Hi,

To get Bio::Spidey::Report::SegmentPair objects, you can use
Bio::Spidey::Report::Hit#each (iterates over each exon segment pair),
Bio::Spidey::Report::Hit#hsps (gets an array of exon segment pairs),
Bio::Spidey::Report::Hit#exons (same as hsps), or
Bio::Spidey::Report::Hit#segmentpairs (gets an array of all segment
pairs including introns) methods.

Small sample code:
--------------------------------------------------
  require 'bio'

  Bio::FlatFile.open('file.spidey') do |ff|
    ff.each do |entry|
      entry.each do |hit|
        p hit.query_def    # query=mRNA definition
        p hit.target_def   # target=genomic sequence definition
        hit.each do |hsp|
          p hsp.qseq       # query=mRNA sequence (with gaps)
          p hsp.midline    # middle line
          p hsp.hseq       # hit=genomic sequence (with gaps)
          p hsp.aaseqline  # amino acid sequence line
        end
      end
    end
  end
--------------------------------------------------
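The goal in the quoted message is to find mismatches, such as the C/T one. Once you have hsp.qseq and hsp.hseq, one way to do this is to walk both strings column by column and report the columns where the letters differ. The following is a minimal sketch of that idea, not a BioRuby feature. It is plain Ruby so it runs without BioRuby. The helper names find_mismatches and Mismatch are hypothetical. It assumes the two lines are aligned to the same width, with '-' for gaps and ' ' for padding. The sample strings are the mRNA and genomic lines of the second block in the quoted snippet.

```ruby
# Sketch: list alignment columns where the mRNA (query) line and the
# genomic (hit) line carry different nucleotides.
# Assumptions: both strings come from the same aligned block, '-' marks a
# gap and ' ' is padding; such columns are skipped, not reported.
# find_mismatches and Mismatch are hypothetical names, not part of BioRuby.

Mismatch = Struct.new(:column, :mrna_base, :genomic_base)

def find_mismatches(qseq, hseq)
  mismatches = []
  width = [qseq.length, hseq.length].min
  width.times do |i|
    q = qseq[i]
    h = hseq[i]
    # Compare only columns where both lines hold a letter (no gap/padding).
    next unless q =~ /\A[A-Za-z]\z/ && h =~ /\A[A-Za-z]\z/
    # Case-insensitive, so 'a' and 'A' count as the same base.
    next if q.casecmp(h).zero?
    mismatches << Mismatch.new(i, q, h)
  end
  mismatches
end

# mRNA and genomic lines of the second block in the quoted snippet.
mrna    = 'ACAGGCCAGTGGATCAGTATAGTAACCAGAACAACTTTGTGCATGACTGTGTCAATATCA'
genomic = 'ACAGGCCAGTGGATCAGTATAGTAACCAGAACAACTTTGTGCATGACTGTGTCAACATCA'

find_mismatches(mrna, genomic).each do |m|
  puts "column #{m.column}: mRNA #{m.mrna_base} / genomic #{m.genomic_base}"
end
# prints "column 55: mRNA T / genomic C"

# With BioRuby, the same helper could be called inside the
# `hit.each do |hsp|` loop of the sample above (not run here):
#   find_mismatches(hsp.qseq, hsp.hseq).each { |m| p m }
```

The reported column is a 0-based index into the alignment, not a genomic or mRNA coordinate. Converting it would also require the segment's start positions and a count of the non-gap characters before that column. This sketch leaves that step out.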

Thanks,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
Department of Genome Informatics, Genome Information Research Center,
Research Institute for Microbial Diseases, Osaka University, Japan

On Wed, 8 Nov 2006 14:20:08 -0000
"jan aerts (RI)" <jan.aerts at bbsrc.ac.uk> wrote:

> All,
> 
> I'm trying to use the spidey parser (Bio::Spidey) to find mismatches
> between the mRNA and genomic sequences, for example the C/T mismatch
> at the end of the last line in the snippet below.
> 
> <SNIP>
> Exon 3: 46694102-46690011 (gen)  152-4244 (mRNA)
> 
> TATTTTGCAGATAAGTCATCATGGTGAAAAGCCACATAGGCAGTTGGATCCTGGTTCTCT
>           ||||||||||||||||||||||||||||||||||||||||||||||||||
>           ATAAGTCATCATGGTGAAAAGCCACATAGGCAGTTGGATCCTGGTTCTCT
>                V  I  M  V  K  S  H  I  G  S  W  I  L  V  L
> 
> 
> ACAGGCCAGTGGATCAGTATAGTAACCAGAACAACTTTGTGCATGACTGTGTCAACATCA
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||
> ACAGGCCAGTGGATCAGTATAGTAACCAGAACAACTTTGTGCATGACTGTGTCAATATCA
> Y  R  P  V  D  Q  Y  S  N  Q  N  N  F  V  H  D  C  V  N  I
> </SNIP>
> 
> I couldn't find out exactly how to walk through all segments of the
> alignment and get to the midline. The
> Bio::Spidey::Report::SegmentPair#initialize method in the
> bio/appl/spidey/report.rb file mentions that it "is designed to be
> called from Bio::Spidey::Report::* classes." and that "users shall not
> call it directly."
> 
> How can I walk over all segment pairs and access the mRNA and genomic
> sequences as well as the midline?
> 
> Many thanks,
> 
> Dr Jan Aerts
> Bioinformatics Group
> Roslin Institute
> Roslin, Scotland, UK
> +44 131 527 4200