[Bioperl-l] Gaps in sequences

Holland, Richard Richard.Holland at agresearch.co.nz
Mon Jan 5 17:03:59 EST 2004


I read the bug report and had a thought - the problem seems to be that
there is no way of representing a gap as a reproducible feature (i.e. you
could either not bother annotating it because it is empty space, or you
are stuck because you can only annotate actual alignments in contigs
with actual sequences).

My (somewhat off-the-wall) idea is that you could introduce the concept
of the Empty Sequence to BioPerl, as a global static sequence of
infinite length and no content, which is used whenever gaps appear in
alignments. The parser would translate gap(400) to be EMPTY:1..400 or
something like that before processing it, and could translate
EMPTY:1..400 back to gap(400) on output. In BioSQL the Empty Sequence
would be represented as a simple bioentry being the sole member of a
special Empty biodatabase with a corresponding biosequence with null SEQ
and MAX_INT length.
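
To make the round trip concrete, here is a rough sketch of the two
translations. It is written in Python rather than against the BioPerl
parser, and it is only an illustration of the idea: the EMPTY identifier
and both function names are made up for this example, it uses the
two-dot range syntax seen in the failing location line, and it handles
only the plain numeric gap(N) form.

```python
import re

# Hypothetical identifier for the proposed Empty Sequence.
# This is an assumption for illustration, not an existing BioPerl or GenBank name.
EMPTY_ID = "EMPTY"

# Matches the gap(N) operator with a plain integer length.
_GAP = re.compile(r"gap\((\d+)\)")

# Matches a span starting at position 1 on the Empty Sequence, e.g. EMPTY:1..400.
_EMPTY_SPAN = re.compile(r"\b" + re.escape(EMPTY_ID) + r":1\.\.(\d+)")


def gaps_to_empty(location: str) -> str:
    """Rewrite each gap(N) as a span 1..N on the Empty Sequence (input side)."""
    return _GAP.sub(lambda m: f"{EMPTY_ID}:1..{m.group(1)}", location)


def empty_to_gaps(location: str) -> str:
    """Rewrite each EMPTY:1..N span back to gap(N) (output side)."""
    return _EMPTY_SPAN.sub(lambda m: f"gap({m.group(1)})", location)


if __name__ == "__main__":
    loc = "join(AC145223.1:1..25742,gap(646),AC145223.1:26389..31181)"
    rewritten = gaps_to_empty(loc)
    print(rewritten)
    print(empty_to_gaps(rewritten))
```

With this pre-processing step the location parser would only ever see
ordinary remote locations, so the "gap" operator itself would not need
to be understood by it.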

Do you think that might work? I haven't thought it through in any detail
or looked at the code to see if it is even possible, but I thought I'd
mention it anyway.

cheers,
Richard

---
Richard Holland
Bioinformatics Database Developer
ITS, Agresearch Invermay x3279



-----Original Message-----
From: Jason Stajich [mailto:jason at cgt.duhs.duke.edu] 
Sent: Tuesday, 6 January 2004 10:19 a.m.
To: Holland, Richard
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Gaps in sequences


http://bugzilla.open-bio.org/show_bug.cgi?id=1319

No one has fixed this yet AFAIK.

-jason

On Tue, 6 Jan 2004, Holland, Richard wrote:

> Hi,
>
> Some RefSeq sequences I downloaded in GenBank format from the NCBI
> have interesting location descriptors which cause BioPerl to fail with
> the following error:
>
> -------------------- WARNING ---------------------
> MSG: exception while parsing location line 
> [join(AC145223.1:1..25742,gap(646),AC145223.1:26389..31181)] in 
> reading EMBL/GenBank/SwissProt, ignoring feature CONTIG 
> (seqid=NT_079570):
> ------------- EXCEPTION  -------------
> MSG: operator "gap" unrecognized by parser
> STACK Bio::Factory::FTLocationFactory::from_string
> /usr/lib/perl-5.8.0/lib/site_perl/5.8.0/Bio/Factory/FTLocationFactory.pm:160
> STACK Bio::Factory::FTLocationFactory::from_string
> /usr/lib/perl-5.8.0/lib/site_perl/5.8.0/Bio/Factory/FTLocationFactory.pm:157
> STACK (eval)
> /usr/lib/perl-5.8.0/lib/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:124
> STACK Bio::SeqIO::FTHelper::_generic_seqfeature
> /usr/lib/perl-5.8.0/lib/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:123
> STACK Bio::SeqIO::genbank::next_seq
> /usr/lib/perl-5.8.0/lib/site_perl/5.8.0/Bio/SeqIO/genbank.pm:421
> STACK toplevel
> /usr/users/oracle/bioperl-db/scripts/biosql/load_seqdatabase.pl:457
>
> --------------------------------------
>
> ---------------------------------------------------
>
> Any ideas anyone? Is this a broken location descriptor, or a broken 
> parser? (So far the only ones I have found all occur in
> vertebrate_mammalian3.genomic.gbff)
>
> cheers,
> Richard
>
> ---
> Richard Holland
> Bioinformatics Database Developer
> ITS, Agresearch Invermay x3279
>
>
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
