[BioRuby] [Wg-phyloinformatics] GSOC: phyloXML for BioRuby: Mapping sequence
Christian M Zmasek
czmasek at burnham.org
Tue Jun 9 19:18:20 UTC 2009
Hi:
Thank you for the detailed comments.
I think this is a crucial point, since sequence and taxonomy are
the two most important elements.
At this point, I would recommend creating a dedicated class for the
phyloXML sequence and adding methods/constructors to it that make
converting to and from Bio::Sequence easy.
But I can definitely see the advantages of directly using Bio::Sequence,
too.
Also, please don't forget that, should a consensus or strong opinion
emerge, we could also add features to the phyloXML sequence definition
so that it matches the BioRuby and BioPython sequence classes more closely.
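
To make the dedicated-class option concrete, here is a minimal sketch of how it might look. It is only an illustration under stated assumptions: PhyloXMLSequence, from_biosequence, to_biosequence and SeqStandIn are hypothetical names, not existing BioRuby API. SeqStandIn merely mimics the Bio::Sequence attributes discussed in this thread (seq, molecule_type, id_namespace, primary_accession) so that the example runs without the bio gem. The sequence string and symbol are made-up placeholders.

```ruby
# Hypothetical sketch: none of these names are part of BioRuby.
# SeqStandIn stands in for Bio::Sequence and carries only the attributes
# mentioned in this thread.
SeqStandIn = Struct.new(:seq, :molecule_type, :id_namespace,
                        :primary_accession, keyword_init: true)

class PhyloXMLSequence
  # A subset of the phyloXML sequence fields listed in the thread.
  attr_accessor :type, :symbol, :name, :location, :mol_seq,
                :accession_source, :accession_id

  def initialize(**attrs)
    attrs.each { |key, value| public_send("#{key}=", value) }
  end

  # Build a phyloXML-style sequence from a Bio::Sequence-like object.
  # The molecule type is passed through unchanged for simplicity.
  def self.from_biosequence(bioseq)
    new(type: bioseq.molecule_type,
        mol_seq: bioseq.seq,
        accession_source: bioseq.id_namespace,
        accession_id: bioseq.primary_accession)
  end

  # Copy only the fields with an obvious counterpart. symbol, name and
  # location are dropped because no mapping has been agreed on for them.
  def to_biosequence
    SeqStandIn.new(seq: mol_seq,
                   molecule_type: type,
                   id_namespace: accession_source,
                   primary_accession: accession_id)
  end
end

px = PhyloXMLSequence.new(type: "aa", symbol: "placeholder_symbol",
                          mol_seq: "MKTAYIAK", # placeholder residues
                          accession_source: "UniProtKB",
                          accession_id: "P17304")
bio  = px.to_biosequence
back = PhyloXMLSequence.from_biosequence(bio)
puts bio.primary_accession
puts back.symbol.inspect
```

The round trip loses the symbol, which shows the trade-off: the phyloXML-specific fields survive only while the data stays in the dedicated class.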
Christian
Naohisa GOTO wrote:
> Hi,
>
> Sorry for the delay.
>
> On Sat, 30 May 2009 17:27:52 -0400
> Diana Jaunzeikare <rozziite at gmail.com> wrote:
>
>
>> Hi all,
>>
>> So I looked more carefully at the sequence element of phyloXML, and it
>> contains information which cannot be mapped to a Bio::Sequence object. I
>> suggest having a sequence class which closely resembles the phyloXML
>> structure, plus a method that extracts the relevant elements and returns
>> a Bio::Sequence object. What do you think?
>>
>
> In this case, a method to convert from Bio::Sequence to the
> phyloXML sequence class is also needed.
>
> If some of the attributes are really essential and not specific
> to phyloXML, but will also be needed by other data types, it is
> also possible to add new attributes to Bio::Sequence.
>
>
>> Here on the left I listed the phyloXML sequence tag elements and, after
>> the arrow ->, the possible corresponding attribute of Bio::Sequence:
>> * type
>> ** rna, dna -> Bio::Sequence::NA -> molecule type
>> ** aa -> Bio::Sequence::AA
>> * id_source (string ?) -> id_namespace
>> * id_ref (string ) -> entry_id
>>
id_source and id_ref are actually used to describe relations between
sequences, for example to describe orthology-relationships.
>> * symbol (string ?)
>> * accession
>> ** source (example: "UniProtKB") ->
>> ** id (example: "P17304") -> primary_accession
>>
** source -> id_namespace
** id -> primary_accession (or entry_id)
>> * name (string )
>> * location (string ? )
>> * mol_seq (string) -> seq / Bio::Sequence::NA/AA
>> * uri
>> ** desc (string)
>> ** type (string )
>> ** uri
>>
>> * annotation []
>> ** ref
>> ** source
>> ** evidence
>> ** type
>> ** desc
>> ** confidence
>> ** property []
>> ** uri
>>
>> * domain_architecture
>> ** length
>> ** domain []
>> *** from
>> *** to
>> *** confidence
>> *** id
>>
>
> The annotations and domain architecture could be mapped to the features
> in Bio::Sequence. But in some cases the mapping is difficult,
> depending on the vocabulary used in the annotations/domain_architecture.
>
>
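As a rough illustration of mapping domain_architecture to features, the sketch below turns each domain into a feature-like record. Everything here is an assumption for illustration: FeatureStandIn only loosely mimics the feature/position/qualifiers fields of Bio::Feature (where qualifiers are objects, not a plain hash), the "domain" feature key and the qualifier names are invented, and the domain ids and confidence values are placeholders.

```ruby
# Hypothetical sketch: FeatureStandIn loosely mimics Bio::Feature so the
# example runs without the bio gem. A plain hash is used for qualifiers
# here for brevity.
FeatureStandIn = Struct.new(:feature, :position, :qualifiers)

# Convert phyloXML-style domain hashes (from, to, confidence, id) into
# feature-like records. The "domain" key and qualifier names are assumed.
def domains_to_features(domains)
  domains.map do |d|
    qualifiers = { "id" => d[:id], "confidence" => d[:confidence] }
    FeatureStandIn.new("domain", "#{d[:from]}..#{d[:to]}", qualifiers)
  end
end

# Placeholder data, not taken from a real phyloXML file.
domains = [
  { from: 5,   to: 120, confidence: 1.0e-20, id: "example_domain_a" },
  { from: 150, to: 300, confidence: 2.0e-5,  id: "example_domain_b" }
]

features = domains_to_features(domains)
features.each do |f|
  puts "#{f.feature} #{f.position} #{f.qualifiers['id']}"
end
```

The from/to pair fits a location string naturally, but the vocabulary problem remains: the choice of feature key and qualifier names is arbitrary unless a convention is agreed on.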