From walsh at cenix-bioscience.com  Mon Aug  1 09:03:09 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Mon Aug  1 08:53:23 2005
Subject: [Bioperl-l] Patching lucy
In-Reply-To: <42EA40F5.3090707@purdue.edu>
References: <42EA40F5.3090707@purdue.edu>
Message-ID: <42EE1D8D.2070708@cenix-bioscience.com>

Hi Phillip,

The patch pasted at the bottom of this e-mail should do the trick.  When 
you say that lucy seg faults, I assume you mean that you get the 
segfault when running lucy on its own.  The module itself does not call 
lucy.  It is only parsing the output from the files that lucy creates. 
lucy itself should be taking phred files as its input.  The patch is 
required if you want to use the stderr output from lucy to get more 
information from the module about the sequences.  If you apply this 
patch, you can try running the test that comes with the lucy tarball 
(see the README.FIRST file in the distribution).  It works for me (Suse 
9.0 on a Pentium 3 box).  Let me know if there are any problems.  I will 
update the Appendix for Bio::Tools::Lucy in CVS.
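
As a rough illustration of what "using the stderr from lucy" can look like, here is a minimal sketch in Python (not the Perl used by Bio::Tools::Lucy, and not that module's actual code). The message formats are copied from the fprintf calls in the patch at the bottom of this e-mail; the function name, the category labels, and the assumption that sequence names contain no whitespace are all made up for the example.

```python
import re
from collections import defaultdict

# Message formats copied from the fprintf calls in the patch; the category
# labels on the right are illustrative names chosen for this sketch.
PATTERNS = [
    (re.compile(r"^Short/ no insert: (?P<name>\S+)$"), "short_or_no_insert"),
    (re.compile(r"^Empty: (?P<name>\S+)$"), "empty"),
    (re.compile(r"^(?P<name>\S+) has PolyA \((?P<side>left|right)\)\.$"), "polya"),
    (re.compile(r"^Dropped PolyA: (?P<name>\S+)$"), "dropped_polya"),
    (re.compile(r"^Vector: (?P<name>\S+)$"), "vector"),
]


def classify_stderr(lines):
    """Group sequence names by the patched-in stderr message they triggered.

    Assumes sequence names contain no whitespace; lines that match none of
    the patterns are ignored.
    """
    found = defaultdict(list)
    for line in lines:
        line = line.strip()
        for pattern, label in PATTERNS:
            match = pattern.match(line)
            if match:
                key = label
                if label == "polya":
                    # Keep left and right PolyA trimming apart.
                    key = "polya_" + match.group("side")
                found[key].append(match.group("name"))
                break
    return dict(found)


if __name__ == "__main__":
    sample = [
        "Empty: seq001",
        "seq002 has PolyA (left).",
        "Vector: seq003",
        "some unrelated lucy message",
    ]
    print(classify_stderr(sample))
```

In practice the lines would come from a file holding lucy's redirected stderr rather than from a hard-coded list.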

Cheers,

Andrew


277a278,279
>       /* AGW added next line */
>       fprintf(stderr, "Short/ no insert: %s\n", seqs[i].name);
588c590,592
<     if ((seqs[i].len=bases)<=0)
---
>     if ((seqs[i].len=bases)<=0) {
>       /* AGW added next line */
>       fprintf(stderr, "Empty: %s\n", seqs[i].name);
589a594
>     }
893c898,902
<       if (left) seqs[i].left+=left;
---
>       if (left) {
>         seqs[i].left+=left;
>         /*  AGW added next line */
>         fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
>       }
896c905,909
<       if (right) seqs[i].right-=right;
---
>       if (right) {
>         seqs[i].right-=right;
>         /*  AGW added next line */
>         fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
>         }
898a912,913
>         /* AGW added next line */
>         fprintf(stderr, "Dropped PolyA: %s\n", seqs[i].name);
949a965,966
>         /* AGW added next line */
>           fprintf(stderr, "Vector: %s\n", seqs[i].name);


Phillip SanMiguel wrote:
> The patch to lucy source code from (the appendix):
> 
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Lucy.html
> 
> seems not to work for lucy-1.19p or lucy-1.19s. Actually patch runs 
> fine, but the resulting executable (after make) seg faults when run on 
> the lucy test data.
> 
> Any advice?
> 
> I've sent email directly to the module creator, Andrew G. Walsh, as 
> requested in the module. But I'm not sure if the module creator 
> regularly monitors the hotmail account listed therein. So I thought I'd 
> post here, in case someone had a patch that would work on lucy-1.19.
> 


-- 
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel. +49-351-4173 137
Fax  +49-351-4173 109

public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From n.haigh at sheffield.ac.uk  Mon Aug  1 10:05:14 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Mon Aug  1 09:55:24 2005
Subject: [Bioperl-l] retrieving medline citations
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAJTMeRosh9kW559hIHdXw6gEAAAAA@sheffield.ac.uk>

I know I can get Medline citations using Bio::Biblio->get_by_id();

But can I convert the returned XML into the standard plain-text format that
is used for importing into citation managers such as EndNote?

 
Cheers

Nathan

Nathan Haigh
Bioinformatics PostDoctoral Research Associate

Room B2 211
Department of Animal and Plant Sciences
University of Sheffield
Western Bank
Sheffield
S10 2TN

Tel: +44 (0)114 22 20112
Mob: +44 (0)7742 533 569
Fax: +44 (0)114 22 20002

 
From cain at cshl.edu  Mon Aug  1 14:13:12 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug  1 14:05:19 2005
Subject: [Bioperl-l] Newbie gbrowse help - script to make gff from fasta
In-Reply-To: <1AC69124-28AD-48B2-B910-7C5D8057908E@gmail.com>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
	<42E96847.1060900@ebi.ac.uk>
	<1AC69124-28AD-48B2-B910-7C5D8057908E@gmail.com>
Message-ID: <1122919992.3857.22.camel@localhost.localdomain>

On Sat, 2005-07-30 at 15:05 -0500, Jim Hu wrote:
> 1) Is there an existing script to convert a refseq fasta into a gff  
> flatfile compatible with gbrowse 1.62?
> 
>         bp_genbank2gff.pl --accession NC_001416  --stdout > lambda.gff
> 
> requires some additional tweaking/parsing as far as I can tell.  I  
> know that I'll probably eventually load these into mySQL (but for  
> phage genomes, is it worth it?), but I wanted to learn via the  
> flatfiles first.

I assume you mean genbank files, as there wouldn't be much to convert
from a fasta file.  Anyway, you should also try bp_genbank2gff3.pl.  Be
warned however, that converting genbank files to anything more stringent
like GFF3 is fiendishly difficult, and depending on the genbank file,
you may need to massage the output.
> 
> 2) Is there a repository of standard track stanzas and aggregators  
> that match the feature types generated by such scripts?

In the distribution are several example configuration files in
contrib/conf_files.
> 
> 3) Is there a FAQ I missed that I should have consulted first?

No, but there is a tutorial that comes with GBrowse that covers lots of
useful material.  Once GBrowse is installed on your own machine, you can
find it at http://localhost/gbrowse/tutorial/tutorial.html

> 
> 4) Is this even the right listserv for these questions?

Yes, and welcome!
> 
> Didn't want to reinvent any wheels if possible.  Sorry if this is off  
> topic.  Thanks!
> 
> Jim Hu
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hu2307 at uidaho.edu  Mon Aug  1 15:00:23 2005
From: hu2307 at uidaho.edu (Xiaojun Hu)
Date: Mon Aug  1 14:50:37 2005
Subject: [Bioperl-l] ABI average singal intensity
Message-ID: <5f2c7e5efe93.5efe935f2c7e@uidaho.edu>

Hi,

Does anyone know how to get the average signal intensity for each base
(A, T, C, G) from an ABI file?

Thank you very much!

Xiaojun Hu

From cain at cshl.edu  Mon Aug  1 15:07:36 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug  1 14:57:48 2005
Subject: [Bioperl-l] Newbie gbrowse help - script to make gff from fasta
In-Reply-To: <1122919992.3857.22.camel@localhost.localdomain>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
	<42E96847.1060900@ebi.ac.uk>
	<1AC69124-28AD-48B2-B910-7C5D8057908E@gmail.com>
	<1122919992.3857.22.camel@localhost.localdomain>
Message-ID: <1122923256.3857.27.camel@localhost.localdomain>

On Mon, 2005-08-01 at 14:13 -0400, Scott Cain wrote:
> 
> > 
> > 4) Is this even the right listserv for these questions?
> 
> Yes, and welcome!
> > 
Whoops!  I guess I should have looked at the list that you emailed your
questions to before I answered this one.  For some reason, I just
assumed that this was on the gbrowse mailing list, which is
<gmod-gbrowse@lists.sourceforge.net>.

Scott
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From anunberg at oriongenomics.com  Mon Aug  1 15:29:50 2005
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Mon Aug  1 15:21:46 2005
Subject: [Bioperl-l] Connecting to Bio::DB::GFF db
Message-ID: <BF13E25E.6E4F%anunberg@oriongenomics.com>

When connecting to a Bio::DB::GFF db, how do I specify the host?  I would
like to connect to a db on another machine.

Thanks


-- 
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com


From cain at cshl.edu  Mon Aug  1 15:41:10 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug  1 15:35:27 2005
Subject: [Bioperl-l] Connecting to Bio::DB::GFF db
In-Reply-To: <BF13E25E.6E4F%anunberg@oriongenomics.com>
References: <BF13E25E.6E4F%anunberg@oriongenomics.com>
Message-ID: <1122925270.3857.35.camel@localhost.localdomain>

You have to use a dsn that is appropriate for your database server--that
is, the mysql one will look a little different from a postgres one, but
generally, it will look like this:

  -dsn  dbi:mysql:elegans;host=hostname;port=port_number

You can leave off port if the database server is using a standard port.
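
To make the pattern above concrete, here is a minimal sketch (in Python, purely to show how the string is put together) of assembling a MySQL-style DSN with an optional host and port. The helper name build_mysql_dsn and the host db.example.org are invented for the example; only the dbi:mysql:elegans;host=...;port=... layout comes from the line above, and a Postgres DSN would be laid out differently.

```python
def build_mysql_dsn(database, host=None, port=None):
    """Assemble a DSN of the form dbi:mysql:DATABASE;host=HOST;port=PORT.

    build_mysql_dsn is a hypothetical helper for illustration only; it is
    not part of BioPerl or DBI.
    """
    dsn = "dbi:mysql:" + database
    if host is not None:
        dsn += ";host=" + host
    if port is not None:
        # The port can be omitted when the server uses its standard port.
        dsn += ";port=" + str(port)
    return dsn


if __name__ == "__main__":
    print(build_mysql_dsn("elegans", host="db.example.org"))
    print(build_mysql_dsn("elegans", host="db.example.org", port=3306))
```

The resulting string is what would be supplied as the -dsn value.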


On Mon, 2005-08-01 at 14:29 -0500, Andrew Nunberg wrote:
> When connecting to a Bio::DB::GFF db, how do I specify the host?  I would
> like to connect to a db on another machine.
> 
> Thanks
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From pmiguel at purdue.edu  Mon Aug  1 15:39:07 2005
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Mon Aug  1 15:35:31 2005
Subject: [Bioperl-l] Patching lucy
In-Reply-To: <42EE1D8D.2070708@cenix-bioscience.com>
References: <42EA40F5.3090707@purdue.edu>
	<42EE1D8D.2070708@cenix-bioscience.com>
Message-ID: <42EE7A5B.6050701@purdue.edu>

Hi Andrew,

Thanks for the effort you went to here. It still looks like there is a
(more minor) problem, though.
patch gives a few errors (see below) using your new diff. It looks like 2
of the 7 hunks failed to apply to lucy.c from lucy-1.19p.

The resulting source code does compile and run on the lucy test data,
but the PolyA changes did not get inserted.

Do you know if all 7 of your patches were installed into the lucy.c file 
from lucy-1.19p?

(By the way, I think we are on the same page.  I do understand that your 
perl code parses lucy output. I've tried it on lucy 1.19p output and it 
succeeds--although it, of course, lacks some of the functionality that 
would be available from the patched version of lucy).

Phillip

Here is the output when I run patch:

(lucy)% cd lucy-1.19p
(lucy-1.19p)% patch -b -i AndrewsNewPatch.diff lucy.c
  Looks like a normal diff.
Hunk #4 failed at line 893.
Hunk #5 failed at line 896.
2 out of 7 hunks failed: saving rejects to lucy.c.rej
  I can't seem to find a patch in there anywhere.

Here is the lucy.c.rej file contents:

***************
*** 893,893 ****
!       if (left) seqs[i].left+=left;
--- 898,902 ----
!       if (left) {
!         seqs[i].left+=left;
!         /*  AGW added next line */
!         fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
!       }
***************
*** 896,896 ****
!       if (right) seqs[i].right-=right;
--- 905,909 ----
!       if (right) {
!         seqs[i].right-=right;
!         /*  AGW added next line */
!         fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
!         }
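
As a rough cross-check of the hunk numbering that patch reports above, the command lines of a normal diff (such as 277a278,279 or 893c898,902) can be listed with a few lines of Python. This is only a sketch: the names HUNK_RE and list_hunks are made up for the example, and the code reads the hunk headers only, so it cannot tell why a particular hunk was rejected.

```python
import re

# A normal-diff command line: old line range, one of a/c/d, new line range,
# e.g. "277a278,279" or "893c898,902".
HUNK_RE = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def list_hunks(diff_text):
    """Return one (old_start, old_end, command, new_start, new_end) per hunk."""
    hunks = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line.strip())
        if m:
            old_start = int(m.group(1))
            old_end = int(m.group(2) or old_start)
            new_start = int(m.group(4))
            new_end = int(m.group(5) or new_start)
            hunks.append((old_start, old_end, m.group(3), new_start, new_end))
    return hunks


if __name__ == "__main__":
    # Hunk headers as they appear in the diff discussed in this thread.
    headers = "\n".join([
        "277a278,279", "588c590,592", "589a594", "893c898,902",
        "896c905,909", "898a912,913", "949a965,966",
    ])
    for number, hunk in enumerate(list_hunks(headers), start=1):
        print(number, hunk)
```

Run on the headers of the diff in this thread, the fourth and fifth entries are the ones starting at lines 893 and 896, which matches the two hunks that patch rejected.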


Andrew Walsh wrote:

> Hi Phillip,
>
> The patch pasted at the bottom of this e-mail should do the trick.  
> When you say that lucy seg faults, I assume you mean that you get the 
> segfault when running lucy on its own.  The module itself does not 
> call lucy.  It is only parsing the output from the files that lucy 
> creates. lucy itself should be taking phred files as its input.  The 
> patch is required if you want to use the stderr output from lucy to get 
> more information from the module about the sequences.  If you apply 
> this patch, you can try running the test that comes with the lucy 
> tarball (see the README.FIRST file in the distribution).  It works for 
> me (Suse 9.0 on a Pentium 3 box).  Let me know if there are any 
> problems.  I will update the Appendix for Bio::Tools::Lucy in CVS.
>
> Cheers,
>
> Andrew
>
>
> 277a278,279
> >       /* AGW added next line */
> >       fprintf(stderr, "Short/ no insert: %s\n", seqs[i].name);
> 588c590,592
> <     if ((seqs[i].len=bases)<=0)
> ---
> >     if ((seqs[i].len=bases)<=0) {
> >       /* AGW added next line */
> >       fprintf(stderr, "Empty: %s\n", seqs[i].name);
> 589a594
> >     }
> 893c898,902
> <       if (left) seqs[i].left+=left;
> ---
> >       if (left) {
> >         seqs[i].left+=left;
> >         /*  AGW added next line */
> >         fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
> >       }
> 896c905,909
> <       if (right) seqs[i].right-=right;
> ---
> >       if (right) {
> >         seqs[i].right-=right;
> >         /*  AGW added next line */
> >         fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
> >         }
> 898a912,913
> >         /* AGW added next line */
> >         fprintf(stderr, "Dropped PolyA: %s\n", seqs[i].name);
> 949a965,966
> >         /* AGW added next line */
> >           fprintf(stderr, "Vector: %s\n", seqs[i].name);
>
>
>
>
> Phillip SanMiguel wrote:
>
>> The patch to lucy source code from (the appendix):
>>
>> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Lucy.html
>>
>> seems not to work for lucy-1.19p or lucy-1.19s. Actually patch runs 
>> fine, but the resulting executable (after make) seg faults when run 
>> on the lucy test data.
>>
>> Any advice?
>>
>> I've sent email directly to the module creator, Andrew G. Walsh, as 
>> requested in the module. But I'm not sure if the module creator 
>> regularly monitors the hotmail account listed therein. So I thought 
>> I'd post here, in case someone had a patch that would work on lucy-1.19.
>>
>
>

From cain at cshl.edu  Mon Aug  1 15:45:00 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug  1 15:35:53 2005
Subject: [Bioperl-l] Re: Fixing bioperl [was Re: Analysis features]
In-Reply-To: <51a02b5bd508f35301ee3c847b104895@gnf.org>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
	<1122650232.10455.31.camel@localhost.localdomain>
	<51a02b5bd508f35301ee3c847b104895@gnf.org>
Message-ID: <1122925500.3857.40.camel@localhost.localdomain>

On Fri, 2005-07-29 at 17:20 -0700, Hilmar Lapp wrote:
> On Jul 29, 2005, at 8:17 AM, Scott Cain wrote:
> 
> >
> > The main section of affected code in gmod is the GFF bulk loader, but
> > after we make the changes to the bioperl API, it shouldn't be too hard
> > to fix the loader.  In fact, some of those changes may have already
> > started.  I remember a few weeks before I released the gmod/chado
> > package, Hilmar sent out an announcement that he made some changes.
> 
> You mean around the time of ISMB? I fixed the ontology modules ... they  
> should actually work better now not worse unless you assumed the  
> presence of some bugs ;)

I guess I must have been assuming bugs :-)  I didn't look at the diffs, or
look in much detail at what the exact problem was.  Since this is the last
release that will be using Bio::Ontology, and it is an alpha release, I
was not too concerned.
> 
> > While I should have paid attention then, I was busy getting my release
> > together, and everything seemed to work, so I ignored it.
> > Unfortunately, the reason things continued to work was that I forgot to
> > update my bioperl-live, and as a result, the gmod release doesn't work
> > with bioperl-live.
> 
> Scott, what would really help sometimes is if in such a situation you  
> run the bioperl test suite and report the result if there are any  
> failures, especially those that appear potentially connected to your  
> problem. Last time the gmod ontology loader ceased to work the problem  
> would have been readily exposed by the ontology tests in bioperl. It  
> just helps in zooming in on the problem.

I run make test frequently; what I do less often is pay close attention
to the result.  When working with bioperl-live, one gets a little numb
to test failures :-/
> 
> I'd be eager to help make bioperl work with gmod and vice versa and I'm  
> sure many others are too, but it'll be difficult if we don't work  
> towards this collaboratively. For this I really liked the spirit of  
> Chris' proposal - that's the way to make this work.
> 
> > [...]
> > The other section of code that could have been affected but won't be is
> > the ontology loader.  The current ontology loader depends on
> > Bio::Ontology, but I was already planning on migrating to go-perl for
> > loading ontologies anyway, so that won't be a problem.
> 
> I'm closing in on the last bugs in the go-perl integration. It remains  
> to be seen how fast the result is as Chris made me aware in Detroit,  
> but if it works this will give you both worlds at your choosing.
> 
> 	-hilmar
> 
> >
> > So, who wants to take the lead on this?
> >
> > Thanks,
> > Scott
> >
> >
> > On Thu, 2005-07-28 at 12:42 -0700, Chris Mungall wrote:
> >> I think the answer may be even more complicated than this.
> >>
> >> Lurkers and contributors to the bioperl mailing list may have noticed
> >> that there have been some major obstacles to progress lately,
> >> particularly in getting a stable release of the code out. bp1.4 is
> >> fairly old, 1.5 is a developers release, though this is the one
> >> required by GMOD.
> >>
> >> My understanding is that this bottleneck can be traced back to
> >> changes in the SeqFeature and Annotation model. These changes appear
> >> to be required by Bio::SeqFeature::Annotated which is produced by
> >> Bio::FeatureIO::gff (which in turn is used by the GMOD bulk loader,
> >> which is the main reason GMOD requires 1.5, I believe?).
> >> Unfortunately, these changes also break existing code and have a
> >> severe negative impact on memory usage.
> >>
> >> Before advising Cyril and others to switch to BFIO::gff I think it's
> >> important to make sure there is a clear path forward with bioperl. My
> >> impression is that there is something of a stalemate here. The bioperl
> >> developers would like to retract the aforementioned changes, but they
> >> believe they cannot do this without breaking GMOD code.  They are also
> >> extremely uncomfortable about leaving these changes in. Everyone
> >> gives up and starts coding around bioperl.
> >>
> >> Here is why the changes were introduced:
> >>
> >> BioPerl has a 'scruffy' typing model, whereby feature types
> >> (primary_tag in bioperl) and featureprop types (tags in bioperl) are
> >> labels or strings. In contrast, Chado forces all types to be some
> >> class or relation in an ontology.
> >>
> >> Now obviously I'm rather partial to the Chado model, but that doesn't
> >> mean I think it should be forced upon bioperl. I often use bioperl in
> >> scruffy mode (on scruffy data); or in some combination whereby I map
> >> the scruffy types to ontologies in some non-bioperl code. When using
> >> bioperl as a middleware component over a nicely organised database,
> >> ontology-typed mode is definitely best. However, the majority of
> >> bioperl users (including myself) spend a large proportion of their
> >> time working with scruffy data, in which case lightweight scruffy
> >> types are more appropriate.
> >>
> >> It seems that there is a perfectly simple way of reconciling both
> >> approaches. We revert bioperl back to the simpler scruffy model. The
> >> majority of users and developers breathe a sigh of relief. We then
> >> extend SeqFeatureI with something like SeqFeatureAnnotatedI. This
> >> forces types to be stored as OntologyTerms (and I haven't even touched
> >> on some of the problems here, but at least we are insulating the
> >> standard bioperl layer that 99% of users use from these issues). All
> >> classes implementing SFAI will necessarily implement SFI, and the
> >> primary_tag and tag_values methods will be supported (not deprecated)
> >> as simple delegations to the OntologyTerm objects.
> >>
> >> We can then modify BFIO::gff (which is an incredibly useful piece of
> >> code) and get rid of all the dependencies on SO and Bio::Ontology* and
> >> instead allow the user of this module to plug in their own
> >> resolver/validator - so they can choose whether they just want fast
> >> scruffy lightweight SFI features, or whether they want ontology-typed
> >> SFAI features. If the latter, then they can choose their own resolver
> >> strategy - by a user supplied hash, by a copy of SO auto-downloaded
> >> from sourceforge, by a local chado db, by the genbank->SO mapping
> >> table, during parsing vs post-parsing, whatever. In fact there is
> >> already Bio::SeqFeature::Tools::TypeMapper, but currently this is
> >> mostly concerned with helping Bio::SeqFeature::Tools::Unflattener
> >> convert scruffy genbank to something sensible.
> >>
> >> GMOD (and perhaps biosql) would use SFAI, everyone else would use the
> >> simpler SFI. Someone can even get a stable 1.6 release out before all
> >> the SFAI details such as how the resolver would work are finalised.
> >> I'd really like to see 1.6 include a simpler BFIO::gff that can
> >> optionally produce features that aren't SeqFeature::Annotateds, but
> >> that's negotiable.
> >>
> >> There are vast swathes of both GMOD and BioPerl code I'm not familiar
> >> with, so it's possible my analysis above is flawed in some way. If it
> >> is, then it's up to someone from either camp to speak up! If not, then
> >> there's no excuse for the relevant people not to start sorting out
> >> this mess by commencing with the solution outlined above.
> >>
> >> Cheers
> >> Chris
> >>
> >>>
> >>> Scott
> >>>
> >>>
> >>> On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote:
> >>>> Hello,
> >>>> We are going to store analysis results in chado, and we are of
> >>>> course very interested in these future evolutions of GFF3/chado.
> >>>> So we would like to make sure that the parsers and conversion
> >>>> programs we are writing now will be compatible with the future GFF3.
> >>>>
> >>>> We are using Bio::SeqFeature::Generic objects that we write with
> >>>> Bio::Tools::GFF.
> >>>>
> >>>> Do you think that Bio::Tools::GFF will be able to handle the new
> >>>> 'type' column or is it better to switch to Bio::FeatureIO::gff ?
> >>>>
> >>>> Thanks in advance for any advice.
> >>>>
> >>>> Cyril
> >>>>
> >>>> Don Gilbert wrote:
> >>>>
> >>>>>
> >>>>> Scott,
> >>>>>
> >>>>> Your notes in gmod_bulk_load_gff3.pl suggest it is headed in
> >>>>> same direction I suggest below. More about these todo points
> >>>>>
> >>>>>> - address flybase's use of analysisfeature combined with
> >>>>>> feature to give source-type information (in GFF terms). This will
> >>>>>> need to be addressed in the GBrowse adaptor.
> >>>>>> - modify the bulk loader to allow "mixed" GFF3 files (that is,
> >>>>>> containing both analysis results and annotations). See perldoc
> >>>>>> gmod_bulk_load_gff3.pl for more info
> >>>>>
> >>>>>
> >>>>> Use of chado's analysisfeature table is something others who know
> >>>>> it better can comment on. But after working with it for a while
> >>>>> it makes sense to me to use in this way:
> >>>>>
> >>>>> For a future GFF -> Chado loader, treat analysis features such as
> >>>>> gene finding results, BLAST, sim4 as 'analysisfeature type' rather
> >>>>> than feature CV term type (the ones that now end up with a generic
> >>>>> 'match' cvterm). In these cases the Analysis table is populated with
> >>>>>     program:database_sourcename
> >>>>> as the basis of this 'analysisfeature type', such as
> >>>>>     match:blastx:na_pe.dros
> >>>>>     match:sim4:DGC
> >>>>>     match:genie:dummy (or maybe exon:genie)
> >>>>>
> >>>>> The program:database fits neatly in the GFF source field, as
> >>>>>     #ref source type start stop ...
> >>>>>     chr1 blastx:na_pe.dros match 1 100 ...
> >>>>>     chr1 sim4:DGC match 1 100 ...
> >>>>>
> >>>>> These can be treated in database adaptor analogously to the CVterm
> >>>>> table feature types. See at end a list of current GFF feature
> >>>>> type:source from worm, rice, yeast, fly MODs. Fly and rice use a
> >>>>> syntax like above and worm gff uses BLAT_EMBL_BEST, instead of
> >>>>> BLAT:EMBL_BEST.
> >>>>>
> >>>>> From POD of your bulk_load_gff3.pl
> >>>>>> Analysis
> >>>>>> If you are loading analysis results (ie, BLAT results, gene
> >>>>>> predictions), you should specify the -a flag. If no arguments are
> >>>>>> supplied with the -a, then the loader will assume that the results
> >>>>>> belong to an analysis set with a name that is the concatenation of
> >>>>>> the source (column 2) and the method (column 3) with an underscore
> >>>>>> in between.
> >>>>>
> >>>>> "... then the loader will assume that the results belong to an
> >>>>> analysis table row with a program name and database source name
> >>>>> taken from Source (column 2, colon separated program:sourcename),
> >>>>> with a SOFA feature type taken from Method (column 3). If
> >>>>> sourcename doesn't apply, e.g. genefinder, either omit it or use
> >>>>> 'dummy'. Use the generic 'match' SOFA type if others don't apply."
> >>>>> [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS]
> >>>>>
> >>>>> Note that sourcename of database is a common attribute (all those
> >>>>> blasts, blats, sim4, ... are run on several different databases).
> >>>>>
> >>>>> For that underscore between method and source, where does that go
> >>>>> into the database? Underscores are already used within program and
> >>>>> database sourcename names, so it may be problematic to add one if
> >>>>> not needed.
> >>>>>
> >>>>> Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name'
> >>>>> entry for the analysis table. This probably is less useful than
> >>>>> using the Program and Sourcename fields as flybase does, which comes
> >>>>> from the common usage where people run various programs, with
> >>>>> various database sources, and want to plop the results into a
> >>>>> database easily. These go into those two fields directly, no need to
> >>>>> create or parse a Name entry (which can be and is null in flybase
> >>>>> data).
> >>>>>
> >>>>>> my $search_analysis
> >>>>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?");
> >>>>>
> >>>>> I think it would be better as
> >>>>> my $search_analysis = $db->prepare(
> >>>>>     "SELECT analysis_id FROM analysis WHERE program=? AND sourcename=?");
> >>>>>
> >>>>>> Otherwise, the argument provided with -a will be taken
> >>>>>> as the name of the analysis set. Either way, the analysis set must
> >>>>>> already be in the analysis table. The easiest way to do this is to
> >>>>>> insert it directly in the psql shell:
> >>>>>>
> >>>>>> INSERT INTO analysis (name, program, programversion)
> >>>>>> VALUES ('genscan 2005-2-28','genscan','5.4');
> >>>>>
> >>>>> My choice would be to populate the analysis table from GFF data,
> >>>>> rather than expect preparation by the user (or as another option).
> >>>>>
> >>>>> INSERT INTO analysis (program, sourcename)
> >>>>> VALUES ('tblastx','na_baylorf1_scfchunk.dpse');
> >>>>> INSERT INTO analysis (program, sourcename)
> >>>>> VALUES ('sim4','na_gb.dmel');
> >>>>> INSERT INTO analysis (program, sourcename, programversion)
> >>>>> VALUES ('genie_masked','dummy', '1.0');
> >>>>>
> >>>>>> There are other columns in the analysis table that are optional;
> >>>>>> see the schema documentation and '\d analysis' in psql for more
> >>>>>> information.
> >>>>>>
> >>>>> ....
> >>>>>> A planned addition to the functionality of handling analysis
> >>>>>> results is to allow "mixed" GFF files, where some lines are
> >>>>>> analysis results and some are not.
> >>>>>
> >>>>> This is the case for drosophila GFF now (see others also below). If
> >>>>> you make the default assumption that a line is an analysis result
> >>>>> when ($method =~ /.*match/) and ($source =~ m/([^:]+):(.+)/), you
> >>>>> should get all/most of the analysisfeature types, and probably not
> >>>>> anything else.
> >>>>>
> >>>>>> Additionally, one will be able to supply lists of
> >>>>>> types (optionally with sources) and their associated entry in the
> >>>>>> analysis table. The format will probably be tag value pairs:
> >>>>>>
> >>>>>> --analysis match:Rice_est=rice_est_blast, \
> >>>>>> match:Maize_cDNA=maize_cdna_blast, \
> >>>>>> mRNA=genscan_prediction,exon=genscan_prediction
> >>>>>
> >>>>> My suggestion for this (as per GFF source,type columns) would be
> >>>>> --analysis match:program:sourcename ...
> >>>>> --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\
> >>>>> mRNA:genscan:dummy, exon:genscan:dummy
> >>>>>
> >>>>> I guess the 'dummy' data sourcename need not be added; flybase
> >>>>> uses it to keep that field not-null, but it isn't required by the
> >>>>> schema.
> >>>>>
> >>>>> Here are some snippets from the ChadoFC adaptor I modified
> >>>>> from yours (will get into cvs.sf.net 'real soon'), showing that
> >>>>> it isn't much work to add this as an analog to how cvterm types
> >>>>> are used.
> >>>>>
> >>>>> -- Don
> >>>>>
> >>>>> ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types
> >>>>> ## treat similar to CV table types
> >>>>>
> >>>>> sub getAnalysisFeatureHash
> >>>>> {
> >>>>>   my $self= shift;
> >>>>>
> >>>>>   my $dbh= $self->dbh();
> >>>>>   my $sth = $dbh->prepare(
> >>>>>       "select analysis_id,program,sourcename from analysis")
> >>>>>     or warn "unable to prepare select cvterms";
> >>>>>   $sth->execute or $self->throw("unable to select cvterms");
> >>>>>
> >>>>>   # declare two empty hashes (assigning ({},{}) would store refs)
> >>>>>   my (%term2name, %name2term);
> >>>>>
> >>>>>   while (my $hashref = $sth->fetchrow_hashref) {
> >>>>>
> >>>>>     ## this is dgg syntax of analysis feature names for GFF
> >>>>>     ## all have generic 'match' method and program:source as 'source'
> >>>>>     ## a problem, want other main types: EST_match:xxx, mRNA:genie ..
> >>>>>     ## etc.
> >>>>>     my $anfeat=
> >>>>>       "match:".$hashref->{program}.":".$hashref->{sourcename};
> >>>>>
> >>>>>     $term2name{ $hashref->{analysis_id} } = $anfeat;
> >>>>>     $name2term{ $anfeat } = $hashref->{analysis_id};
> >>>>>   }
> >>>>>   $self->an_term2name(\%term2name);
> >>>>>   $self->an_name2term(\%name2term);
> >>>>> }
> >>>>>
> >>>>> ## Das::ChadoFC::Segment snippets
> >>>>> sub features {
> >>>>>   $self->{has_anatype}=0;
> >>>>>   my $sql_range = '';
> >>>>>   my ($interbase_start,$rend,$srcfeature_id,$sql_types);
> >>>>>   unless ($feature_id) {
> >>>>>     $sql_range = $self->sql_range($rangetype);
> >>>>>
> >>>>>     $sql_types = $self->sql_types($types, -1); # dgg
> >>>>>
> >>>>>     $srcfeature_id = $self->{srcfeature_id};
> >>>>>   }
> >>>>>   ...
> >>>>>   elsif($self->{has_anatype}) {
> >>>>>     $from_part .= "left join analysisfeature af using (feature_id) ";
> >>>>>   }
> >>>>>
> >>>>>
> >>>>> sub sql_types
> >>>>>   ..
> >>>>>   $valid_type = $factory->name2term($temp_type);
> >>>>>   $is_anatype= 0;
> >>>>>   unless ($valid_type) {
> >>>>>     $valid_type = $factory->an_name2term($temp_type);
> >>>>>     $self->{has_anatype}= $is_anatype= 1 if ($valid_type);
> >>>>>   }
> >>>>>   ..
> >>>>>   ## leave out extra invalid types
> >>>>>   if (!$valid_type) {
> >>>>>     ### skip
> >>>>>   } elsif ($temp_dbxref) {
> >>>>>     $sql_types .= $orsql
> >>>>>       ."(f.type_id = $valid_type and fd.dbxref_id = $temp_dbxref)";
> >>>>>   } elsif($is_anatype) {
> >>>>>     $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<<
> >>>>>   } else {
> >>>>>     $sql_types .= $orsql."(f.type_id = $valid_type)";
> >>>>>   }
> >>>>>
> >>>>>
> >>>>> Lists of GFF feature type:source from some current MOD data
> >>>>> where * are probably analysisfeature types (program:database)
> >>>>>
> >>>>> rice gff type:source
> >>>>> ftp://ftp.gramene.org/pub/gramene/release17/data/ 
> >>>>> sequence_annotation/
> >>>>> gff3/
> >>>>> --------------------
> >>>>> CDS:known
> >>>>> CDS:tigr
> >>>>> EST:cmap
> >>>>> EST_match:Barley (? might be EST_match:someprogram:Barley)
> >>>>> EST_match:Maize
> >>>>> EST_match:Millet
> >>>>> EST_match:Rice
> >>>>> EST_match:Sorghum
> >>>>> EST_match:Wheat
> >>>>> cDNA_match:Rice
> >>>>> cross_genome_match:Maize
> >>>>> cross_genome_match:Rice
> >>>>> cross_genome_match:Sorghum
> >>>>> * exon:FgenesH:Monocot
> >>>>> exon:known
> >>>>> exon:tigr
> >>>>> five_prime_UTR:tigr
> >>>>> gene:known
> >>>>> gene:tigr
> >>>>> * mRNA:FgenesH:Monocot
> >>>>> mRNA:known
> >>>>> mRNA:tigr
> >>>>> microsatellite:cmap
> >>>>> three_prime_UTR:known
> >>>>> three_prime_UTR:tigr
> >>>>> transposable_element_insertion_site:cmap
> >>>>>
> >>>>> worm gff type:source
> >>>>> ftp://ftp.wormbase.org/pub/wormbase/species/elegans/
> >>>>> genome_feature_tables/GFF3/
> >>>>> ----------------------
> >>>>> CDS:Coding_transcript
> >>>>> * CDS:Genefinder
> >>>>> CDS:Transposon_CDS
> >>>>> CDS:history
> >>>>> * CDS:twinscan
> >>>>> * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST)
> >>>>> * EST_match:BLAT_EST_OTHER
> >>>>> PCR_product:GenePair_STS
> >>>>> PCR_product:Orfeome
> >>>>> RNAi_reagent:RNAi_primary
> >>>>> RNAi_reagent:RNAi_secondary
> >>>>> SNP:Allele
> >>>>> binding_site:binding_site
> >>>>> * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST )
> >>>>> * cDNA_match:BLAT_mRNA_OTHER
> >>>>> clone_end:.
> >>>>> clone_start:.
> >>>>> complex_substitution :Allele
> >>>>> deletion:Allele
> >>>>> exon:Coding_transcript
> >>>>> * exon:Genefinder
> >>>>> exon:Non_coding_transcript
> >>>>> exon:Pseudogene
> >>>>> exon:Transposon_CDS
> >>>>> exon:history
> >>>>> exon:miRNA
> >>>>> exon:rRNA
> >>>>> exon:scRNA
> >>>>> exon:snRNA
> >>>>> exon:snoRNA
> >>>>> exon:tRNA
> >>>>> * exon:tRNAscan-SE-1.23
> >>>>> * exon:twinscan
> >>>>> experimental_result_region:Expr_profile
> >>>>> experimental_result_region:cDNA_for_RNAi
> >>>>> * expressed_sequence_match:BLAT_OST_BEST (~
> >>>>> expressed_sequence_match:BLAT:OST_BEST )
> >>>>> * expressed_sequence_match:BLAT_OST_OTHER
> >>>>> five_prime_UTR:Coding_transcript
> >>>>> gene:Coding_transcript
> >>>>> gene:gene
> >>>>> gene:history
> >>>>> gene:landmark
> >>>>> insertion:Allele
> >>>>> inverted_repeat:inverted
> >>>>> mRNA:Coding_transcript
> >>>>> * mRNA:Genefinder
> >>>>> mRNA:Transposon_CDS
> >>>>> mRNA:history
> >>>>> * mRNA:twinscan
> >>>>> miRNA:miRNA
> >>>>> nc_primary_transcript:Non_coding_transcript
> >>>>> * nucleotide_match:BLAT_EMBL_BEST (~  
> >>>>> nucleotide_match:BLAT:EMBL_BEST )
> >>>>> * nucleotide_match:BLAT_EMBL_OTHER
> >>>>> * nucleotide_match:BLAT_TC1_BEST
> >>>>> * nucleotide_match:BLAT_TC1_OTHER
> >>>>> * nucleotide_match:BLAT_ncRNA_BEST
> >>>>> * nucleotide_match:BLAT_ncRNA_OTHER
> >>>>> * nucleotide_match:TEC_RED
> >>>>> * nucleotide_match:waba_coding
> >>>>> * nucleotide_match:waba_strong
> >>>>> * nucleotide_match:waba_weak
> >>>>> oligo:.
> >>>>> operon:operon
> >>>>> polyA_signal_sequence:polyA_signal_sequence
> >>>>> polyA_site:polyA_site
> >>>>> processed_transcript:gene
> >>>>> protein_coding_primary_transcript:Coding_transcript
> >>>>> * protein_match:wublastx
> >>>>> pseudogene:Pseudogene
> >>>>> pseudogene:history
> >>>>> rRNA:rRNA
> >>>>> reagent:Oligo_set
> >>>>> region:.
> >>>>> region:Genbank
> >>>>> region:Genomic_canonical
> >>>>> region:Link
> >>>>> * repeat_region:RepeatMasker
> >>>>> scRNA:scRNA
> >>>>> sequence_variant:.
> >>>>> sequence_variant:Allele
> >>>>> snRNA:snRNA
> >>>>> snoRNA:snoRNA
> >>>>> substitution:Allele
> >>>>> tRNA:tRNA
> >>>>> * tRNA:tRNAscan-SE-1.23
> >>>>> tandem_repeat:tandem
> >>>>> three_prime_UTR:Coding_transcript
> >>>>> trans_splice_acceptor_site:SL1
> >>>>> trans_splice_acceptor_site:SL2
> >>>>> transcript:SAGE_transcript
> >>>>> * translated_nucleotide_match:BLAT_NEMATODE (~
> >>>>> translated_nucleotide_match:BLAT:NEMATODE )
> >>>>> transposable_element:Transposon
> >>>>> transposable_element:Transposon_CDS
> >>>>> transposable_element_insertion_site:Allele
> >>>>> transposable_element_insertion_site:Mos_insertion_allele
> >>>>>
> >>>>>
> >>>>> fly gff type:source
> >>>>> ftp://ftp.flybase.net/genomes/dmel/current/gff/
> >>>>> -----------------------
> >>>>> BAC:.
> >>>>> CDS:.
> >>>>> aberration_junction:.
> >>>>> chromosome:.
> >>>>> chromosome_arm:.
> >>>>> chromosome_band:.
> >>>>> enhancer:.
> >>>>> exon:.
> >>>>> five_prime_UTR:.
> >>>>> gene:.
> >>>>> insertion_site:.
> >>>>> intron:.
> >>>>> mRNA:.
> >>>>> * match:RNAiHDP
> >>>>> * match:assembly:path
> >>>>> * match:blastx:aa_SPTR.dmel
> >>>>> * match:blastx:aa_SPTR.insect
> >>>>> * match:blastx:aa_SPTR.othinv
> >>>>> * match:blastx:aa_SPTR.othvert
> >>>>> * match:blastx:aa_SPTR.plant
> >>>>> * match:blastx:aa_SPTR.primate
> >>>>> * match:blastx:aa_SPTR.rodent
> >>>>> * match:blastx:aa_SPTR.worm
> >>>>> * match:blastx:aa_SPTR.yeast
> >>>>> * match:genscan
> >>>>> * match:repeatmasker
> >>>>> * match:sim4:na_ARGs.dros
> >>>>> * match:sim4:na_ARGsCDS.dros
> >>>>> * match:sim4:na_DGC_dros
> >>>>> * match:sim4:na_dbEST.diff.dmel
> >>>>> * match:sim4:na_dbEST.same.dmel
> >>>>> * match:sim4:na_gadfly_dmel_r2
> >>>>> * match:sim4:na_gb.dmel
> >>>>> * match:sim4:na_gb.tpa.dmel
> >>>>> * match:sim4:na_smallRNA.dros
> >>>>> * match:sim4:na_transcript_dmel_r31
> >>>>> * match:sim4:na_transcript_dmel_r32
> >>>>> * match:tRNAscan-SE:.
> >>>>> * match:tblastx:na_agambiae
> >>>>> * match:tblastx:na_dbEST.insect
> >>>>> * match:tblastx:na_dpse
> >>>>> * match_part:RNAiHDP
> >>>>> * match_part:assembly:path
> >>>>> * match_part:blastx:aa_SPTR.dmel
> >>>>> * match_part:blastx:aa_SPTR.insect
> >>>>> * match_part:blastx:aa_SPTR.othinv
> >>>>> * match_part:blastx:aa_SPTR.othvert
> >>>>> * match_part:blastx:aa_SPTR.plant
> >>>>> * match_part:blastx:aa_SPTR.primate
> >>>>> * match_part:blastx:aa_SPTR.rodent
> >>>>> * match_part:blastx:aa_SPTR.worm
> >>>>> * match_part:blastx:aa_SPTR.yeast
> >>>>> * match_part:genscan
> >>>>> * match_part:repeatmasker
> >>>>> * match_part:sim4:na_ARGs.dros
> >>>>> * match_part:sim4:na_ARGsCDS.dros
> >>>>> * match_part:sim4:na_DGC_dros
> >>>>> * match_part:sim4:na_dbEST.diff.dmel
> >>>>> * match_part:sim4:na_dbEST.same.dmel
> >>>>> * match_part:sim4:na_gadfly_dmel_r2
> >>>>> * match_part:sim4:na_gb.dmel
> >>>>> * match_part:sim4:na_gb.tpa.dmel
> >>>>> * match_part:sim4:na_smallRNA.dros
> >>>>> * match_part:sim4:na_transcript_dmel_r31
> >>>>> * match_part:sim4:na_transcript_dmel_r32
> >>>>> * match_part:tRNAscan-SE:.
> >>>>> * match_part:tblastx:na_agambiae
> >>>>> * match_part:tblastx:na_dbEST.insect
> >>>>> * match_part:tblastx:na_dpse
> >>>>> mature_peptide:.
> >>>>> ncRNA:.
> >>>>> oligo:.
> >>>>> point_mutation:.
> >>>>> polyA_site:.
> >>>>> protein_binding_site:.
> >>>>> pseudogene:.
> >>>>> region:.
> >>>>> regulatory_region:.
> >>>>> rescue_fragment:.
> >>>>> scaffold:.
> >>>>> sequence_variant:.
> >>>>> snRNA:.
> >>>>> snoRNA:.
> >>>>> tRNA:.
> >>>>> three_prime_UTR:.
> >>>>> transcription_start_site:.
> >>>>> transposable_element:.
> >>>>> transposable_element_insertion_site:.
> >>>>>
> >>>>>
> >>>>> yeast gff type:source count
> >>>>> ftp://genome-ftp.stanford.edu/pub/yeast/data_download/
> >>>>> chromosomal_feature/saccharomyces_cerevisiae.gff
> >>>>> -------------------------
> >>>>> ARS:SGD
> >>>>> CDS:SGD
> >>>>> binding_site:SGD
> >>>>> centromere:SGD
> >>>>> chromosome:SGD
> >>>>> gene:SGD
> >>>>> insertion:SGD
> >>>>> intron:SGD
> >>>>> ncRNA:SGD
> >>>>> nc_primary_transcript:SGD
> >>>>> nucleotide_match:SGD
> >>>>> pseudogene:SGD
> >>>>> rRNA:SGD
> >>>>> region:SGD
> >>>>> region:landmark
> >>>>> repeat_family:SGD
> >>>>> repeat_region:SGD
> >>>>> snRNA:SGD
> >>>>> snoRNA:SGD
> >>>>> tRNA:SGD
> >>>>> telomere:SGD
> >>>>> transposable_element:SGD
> >>>>> transposable_element_gene:SGD
> >>>>>
> >>>>> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> >>>>> -- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Gmod-gbrowse mailing list
> >>>>> Gmod-gbrowse@lists.sourceforge.net
> >>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> >>>>>
> >>>>
> >>>>
> >>> --
> >>> ------------------------------------------------------------------------
> >>> Scott Cain, Ph. D.                                         cain@cshl.edu
> >>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> >>> Cold Spring Harbor Laboratory
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Gmod-devel mailing list
> >>> Gmod-devel@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/gmod-devel
> >>>
> >>
> >>
> >>
> >>
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                         cain@cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > Cold Spring Harbor Laboratory
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gnf.org  Mon Aug  1 15:53:05 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Aug  1 15:42:13 2005
Subject: [Bioperl-l] Re: Fixing bioperl [was Re: Analysis features]
In-Reply-To: <1122925500.3857.40.camel@localhost.localdomain>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
	<1122650232.10455.31.camel@localhost.localdomain>
	<51a02b5bd508f35301ee3c847b104895@gnf.org>
	<1122925500.3857.40.camel@localhost.localdomain>
Message-ID: <2aae0a4129cb2c7407df5834b94f41aa@gnf.org>


On Aug 1, 2005, at 12:45 PM, Scott Cain wrote:

> I run make test frequently; what I do less often is pay close attention
> to the result.  When working with bioperl-live, one gets a little numb
> to test failures :-/

I know, and it's not a good situation.

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From jason.stajich at duke.edu  Mon Aug  1 22:31:12 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Aug  1 22:22:04 2005
Subject: [Bioperl-l] all tests pass [was Re: Fixing bioperl] [was Re:
	Analysis features]
In-Reply-To: <2aae0a4129cb2c7407df5834b94f41aa@gnf.org>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
	<1122650232.10455.31.camel@localhost.localdomain>
	<51a02b5bd508f35301ee3c847b104895@gnf.org>
	<1122925500.3857.40.camel@localhost.localdomain>
	<2aae0a4129cb2c7407df5834b94f41aa@gnf.org>
Message-ID: <2bf4b9070ab5bb61b34e15d3ae611044@duke.edu>

All tests are passing for me on OSX and a few different linux machines 
with different complements of aux modules installed.  I fixed some 
minor things that were breaking.

We want to set up a nightly 'make test' cronjob on one of the obf 
servers; we just need someone with enough time to do it.  There are 
many combinations of aux module subsets, perl versions, and OSes to 
try, so we need to know what is breaking, if anything.

I was really hoping someone would step up to push out 1.5.1, which is 
just a release off the main trunk, and then think about a schedule for 
1.6.  Can anyone help outline what must get fixed for 1.6, so there is 
a checklist that people can help with (and so we know when we are 
ready to release)?  Ideally this would be done on a wiki, but the 
mailing list can suffice too.

-jason
On Aug 1, 2005, at 3:53 PM, Hilmar Lapp wrote:

>
> On Aug 1, 2005, at 12:45 PM, Scott Cain wrote:
>
>> I run make test frequently; what I do less often is pay close 
>> attention
>> to the result.  When working with bioperl-live, one gets a little numb
>> to test failures :-/
>
> I know, and it's not a good situation.
>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
--
Jason Stajich
http://www.duke.edu/~jes12
jason.stajich -at- duke.edu

From hase at umbc.edu  Mon Aug  1 23:27:08 2005
From: hase at umbc.edu (HASE)
Date: Mon Aug  1 23:17:30 2005
Subject: [Bioperl-l] Bioinformatics Software Development Survey
Message-ID: <2008.68.49.173.177.1122953228.squirrel@68.49.173.177>

Hello,

As part of our research at UMBC, we are studying the characteristics of
software development in the bioinformatics domain. We believe that this
study should be guided by the people who are actively involved in
bioinformatics.

This research is our first step towards enabling the production of high
quality bioinformatics software with less time and effort. Therefore, your
feedback is very important to us.

We seek your input in the form of a survey questionnaire that will take
around 15 minutes of your time. We solicit general demographic
information, information about the products that you have developed, your
work practices, and your software development process.

So, if you are a bioinformatics professional doing software
development or a software developer working in the bioinformatics domain,
please provide us with your valuable input. We assure you that this
information will be used only for academic purposes and will be completely
confidential.

Please follow the link below to start the survey:
http://www.is.umbc.edu/bio-survey/

We appreciate your participation in advance.

Regards,
HASE (Human Aspects of Software Engineering)
1000 Hilltop Circle
Department of Information Systems
University of Maryland Baltimore County
Baltimore, MD, 21250
hase@umbc.edu


From walsh at cenix-bioscience.com  Tue Aug  2 03:04:27 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Tue Aug  2 02:54:44 2005
Subject: [Bioperl-l] Patching lucy
In-Reply-To: <42EE7A5B.6050701@purdue.edu>
References: <42EA40F5.3090707@purdue.edu>
	<42EE1D8D.2070708@cenix-bioscience.com>
	<42EE7A5B.6050701@purdue.edu>
Message-ID: <42EF1AFB.5010001@cenix-bioscience.com>

Hi Phillip,

I ran the patch on version 1.19p (which I downloaded from the TIGR ftp 
site yesterday).  It seemed to work for me (all 7 patches worked).

 > patch -b -i lucy.patch lucy.c
patching file lucy.c

Here are the contents of the patch file.  Perhaps my mail client did 
something funny in formatting this.  I'll send you a separate file as an 
attachment as well.

 > cat lucy.patch
277a278,279
 >       /* AGW added next line */
 >       fprintf(stderr, "Short/ no insert: %s\n", seqs[i].name);
588c590,592
<     if ((seqs[i].len=bases)<=0)
---
 >     if ((seqs[i].len=bases)<=0) {
 >       /* AGW added next line */
 >       fprintf(stderr, "Empty: %s\n", seqs[i].name);
589a594
 >     }
893c898,902
<       if (left) seqs[i].left+=left;
---
 >       if (left) {
 >         seqs[i].left+=left;
 >         /*  AGW added next line */
 >         fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
 >       }
896c905,909
<       if (right) seqs[i].right-=right;
---
 >       if (right) {
 >         seqs[i].right-=right;
 >         /*  AGW added next line */
 >         fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
 >         }
898a912,913
 >         /* AGW added next line */
 >         fprintf(stderr, "Dropped PolyA: %s\n", seqs[i].name);
949a965,966
 >         /* AGW added next line */
 >           fprintf(stderr, "Vector: %s\n", seqs[i].name);


Cheers,

Andrew


Phillip San Miguel wrote:
> Hi Andrew,
> 
> Thanks for the effort you went to here. Still looks there is a (more 
> minor) problem though.
> patch gives a few errors (see below) using your new diff. Looks like 2 
> of the 7 patches failed to patch lucy.c from lucy version lucy-1.19p.
> 
> But the resulting source code does compile and run on the lucy test 
> data. But the PolyA patches did not get inserted.
> 
> Do you know if all 7 of your patches were installed into the lucy.c file 
> from lucy-1.19p?
> 
> (By the way, I think we are on the same page.  I do understand that your 
> perl code parses lucy output. I've tried it on lucy 1.19p output and it 
> succeeds--although it, of course, lacks some of the functionality that 
> would be available from the patched version of lucy).
> 
> Phillip
> 
> Here is the output when I run patch:
> 
> (lucy)% cd lucy-1.19p
> (lucy-1.19p)% patch -b -i AndrewsNewPatch.diff lucy.c
>  Looks like a normal diff.
> Hunk #4 failed at line 893.
> Hunk #5 failed at line 896.
> 2 out of 7 hunks failed: saving rejects to lucy.c.rej
>  I can't seem to find a patch in there anywhere.
> 
> Here is the lucy.c.rej file contents:
> 
> ***************
> *** 893,893 ****
> !       if (left) seqs[i].left+=left;
> --- 898,902 ----
> !       if (left) {
> !         seqs[i].left+=left;
> !         /*  AGW added next line */
> !         fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
> !       }
> ***************
> *** 896,896 ****
> !       if (right) seqs[i].right-=right;
> --- 905,909 ----
> !       if (right) {
> !         seqs[i].right-=right;
> !         /*  AGW added next line */
> !         fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
> !         }
> 
> 
> Andrew Walsh wrote:
> 
>> Hi Phillip,
>>
>> The patch pasted at the bottom of this e-mail should do the trick.  
>> When you say that lucy seg faults, I assume you mean that you get the 
>> segfault when running lucy on its own.  The module itself does not 
>> call lucy.  It is only parsing the output from the files that lucy 
>> creates. lucy itself should be taking phred files as its input.  The 
>> patch is required if you want to use the stderr from the lucy to get 
>> more information from the module about the sequences.  If you apply 
>> this patch, you can try running the test that comes with the lucy 
>> tarball (see the README.FIRST file in the distribution).  It works for 
>> me (Suse 9.0 on a Pentium 3 box).  Let me know if there are any 
>> problems.  I will update the Appendix for Bio::Tools::Lucy in CVS.
>>
>> Cheers,
>>
>> Andrew
>>
>>
>> 277a278,279
>> >       /* AGW added next line */
>> >       fprintf(stderr, "Short/ no insert: %s\n", seqs[i].name);
>> 588c590,592
>> <     if ((seqs[i].len=bases)<=0)
>> ---
>> >     if ((seqs[i].len=bases)<=0) {
>> >       /* AGW added next line */
>> >       fprintf(stderr, "Empty: %s\n", seqs[i].name);
>> 589a594
>> >     }
>> 893c898,902
>> <       if (left) seqs[i].left+=left;
>> ---
>> >       if (left) {
>> >         seqs[i].left+=left;
>> >         /*  AGW added next line */
>> >         fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
>> >       }
>> 896c905,909
>> <       if (right) seqs[i].right-=right;
>> ---
>> >       if (right) {
>> >         seqs[i].right-=right;
>> >         /*  AGW added next line */
>> >         fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
>> >         }
>> 898a912,913
>> >         /* AGW added next line */
>> >         fprintf(stderr, "Dropped PolyA: %s\n", seqs[i].name);
>> 949a965,966
>> >         /* AGW added next line */
>> >           fprintf(stderr, "Vector: %s\n", seqs[i].name);
>>
>>
>>
>>
>> Phillip SanMiguel wrote:
>>
>>> The patch to lucy source code from (the appendix):
>>>
>>> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Lucy.html
>>>
>>> seems not to work for lucy-1.19p or lucy-1.19s. Actually patch runs 
>>> fine, but the resulting executable (after make) seg faults when run 
>>> on the lucy test data.
>>>
>>> Any advice?
>>>
>>> I've sent email directly to the module creator, Andrew G. Walsh, as 
>>> requested in the module. But I'm not sure if the module creator 
>>> regularly monitors the hotmail account listed therein. So I thought 
>>> I'd post here, in case someone had a patch that would work on lucy-1.19.
>>>
>>
>>
> 


-- 
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel. +49-351-4173 137
Fax  +49-351-4173 109

public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From pmiguel at purdue.edu  Tue Aug  2 10:48:48 2005
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Tue Aug  2 10:40:17 2005
Subject: [Bioperl-l] Patching lucy
In-Reply-To: <42EF1AFB.5010001@cenix-bioscience.com>
References: <42EA40F5.3090707@purdue.edu>	<42EE1D8D.2070708@cenix-bioscience.com>	<42EE7A5B.6050701@purdue.edu>
	<42EF1AFB.5010001@cenix-bioscience.com>
Message-ID: <42EF87D0.50104@purdue.edu>

Andrew,
Yes you are right. Everything looks good now.
A good test (of lucy) was to take the suggested lucy test from 
README.FIRST and add the "-c" parameter to it after patching the source 
and compiling. The test would be:

lucy -c -v PUC19 PUC19splice atie.seq atie.qul atie.2nd -debug lucy.info

The output to STDERR then shows all the extra information your patches 
have caused lucy to include.
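
For anyone who wants to do something with that extra STDERR output, 
here is a minimal sketch of sorting the new lines into categories. The 
message formats are copied from the fprintf() calls in the patch; the 
function name classify_lucy_stderr and the hash layout are made up for 
illustration and are not taken from Bio::Tools::Lucy, and the sequence 
names are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sort the extra STDERR lines written by the patched lucy into categories.
# Hypothetical helper: the name and return layout are assumptions of this
# sketch, not part of Bio::Tools::Lucy.
sub classify_lucy_stderr {
    my @lines = @_;
    my %seen;    # category => array ref of sequence names
    for my $line (@lines) {
        chomp $line;
        if ($line =~ m{^(Short/ no insert|Empty|Dropped PolyA|Vector): (\S+)}) {
            # "Category: name" style messages
            push @{ $seen{$1} }, $2;
        }
        elsif ($line =~ /^(\S+) has PolyA \((left|right)\)\./) {
            # "name has PolyA (left)." style messages
            push @{ $seen{"PolyA $2"} }, $1;
        }
        # anything else is ordinary lucy output and is ignored
    }
    return \%seen;
}

# Example with invented sequence names.
my $report = classify_lucy_stderr(
    "Empty: seqA\n",
    "seqB has PolyA (left).\n",
    "seqB has PolyA (right).\n",
    "Vector: seqC\n",
    "some other message\n",
);
for my $category (sort keys %$report) {
    print "$category: @{ $report->{$category} }\n";
}
```

With real data, the lines would come from wherever the STDERR of the 
lucy run was saved rather than from a hard-coded list.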
Thanks!
Phillip

Andrew Walsh wrote:

> Hi Phillip,
>
> I ran the patch on version 1.19p (which I downloaded from the TIGR ftp 
> site yesterday).  It seemed to work for me (all 7 patches worked).
>
> > patch -b -i lucy.patch lucy.c
> patching file lucy.c
>
> Here are the contents of the patch file.  Perhaps my mail client did 
> something funny in formatting this.  I'll send you a separate file as 
> an attachment as well.
>
> > cat lucy.patch
> 277a278,279
> >       /* AGW added next line */
> >       fprintf(stderr, "Short/ no insert: %s\n", seqs[i].name);
> 588c590,592
> <     if ((seqs[i].len=bases)<=0)
> ---
> >     if ((seqs[i].len=bases)<=0) {
> >       /* AGW added next line */
> >       fprintf(stderr, "Empty: %s\n", seqs[i].name);
> 589a594
> >     }
> 893c898,902
> <       if (left) seqs[i].left+=left;
> ---
> >       if (left) {
> >         seqs[i].left+=left;
> >         /*  AGW added next line */
> >         fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
> >       }
> 896c905,909
> <       if (right) seqs[i].right-=right;
> ---
> >       if (right) {
> >         seqs[i].right-=right;
> >         /*  AGW added next line */
> >         fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
> >         }
> 898a912,913
> >         /* AGW added next line */
> >         fprintf(stderr, "Dropped PolyA: %s\n", seqs[i].name);
> 949a965,966
> >         /* AGW added next line */
> >           fprintf(stderr, "Vector: %s\n", seqs[i].name);
>
>
> Cheers,
>
> Andrew
>
>
> Phillip San Miguel wrote:
>
>> Hi Andrew,
>>
>> Thanks for the effort you went to here. Still looks there is a (more 
>> minor) problem though.
>> patch gives a few errors (see below) using your new diff. Looks like 
>> 2 of the 7 patches failed to patch lucy.c from lucy version lucy-1.19p.
>>
>> But the resulting source code does compile and run on the lucy test 
>> data. But the PolyA patches did not get inserted.
>>
>> Do you know if all 7 of your patches were installed into the lucy.c 
>> file from lucy-1.19p?
>>
>> (By the way, I think we are on the same page.  I do understand that 
>> your perl code parses lucy output. I've tried it on lucy 1.19p output 
>> and it succeeds--although it, of course, lacks some of the 
>> functionality that would be available from the patched version of lucy).
>>
>> Phillip
>>
>> Here is the output when I run patch:
>>
>> (lucy)% cd lucy-1.19p
>> (lucy-1.19p)% patch -b -i AndrewsNewPatch.diff lucy.c
>>  Looks like a normal diff.
>> Hunk #4 failed at line 893.
>> Hunk #5 failed at line 896.
>> 2 out of 7 hunks failed: saving rejects to lucy.c.rej
>>  I can't seem to find a patch in there anywhere.
>>
>> Here is the lucy.c.rej file contents:
>>
>> ***************
>> *** 893,893 ****
>> !       if (left) seqs[i].left+=left;
>> --- 898,902 ----
>> !       if (left) {
>> !         seqs[i].left+=left;
>> !         /*  AGW added next line */
>> !         fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
>> !       }
>> ***************
>> *** 896,896 ****
>> !       if (right) seqs[i].right-=right;
>> --- 905,909 ----
>> !       if (right) {
>> !         seqs[i].right-=right;
>> !         /*  AGW added next line */
>> !         fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
>> !         }
>>
>>
>> Andrew Walsh wrote:
>>
>>> Hi Phillip,
>>>
>>> The patch pasted at the bottom of this e-mail should do the trick.  
>>> When you say that lucy seg faults, I assume you mean that you get 
>>> the segfault when running lucy on its own.  The module itself does 
>>> not call lucy.  It is only parsing the output from the files that 
>>> lucy creates. lucy itself should be taking phred files as its 
>>> input.  The patch is required if you want to use the stderr from the 
>>> lucy to get more information from the module about the sequences.  
>>> If you apply this patch, you can try running the test that comes 
>>> with the lucy tarball (see the README.FIRST file in the 
>>> distribution).  It works for me (Suse 9.0 on a Pentium 3 box).  Let 
>>> me know if there are any problems.  I will update the Appendix for 
>>> Bio::Tools::Lucy in CVS.
>>>
>>> Cheers,
>>>
>>> Andrew
>>>
>>>
>>> 277a278,279
>>> >       /* AGW added next line */
>>> >       fprintf(stderr, "Short/ no insert: %s\n", seqs[i].name);
>>> 588c590,592
>>> <     if ((seqs[i].len=bases)<=0)
>>> ---
>>> >     if ((seqs[i].len=bases)<=0) {
>>> >       /* AGW added next line */
>>> >       fprintf(stderr, "Empty: %s\n", seqs[i].name);
>>> 589a594
>>> >     }
>>> 893c898,902
>>> <       if (left) seqs[i].left+=left;
>>> ---
>>> >       if (left) {
>>> >         seqs[i].left+=left;
>>> >         /*  AGW added next line */
>>> >         fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
>>> >       }
>>> 896c905,909
>>> <       if (right) seqs[i].right-=right;
>>> ---
>>> >       if (right) {
>>> >         seqs[i].right-=right;
>>> >         /*  AGW added next line */
>>> >         fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
>>> >         }
>>> 898a912,913
>>> >         /* AGW added next line */
>>> >         fprintf(stderr, "Dropped PolyA: %s\n", seqs[i].name);
>>> 949a965,966
>>> >         /* AGW added next line */
>>> >           fprintf(stderr, "Vector: %s\n", seqs[i].name);
>>>
>>>
>>>
>>>
>>> Phillip SanMiguel wrote:
>>>
>>>> The patch to lucy source code from (the appendix):
>>>>
>>>> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Lucy.html
>>>>
>>>> seems not to work for lucy-1.19p or lucy-1.19s. Actually patch runs 
>>>> fine, but the resulting executable (after make) seg faults when run 
>>>> on the lucy test data.
>>>>
>>>> Any advice?
>>>>
>>>> I've sent email directly to the module creator, Andrew G. Walsh, as 
>>>> requested in the module. But I'm not sure if the module creator 
>>>> regularly monitors the hotmail account listed therein. So I thought 
>>>> I'd post here, in case someone had a patch that would work on 
>>>> lucy-1.19.
>>>>
>>>
>>>
>>
>
>

From pmiguel at purdue.edu  Tue Aug  2 16:34:14 2005
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Tue Aug  2 16:26:58 2005
Subject: [Bioperl-l] ABI average singal intensity
In-Reply-To: <5f2c7e5efe93.5efe935f2c7e@uidaho.edu>
References: <5f2c7e5efe93.5efe935f2c7e@uidaho.edu>
Message-ID: <42EFD8C6.7060609@purdue.edu>

Hi Xiaojun,
Here is a perl one-liner that will give you the mean signal strengths 
from a .ab1 (or, probably a .abi) file:

perl -e 'undef $/; $trace=<>; ($sigptr)=$trace =~ m{S/N%.{16}(.{4})}s;\
($fwo)=$trace =~ /FWO_.{16}(.{4})/s;\
print "The signal strengths for the bases: ",$fwo," are: ",\
join(" ",unpack("n*",substr($trace,unpack("N*",$sigptr),8))),"\n"' test.ab1

In the case of an ab1 file I have, I get the output:

The signal strengths for the bases: GATC are: 2710 4749 4034 3588

Copy and paste it to the command line of the machine where you have the 
trace file, replacing "test.ab1" with the actual name of your trace file 
of interest. The "n*" and "N*" templates unpack big-endian ("network" 
order) values, which is how the ABI file stores them, so they should 
work unchanged whatever the byte order of your machine; "v*" and "V*" 
would only be needed for little-endian data.

I wrote the one-liner based on Clark Tibbetts' paper about the ABI file format:

http://www.cs.cmu.edu/afs/cs/project/genome/WWW/Papers/clark.html
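
If a script is easier to handle than a one-liner (no shell quoting to 
worry about), the same lookup can be sketched as below. This is only a 
sketch under the same assumptions as the one-liner: it searches for the 
tag names instead of walking the file's directory, and the function 
name abi_signal_strengths is made up here. The comment on the layout of 
a directory entry reflects my understanding of the format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the base order (e.g. "GATC") followed by the four S/N% values,
# given the raw bytes of an ABI trace. Returns an empty list if either
# tag is missing. Hypothetical helper name, same logic as the one-liner.
sub abi_signal_strengths {
    my ($trace) = @_;
    # A directory entry is a 4-byte tag name, then 16 bytes of bookkeeping
    # (tag number, element type, element size, element count, data size),
    # then a 4-byte field holding either the data itself or its file offset.
    my ($sigptr) = $trace =~ m{S/N%.{16}(.{4})}s or return;
    my ($order)  = $trace =~ m{FWO_.{16}(.{4})}s or return;
    my $offset   = unpack 'N', $sigptr;    # big-endian 32-bit offset
    # four big-endian 16-bit values, one per base in $order
    my @signal   = unpack 'n4', substr($trace, $offset, 8);
    return ($order, @signal);
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";
    local $/;    # slurp the whole file
    my $trace = <$fh>;
    close $fh;
    my ($order, @signal) = abi_signal_strengths($trace)
        or die "No S/N% or FWO_ tag found in $file\n";
    print "The signal strengths for the bases: $order are: @signal\n";
}
```

Saved under some name of your choosing, it would be run with the trace 
file as its only argument.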

A few words of caution: while the S/N% tag looks like it should give a 
"signal"/"noise" % , I'm not sure that it does exactly that. 
Nevertheless this is usually what is meant when one asks for the average 
"signal strength" of a chromat.

In addition, this is the "S/N%" of *processed* chromatographic data. 
There is quite a bit of normalization and background correction that 
goes on to produce the processed data from the raw data.

Phillip SanMiguel
Purdue Genomics Core Facility

Xiaojun Hu wrote:

>Hi,
>
>Does anyone know how to get the (A T C G)average
>singal intensity from ABI file?
>
>Thank you very much!
>
>Xiaojun Hu
>  
>

From horkko at gmail.com  Tue Aug  2 10:11:08 2005
From: horkko at gmail.com (Emmanuel QUEVILLON)
Date: Tue Aug  2 16:43:29 2005
Subject: [Bioperl-l] Bug in Bio::SeqFeature::Annotated ?
Message-ID: <5e8d03d50508020711e543b8b@mail.gmail.com>

Dear all,

I tried to use BioPerl to produce GFF3 output files. It works fine when 
I use Bio::SeqFeature::Generic and Bio::Tools::GFF, but it was more 
complicated and took longer when I tried Bio::SeqFeature::Annotated and 
Bio::FeatureIO.
There are two problems with Bio::SeqFeature::Annotated, plus one suggestion:

1) A bug in the '_initialize' method:


sub _initialize {
  my ($self,@args) = @_;
  my (
      $start, $end, $strand, $frame, $phase, $score,
      $name, $id, $annot, $location,   # <=== $id shouldn't be here
      $display_name, #deprecate
      $seq_id, $type,$source
     ) =
    $self->_rearrange([qw(START
                          END
                          STRAND
                          FRAME
                          PHASE
                          SCORE
                          NAME
                          ANNOTATION
                          LOCATION
                          DISPLAY_NAME
                          SEQ_ID
                          TYPE
                          SOURCE
                         )], @args);

  defined $start && $self->start($start);
  defined $end && $self->end($end);
  defined $strand && $self->strand($strand);
  defined $frame && $self->frame($frame);
  defined $phase && $self->phase($phase);
  defined $score && $self->score($score);
  defined $source && $self->source($source);
  defined $type && $self->type($type);
  defined $location && $self->location($location);
  defined $annot && $self->annotation($annot);

Because $id has no matching key in the _rearrange list, every variable after 
$name receives the value meant for the next one: $id gets the ANNOTATION 
value, $annot gets the LOCATION value, and so on.
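
The shift can be reproduced outside BioPerl with a small sketch. The function rearrange_sketch below is a hypothetical, simplified stand-in written for this illustration; it is not BioPerl's _rearrange, and the argument values are made up.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for _rearrange (not BioPerl's code):
# return the values of the named arguments in the order given by $order.
sub rearrange_sketch {
    my ($order, %args) = @_;
    my %value_for;
    for my $key (keys %args) {
        (my $upper = uc $key) =~ s/^-//;    # '-annotation' becomes 'ANNOTATION'
        $value_for{$upper} = $args{$key};
    }
    return map { $value_for{$_} } @$order;
}

my @order = qw(NAME ANNOTATION LOCATION);
my %args  = (-name => 'geneX', -annotation => 'ANNOT', -location => 'LOC');

# Extra $id on the left-hand side: everything after $name is shifted by one.
my ($name, $id, $annot, $location) = rearrange_sketch(\@order, %args);
printf "buggy: id=%s annot=%s location=%s\n",
    $id, $annot, defined $location ? $location : 'undef';

# Left-hand side matches the key list: values land where expected.
my ($name_ok, $annot_ok, $location_ok) = rearrange_sketch(\@order, %args);
printf "fixed: annot=%s location=%s\n", $annot_ok, $location_ok;
```

In the buggy assignment the last variable ends up undefined, which is consistent with values such as the type silently going missing later on.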

So it would be nice if it could be corrected (i.e. $id removed). This bug is 
still in bioperl-live.

2) It is not possible to set a correct type when you create your 
Bio::SeqFeature::Annotated object. Actually, it is correctly set when the 
object is created, but when you pass this object to 
Bio::FeatureIO::write_feature, the value is suddenly undefined and the gff3 
output contains the default value, which is 'region'. I tried to debug this 
problem but I did not find a way to solve it. Maybe I am missing some 
knowledge of Perl?

3) It would also be nice if a test could be done for the presence of 
an annot object. If you follow the structure of the _initialize method 
above, you can see that start, end, frame, phase, source etc. are set 
before the call to the annotation sub. When these subroutines are called, a 
Bio::Annotation::Collection is created and stored in the object. Then, when 
the annotation sub is called, this previous Collection object is overwritten 
with $annot. So the idea would be to add a test that throws an error or 
warns the user when a Collection object is passed to the new method, to 
avoid the overwriting.

That's all :). I hope these remarks will be useful. If not, sorry to bother 
the list.

Regards

Emmanuel

-- 
Emmanuel Quevillon
email: horkko at gmail.com <http://gmail.com>
blog: http://horkko.blogspot.com

From Guido.Dieterich at gbf.de  Tue Aug  2 16:49:01 2005
From: Guido.Dieterich at gbf.de (Guido Dieterich)
Date: Tue Aug  2 16:43:30 2005
Subject: [Bioperl-l] parse genbank file
Message-ID: <1123015741.13213.94.camel@sb289.gbf-braunschweig.de>

Hi,

I want to parse a genbank file (Listeria innocua)!

this is a part of the code ...
<code>

my $file = "NC_003212.gbk";

my $stream = Bio::SeqIO->new(-file => $file, -format => 'GenBank');

    while( my $seq = $stream->next_seq ) {

        print $seq->display_id;

}

</code>


output:

NC_003212

I just get the NC ID for this file, but not for the genes within ...


?????

Greetings

Guido
From walsh at cenix-bioscience.com  Wed Aug  3 03:13:47 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Wed Aug  3 03:04:29 2005
Subject: [Bioperl-l] parse genbank file
In-Reply-To: <1123015741.13213.94.camel@sb289.gbf-braunschweig.de>
References: <1123015741.13213.94.camel@sb289.gbf-braunschweig.de>
Message-ID: <42F06EAB.6010503@cenix-bioscience.com>

Hello,

There is only 1 'sequence' in the file (namely, NC_003212).  The genes 
are actually features on the sequence.  So, you would have to get the 
'gene' sequence features for the sequence.

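e.g.

use Carp qw(confess);

# $seq is the sequence object returned by $stream->next_seq
my $gene_seq_feats = get_list_seq_feats_by_primary_tag($seq, 'gene');
print scalar(@$gene_seq_feats), " gene features found\n";

# Return a reference to the list of top-level features on $seq_obj
# whose primary tag equals $tag.
sub get_list_seq_feats_by_primary_tag {
     my ($seq_obj, $tag) = @_;
     ref $seq_obj or
         confess "Seq obj not defined!";
     my @features = $seq_obj->top_SeqFeatures();
     my @list = ();
     for my $feat (@features) {
         if ($feat->primary_tag eq $tag) {
             push @list, $feat;
         }
     }
     return \@list;
}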

HTH,

Andrew




-- 
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel. +49-351-4173 137
Fax  +49-351-4173 109

public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From letondal at pasteur.fr  Wed Aug  3 09:00:30 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Wed Aug  3 08:50:27 2005
Subject: [Bioperl-l] Bio::Tools::Run::PiseApplication : parameters changes
	in seqgen
Message-ID: <fa5e82488fb83c117909dedc840821df@pasteur.fr>

Hi,

The Pise/bioperl interface of the seq-gen program 
(http://bioweb.pasteur.fr/seqanal/interfaces/seqgen-simple.html) has 
changed. We have added new parameters, modified one and changed some 
parameters' type from Integer to Float:

Changes in parameters:
         - added options invar_site (-i), random_seed (-z), write-ancest
           (-wa), write-sites (-wr), partition_numb (-p)
         - fixed bug in the Phylip option: it was -p, now it is a vlist:
           -op, -or, -on
         - changed the type of some options from integer to float
           (scale_branch, scale_tree, rate123, shape, freqACGT, transratio)

Please tell us if there is any trouble.

Best,

--
Catherine Letondal -- Institut Pasteur

From avilella at gmail.com  Wed Aug  3 11:00:04 2005
From: avilella at gmail.com (Albert Vilella)
Date: Wed Aug  3 10:50:44 2005
Subject: [Bioperl-l] Bio::Tools::Run prepare executions [was Re:bioperl-run
	Codeml.pm fix_blength]
In-Reply-To: <1121184178.8167.28.camel@localhost.localdomain>
References: <1121181586.8167.13.camel@localhost.localdomain>
	<FEB3152A-E40A-4C63-B0DC-EADB3C91CABA@duke.edu>
	<1121182841.8167.22.camel@localhost.localdomain>
	<A15376B2-779D-4F25-8153-6B3417A18CCD@duke.edu>
	<1121184178.8167.28.camel@localhost.localdomain>
Message-ID: <1123081204.10112.2.camel@localhost.localdomain>

Hi all,

Having thought about the previous thread on changing tempdir as a
settable value in Bio::Tools::Run::WrapperBase (Jason? should we?)...

...I wonder if it may be interesting (at least it would for me) to
have something like a "prepare" method for the execution wrappers in
Bio::Tools::Run.

What I'm looking for is a way to create the dirs corresponding to the
analysis one wants to conduct. The "prepare" method would create, but
not execute, the dir with the ready-to-run elements of the executables
according to the various input data files and parameters.

Right now, we have a "run" method that first prepares the elements
needed for the execution and then runs the program.

We also have container objects for program results in bioperl-live.

This "prepare" method might be useful for people wanting to generate
sets of analyses for later execution on queueing-based systems or
similar scheduled execution situations.

I agree that "preparing" an execution without running it might not fit
well with the idea of an execution wrapper as it is now in bioperl, so
any suggestions/comments/criticism are welcome.

Bests,

    Albert.


El dt 12 de 07 del 2005 a les 18:03 +0200, en/na Albert Vilella va
escriure: 
> El dt 12 de 07 del 2005 a les 11:47 -0400, en/na Jason Stajich va
> escriure:
> > Sounds good - would you just copy the dir to the users specified
> > outdir?
> 
> yes
> 
> >    Another way to go is make tempdir a settable value (see
> > Bio::Tools::Run::WrapperBase -- in bioperl-live repository) - but
> > this may not be as clear on how to use it?
> 
> well, it is not as direct as the other way but maybe it is cleaner in
> the sense that will directly run the analysis on $tempdir and no extra
> cp or mv would be needed...
> 
>    Albert.
> 
> > 
> > 
> > -jason
> > On Jul 12, 2005, at 11:40 AM, Albert Vilella wrote:
> > 
> > > El dt 12 de 07 del 2005 a les 11:28 -0400, en/na Jason Stajich va
> > > escriure:
> > > 
> > > > sure - fix away.
> > > > 
> > > 
> > > 
> > > done.
> > > 
> > > 
> > > Also, in my pipeline it would be interesting to call Codeml.pm via
> > > bioperl keeping the tempfiles in a specified directory:
> > > 
> > > 
> > > I understand that save_tempfiles will save the generated tempfiles
> > > in
> > > the temp directory, the dir will remain in $tempdir.
> > > An $outdir could be specified so that the codeml run is saved where
> > > the
> > > user specifies.
> > > 
> > > 
> > > What do you think?
> > > 
> > > 
> > >     Albert.
> > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > --
> > 
> > Jason Stajich
> > 
> > jason.stajich at duke.edu
> > 
> > http://www.duke.edu/~jes12/
> > 
> > 
> > 
> > 
> > 

From anunberg at oriongenomics.com  Wed Aug  3 11:40:21 2005
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Wed Aug  3 11:30:32 2005
Subject: [Bioperl-l] Error making query to Bio::DB:GFF db
Message-ID: <BF164F95.6ED9%anunberg@oriongenomics.com>

I have a Bio::DB::GFF db of the human genome.

When querying a particular chromosome I consistently get the following error
when attempting to create a segment

------------- EXCEPTION  -------------
MSG: Couldn't execute query SELECT fref,
       IF(ISNULL(gclass),'Sequence',gclass),
       min(fstart),
       max(fstop),
       fstrand,
       gname
  FROM fdata,fgroup
  WHERE fgroup.gname=?
    AND fgroup.gclass=?
    AND fgroup.gid=fdata.gid
    GROUP BY fref,fstrand,gname
:
 MySQL server has gone away


What does it mean that the server has gone away?

-- 
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com


From gdw1 at cornell.edu  Wed Aug  3 12:00:13 2005
From: gdw1 at cornell.edu (Gregory Drake Wilson)
Date: Wed Aug  3 11:50:35 2005
Subject: [Bioperl-l] bl2seq and next_aln()
Message-ID: <1343.129.44.235.147.1123084813.squirrel@129.44.235.147>

I am trying to parse a bl2seq file but am only being returned one of the
alignments when there are 2+.
Code:
        my @params = (program  => 'blastn' , 'outfile' => 'bl2seq.out');
        my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
        my $report = $factory->bl2seq($seq1, $seq2);

        my $str = Bio::AlignIO->new(-file=> 'bl2seq.out','-format' =>
'bl2seq');

        while ( my $aln = $str->next_aln() ) {
           print $aln->consensus_iupac()."\n";
        }

Opening 'bl2seq.out' shows multiple alignments, yet this code only returns
the first one in the file. Any thoughts?

Greg

From jason.stajich at duke.edu  Wed Aug  3 13:43:33 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Aug  3 13:33:42 2005
Subject: [Bioperl-l] bl2seq and next_aln()
In-Reply-To: <1343.129.44.235.147.1123084813.squirrel@129.44.235.147>
References: <1343.129.44.235.147.1123084813.squirrel@129.44.235.147>
Message-ID: <1123091013.42f10245c0e8a@webmail.duke.edu>

Not sure - could be a bug in AlignIO::bl2seq -- although it just uses SearchIO...
But it could also be a silly file sync problem in that the filehandle is not closed
(although this is also unlikely as the output is written to file by bl2seq).  So
not sure - does it only show the 1st alignment?

Personally I would use the report object to get the aln directly.

if( my $r = $report->next_result ) {
 while( my $hit = $r->next_hit ) {
  while( my $hsp = $hit->next_hsp ) {
    print $hsp->get_aln->consensus_iupac()."\n";
  }
 }
}
-- 
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/



From radev at umich.edu  Wed Aug  3 17:58:16 2005
From: radev at umich.edu (radev@umich.edu)
Date: Wed Aug  3 17:48:21 2005
Subject: [Bioperl-l] newbie Q - missing Tools/HMM.pm ?
In-Reply-To: <mailman.614.1123105571.699.bioperl-l@portal.open-bio.org> from
	"bioperl-l-bounces@portal.open-bio.org" at Aug 03,
	2005 05:46:11 PM
Message-ID: <20050803215816.64727B848B@tangra.si.umich.edu>

Hi,

I just installed Bundle::BioPerl via CPAN. I am now trying to run the code
in http://doc.bioperl.org/bioperl-live/Bio/Tools/HMM.html

but for some reason Tools/HMM.pm didn't get installed with the rest of
the code. Neither did SeqIO.pm .

What did I miss?

Thanks!

Drago
From sdavis2 at mail.nih.gov  Wed Aug  3 18:19:28 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Aug  3 18:10:14 2005
Subject: [Bioperl-l] newbie Q - missing Tools/HMM.pm ?
In-Reply-To: <20050803215816.64727B848B@tangra.si.umich.edu>
Message-ID: <BF16BB30.B107%sdavis2@mail.nih.gov>

On 8/3/05 5:58 PM, "radev@umich.edu" <radev@umich.edu> wrote:

> Hi,
> 
> I just installed Bundle::BioPerl via CPAN. I am now trying to run the code
> in http://doc.bioperl.org/bioperl-live/Bio/Tools/HMM.html
> 
> but for some reason Tools/HMM.pm didn't get installed with the rest of
> the code. Neither did SeqIO.pm .
> 
> What did I miss?

Bundle::Bioperl only installs needed CPAN modules for bioperl.  It doesn't
install bioperl at all.  You will now need to install bioperl.
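
A quick way to check which modules Perl can actually find is sketched below. The helper name module_available is made up for this example; it only relies on Perl's built-in require.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded from @INC, 0 otherwise.
# Note: a module that is present but fails to compile also reports 0.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Bio::SeqIO Bio::Tools::HMM)) {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'installed' : 'not found';
}
```

If Bio::SeqIO is reported as not found, bioperl itself is not installed in any directory listed in @INC.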

Sean

From jason.stajich at duke.edu  Wed Aug  3 20:52:48 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Aug  3 20:43:26 2005
Subject: [Bioperl-l] newbie Q - missing Tools/HMM.pm ?
In-Reply-To: <20050803215816.64727B848B@tangra.si.umich.edu>
References: <20050803215816.64727B848B@tangra.si.umich.edu>
Message-ID: <786C6E50-63BA-4EA3-A20E-EF3F36BF2A87@duke.edu>


Besides Sean's point that the Bundle doesn't install Bioperl itself,
this module is only in bioperl-live CVS and not in the 1.4 release  
that is on CPAN.  See the bioperl website for how to get the CVS code.
You can also browse daily (at least) CVS checkouts here
http://bioperl.org/SRC/bioperl-live

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From allenday at ucla.edu  Wed Aug  3 21:08:39 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Aug  3 20:58:43 2005
Subject: [Bioperl-l] darwin PERL5LIB ignored
Message-ID: <Pine.LNX.4.58.0508031754310.17629@sumo.ctrl.ucla.edu>

This is an off-topic question for the list, but I know there are lot of 
mac users here, and I'm hoping for a quick fix.

I'm having problems getting my bash environment to recognize my $PERL5LIB 
variable.  Even if I declare the variable in my .bashrc file and source 
it, the variable is ignored until I explicitly export it from my session 
prompt.  Below is a dialog illustrating the problem.

Anyone know of a workaround to get perl to use $PERL5LIB as declared in
the .bashrc file, as opposed to requiring an explicit export of the
variable?

Thanks.
-Allen

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#variable appears to be set up correctly
buildmac:~ allenday$ echo $PERL5LIB
/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl

#but it doesn't appear in @INC
buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"'
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6
/Library/Perl
/Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6
/Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6
/Library/Perl/5.8.1
.

#the variable is defined in my .bashrc file, which is evaluated at login
buildmac:~ allenday$ cat ~/.bashrc | grep PERL5LIB
PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl

#just to make sure, sourcing the .bashrc file has no effect on @INC
buildmac:~ allenday$ source ~/.bashrc
buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"'
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6
/Library/Perl
/Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6
/Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6
/Library/Perl/5.8.1
.

#I have to explicitly export from the prompt to affect @INC
buildmac:~ allenday$ export PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl
buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"'
/net/groove/lib/perl5/site_perl
/usr/local/lib/perl5/site_perl
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6
/Library/Perl
/Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6
/Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6
/Library/Perl/5.8.1
.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From taerwin at tpg.com.au  Wed Aug  3 21:37:28 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Wed Aug  3 21:32:33 2005
Subject: [Bioperl-l] darwin PERL5LIB ignored
In-Reply-To: <Pine.LNX.4.58.0508031754310.17629@sumo.ctrl.ucla.edu>
References: <Pine.LNX.4.58.0508031754310.17629@sumo.ctrl.ucla.edu>
Message-ID: <1123119449.11338.3.camel@bacp4>

> #the variable is defined in my .bashrc file, which is evaluated at login
> buildmac:~ allenday$ cat ~/.bashrc | grep PERL5LIB
> PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl

You should export the variable from your .bashrc

export PERL5LIB=/net/groove/lib/perl5/site_perl:/other_dirs

If you don't export it, it will only be set for the current shell and
won't be passed on to the programs (such as perl) started from that shell.
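
The effect can be checked from Perl itself. The sketch below is only an illustration: it starts a child perl with and without PERL5LIB in the process environment (which is what "export" controls in the shell) and looks for the directory in the child's @INC. The helper name child_inc and the temporary directory are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Run a child perl and return its @INC. The list form of open avoids the
# shell, so the child sees exactly the environment held in %ENV.
sub child_inc {
    open(my $fh, '-|', $^X, '-e', 'print "$_\n" for @INC')
        or die "Cannot start child perl: $!";
    chomp(my @inc = <$fh>);
    close $fh;
    return @inc;
}

my $lib = tempdir(CLEANUP => 1);    # throwaway directory standing in for a library path

# Not in the environment (like an unexported shell variable): the child does not see it.
delete $ENV{PERL5LIB};
my $without = grep { $_ eq $lib } child_inc();

# In the environment (like "export PERL5LIB=..."): the child adds it to @INC.
$ENV{PERL5LIB} = $lib;
my $with = grep { $_ eq $lib } child_inc();

print "found without PERL5LIB in environment: $without\n";
print "found with PERL5LIB in environment:    $with\n";
```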

Regards,

Tim


From allenday at ucla.edu  Wed Aug  3 23:00:19 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Aug  3 22:50:33 2005
Subject: bioperl rpms via yum on Darwin (Was: [Bioperl-l] darwin PERL5LIB
	ignored)
In-Reply-To: <1123119449.11338.3.camel@bacp4>
References: <Pine.LNX.4.58.0508031754310.17629@sumo.ctrl.ucla.edu>
	<1123119449.11338.3.camel@bacp4>
Message-ID: <Pine.LNX.4.58.0508031956590.17629@sumo.ctrl.ucla.edu>

yep, that did it.  i don't know why i didn't prefix export onto that line
as with all the others in my .bashrc file.

thanks a bunch.  btw, the reason i'm doing this is b/c i'm in the middle 
of porting the biopackages rpm repository to be installable on darwin.

i just finished porting rpm and yum yesterday -- should be able to have 
bioperl installable via rpm within a week, barring any c library dependency 
problems.

-allen

From ureddi at emich.edu  Thu Aug  4 09:42:19 2005
From: ureddi at emich.edu (Usha Rani Reddi)
Date: Thu Aug  4 09:34:10 2005
Subject: [Bioperl-l] bl2seq
Message-ID: <655276655c36.655c36655276@emich.edu>

Hi,
I tried to run bl2seq locally after installing Bioperl on a Linux machine. 
When I tried to align 2 sequences using bl2seq I got an error message 
that says "could not find path to bl2seq". After getting the error 
message I set the environment variables (path) and tried again, but I 
got the same error message. Please help me with this.
Thanks
Usha

From jason.stajich at duke.edu  Thu Aug  4 14:20:13 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Aug  4 14:11:15 2005
Subject: [Bioperl-l] Re: Bio::Tools::Run prepare executions [was
	Re:bioperl-run Codeml.pm fix_blength]
In-Reply-To: <1123081204.10112.2.camel@localhost.localdomain>
References: <1121181586.8167.13.camel@localhost.localdomain>
	<FEB3152A-E40A-4C63-B0DC-EADB3C91CABA@duke.edu>
	<1121182841.8167.22.camel@localhost.localdomain>
	<A15376B2-779D-4F25-8153-6B3417A18CCD@duke.edu>
	<1121184178.8167.28.camel@localhost.localdomain>
	<1123081204.10112.2.camel@localhost.localdomain>
Message-ID: <C2542A87-0EEA-404A-A1B2-D2714D1CCB38@duke.edu>


On Aug 3, 2005, at 11:00 AM, Albert Vilella wrote:

> Hi all,
>
> Having thought about the previous thread on changing tempdir as a
> settable value in Bio::Tools::Run::WrapperBase (Jason? should we?)...
>
i think it will be fine to do those changes if I remember correctly  
what they were... =)
> ...I wonder if it may be interesting (at least it would for me) to
> have something like a "prepare" method for the execution wrappers in
> Bio::Tools::Run.
>
> What I'm looking for is a way to create the dirs corresponding to the
> analysis one wants to conduct. The "prepare" method would create, but
> not execute, the dir with the ready-to-run elements of the executables
> according to the various input data files and parameters.
>
> Right now, we have a "run" method that first prepares the elements
> needed for the execution and then runs the program.
>
> We also have container objects for program results in bioperl-live.
>
> This "prepare" method might be useful for people wanting to generate
> sets of analysis for further execution on queueing-based systems or
> similar scheduled execution situations.
>

Sure - this sounds fine - I guess part of the prepare step, though, is  
preparing the arguments to send to the programs.
Do you want to capture these arguments as well?
My understanding is the BioPipe system (which may not have many devs  
now) tried to make this possible by encoding the input options to the  
Perl modules in an XML file which was loaded into the pipeline db.
http://www.genome.org/cgi/content/abstract/13/8/1904

But I'm definitely open to some other ideas about how this should be  
done and the idea of a prepare step seems great (especially if we  
break out a cleanup step as well and ensure that every run cmd does a  
prepare, execute, cleanup cycle).
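
To make the prepare, execute, cleanup idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the class name Sketch::Wrapper, the 'run.ctl' control file and its "key = value" layout are made up, and none of this is existing Bio::Tools::Run code.

```perl
package Sketch::Wrapper;    # hypothetical class, not part of Bio::Tools::Run
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path remove_tree);
use File::Spec;

sub new {
    my ($class, %args) = @_;
    return bless {
        command => $args{command} || [],    # program name and fixed arguments
        params  => $args{params}  || {},    # settings written to the control file
        dir     => $args{dir},              # optional user-chosen run directory
    }, $class;
}

# Create the run directory and write the control file; nothing is executed.
sub prepare {
    my ($self) = @_;
    $self->{dir} ||= tempdir();
    make_path($self->{dir}) unless -d $self->{dir};
    my $ctl = File::Spec->catfile($self->{dir}, 'run.ctl');
    open(my $fh, '>', $ctl) or die "Cannot write $ctl: $!";
    print {$fh} "$_ = $self->{params}{$_}\n" for sort keys %{ $self->{params} };
    close($fh) or die "Cannot close $ctl: $!";
    $self->{ctl} = $ctl;
    return $self->{dir};
}

# Run the prepared job; a queueing system could do this step later instead.
sub execute {
    my ($self) = @_;
    return system(@{ $self->{command} }, $self->{ctl}) == 0;
}

# Remove the run directory and everything in it.
sub cleanup {
    my ($self) = @_;
    remove_tree($self->{dir}) if defined $self->{dir};
    return;
}

# The current "run" behaviour expressed as the three-step cycle.
sub run {
    my ($self) = @_;
    $self->prepare;
    my $ok = $self->execute;
    $self->cleanup;
    return $ok;
}

package main;

my $job = Sketch::Wrapper->new(
    command => ['codeml'],
    params  => { seqfile => 'aln.phy', treefile => 'tree.nwk' },
);
my $dir = $job->prepare;    # directory is ready to run, but nothing has run yet
print "prepared $dir\n";
$job->cleanup;
```

Splitting the steps this way would let a caller stop after prepare and submit the directory to a scheduler, while run keeps the present all-in-one behaviour.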

Thanks for jumping in on this - I think your ideas and intuition here  
are right on the mark and I think a more systematic approach on the  
parts needed to run an external program should be spelled out in the  
code.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From ushashankar2000 at yahoo.com  Thu Aug  4 09:35:29 2005
From: ushashankar2000 at yahoo.com (Usha)
Date: Thu Aug  4 15:40:22 2005
Subject: [Bioperl-l] bl2seq
Message-ID: <20050804133529.47171.qmail@web34314.mail.mud.yahoo.com>

Hi,
I tried to run bl2seq locally after installing Bioperl on a Linux
machine. When I tried to align 2 sequences using bl2seq, I got an error
message that says "could not find path to bl2seq". After getting the
error message, I set the environment variable (PATH) and tried again,
but I got the same error message. Please help me with this.
Thanks
Usha
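
One way to narrow down a "could not find path to bl2seq" error is to
check what the Perl process itself sees on its PATH, since a variable
set in the shell but not exported is not visible to the script. The
following is a small sketch in plain Perl; the helper name find_in_path
is hypothetical, and the Bioperl wrapper may locate the program by its
own rules, so treat this as a diagnostic only.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable file
# called $name in the directories listed in $ENV{PATH}, or undef if none.
sub find_in_path {
    my ($name) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $name );
        return $candidate if -f $candidate && -x $candidate;
    }
    return;
}

my $exe = find_in_path('bl2seq');
print defined $exe
    ? "bl2seq found at $exe\n"
    : "bl2seq is not on the PATH seen by this Perl process\n";
```

If the program is reported as missing here although it runs from the
command line, the PATH change was probably made in a different shell
session or was not exported.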

From mconte at cirad.fr  Thu Aug  4 11:29:20 2005
From: mconte at cirad.fr (matthieu)
Date: Thu Aug  4 15:40:23 2005
Subject: [Bioperl-l] dividing seqboot outfiles
Message-ID: <42F23450.70209@cirad.fr>

Hello,
I'm trying to divide seqboot outfiles containing 100 multialignments
into, for example, 10 files of 10 multialignments. I didn't find any
parser for this.
I'm thinking about identifying the first characters of the seqboot
outfiles (e.g. " 3   639 " in my example) to recognize each
multialignment "block", but I didn't manage to do this...
I enclose my first code and an example of a seqboot outfile.
Thanks


Matthieu
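
The idea described above, recognizing each multialignment by its header
line, could be sketched as follows. This is a minimal illustration in
plain Perl rather than an existing Bioperl parser: it assumes every
replicate starts with a line holding exactly two integers (number of
sequences and alignment length, e.g. "    3   639"), and the names
split_replicates and chunk, as well as the "prefix.NN" output file
naming, are hypothetical.

```perl
use strict;
use warnings;

# Split the text of a seqboot outfile into one string per replicate.
# A replicate is assumed to start at a header line of two integers.
sub split_replicates {
    my ($text) = @_;
    my @blocks;
    for my $line ( split /^/m, $text ) {
        if ( $line =~ /^\s*\d+\s+\d+\s*$/ ) {
            push @blocks, '';    # header line: start a new replicate
        }
        next unless @blocks;     # ignore anything before the first header
        $blocks[-1] .= $line;
    }
    return @blocks;
}

# Group items into array references of at most $size items, in order.
sub chunk {
    my ( $size, @items ) = @_;
    die "chunk size must be positive" unless $size > 0;
    my @chunks;
    push @chunks, [ splice @items, 0, $size ] while @items;
    return @chunks;
}

# Usage sketch: perl split_seqboot.pl outfile prefix
if ( @ARGV == 2 ) {
    my ( $file, $prefix ) = @ARGV;
    open( my $in, '<', $file ) or die "Cannot open $file: $!";
    my $text = do { local $/; <$in> };    # slurp the whole file
    close($in);

    my $n = 0;
    for my $group ( chunk( 10, split_replicates($text) ) ) {
        my $name = sprintf '%s.%02d', $prefix, ++$n;
        open( my $out, '>', $name ) or die "Cannot write $name: $!";
        print {$out} @$group;
        close($out) or die "Cannot close $name: $!";
    }
}
```

Matching the header with a pattern instead of a fixed string means the
same code still works when the number of sequences or the alignment
length changes.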


From mconte at cirad.fr  Thu Aug  4 11:31:23 2005
From: mconte at cirad.fr (matthieu)
Date: Thu Aug  4 15:40:25 2005
Subject: [Bioperl-l] dividing seqboot outfile
Message-ID: <42F234CB.9030609@cirad.fr>

Oops... my script and my example file
-------------- next part --------------
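#!/usr/bin/perl

#### Divide one file that contains 100 multialignments into 10 files of 10 multialignments
#### (so far this only prints the first 10 multialignments)

use strict;
use warnings;
use Bio::AlignIO;    # loaded, but not used below

my $file = shift;
my $out = shift;     # output name, not used yet

my $descripteur = open_file($file);

my $count = 0;
while ($count < 10)
{
    my $alignment = extract($descripteur);
    last unless defined $alignment;    # stop at end of file
    print $alignment;
    $count++;
}

# Named open_file because a call to open() always reaches Perl's builtin,
# never a user-defined sub called "open".
sub open_file {
    my($my_file) = @_;
    my $descripteur;
    unless(open($descripteur, '<', $my_file)) {
        print STDERR "Can't open $my_file !\n";
        exit 1;
    }
    return $descripteur;
}

sub extract {
    my($descripteur_fichier) = @_;

    # recognize the motif at the beginning of each alignment
    my $motif = "    3   639\n";

    # read up to and including the next motif; $/ is restored on return
    local $/ = $motif;
    my $enregistrement = <$descripteur_fichier>;
    return unless defined $enregistrement;

    # the motif just read belongs to the next alignment, so strip it from
    # the end of this record and put one back at the front instead
    chomp $enregistrement;

    # the very first read holds only the motif: go on to the real record
    return extract($descripteur_fichier) if $enregistrement eq '';
    return $motif . $enregistrement;
}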
-------------- next part --------------
    3   639
01g45860.1   ---------- ---------- ---------- ----MMMDFF FF------WW WPDPAAASSS
t5g66770.1   ACCTTDDDDS GNNAQQQQQI KQQQQQQQEQ QHHHHHHQFF IILSLNNPWW WPNTSSLGFF
t5g66770.2   ACCTTDDDDS GNNAQQQQQI KQQQQQQQEQ QHHHHHHQFF IILSLNNPWW WPNTSSLGFF

             GLLLDAAAGG FLPPPPPPAV ---------- ---------- --AAPPDDDV GG--------
             GLLLSGGGSS AFDDDPPPQV TGGDDSSSDP GPFPPNNLDH HHAATTTTTG GGRLLDDGGG
             GLLLSGGGSS AFPPPPPPQV TGGDDSSSDP GPFPPNNLDH HHAATTTTTG GGRLLDDGGG

             ---------- ---------- ---------- ---YPPPAA- --DD------ ----------
             GGGGGFEEEE SDEMEELLIS GDVAAADDGC DTTHNPPDDV VIDDPPPDDT PSSVPLLLLR
             GGGGGFEEEE SDEMEELLIS GDVAAADDGC DTTHNPPDDV VIDDPPPDDT PSSVPLLLLR

             VDAAALAAAA AAFPPPCCCA PPPAAAALL- AAMMRRREAG GIRR------ ----LHLLLS
             IDTSSPPTTL LLWPPPSSSS PPPSSPPTTH SSPPTKKEND DSEEDDDFFF FLEELKAAID
             IDTSSPPTTL LLWPPPSSSS PPPSSPPTTH SSPPTKKEND DSEEDDDFFF FLEELKAAID

             SAGGEAHHLA ADDSAALASS AAASIGVVAH HHHFTTTSP- SSPPPAPTTD AAEEHHALYY
             DA--SDPPEL LQQISSVEGG DPPT-EVVAY YYYFEEESPN SSPPPTSSSS SSTTEEDIYY
             DA--SDPPEL LQQISSVEGG DPPT-EVVAY YYYFEEESPN SSPPPTSSSS SSTTEEDIYY

             HHYEEEAAAA YYLLKKFTQQ ILLLLFFHCC CDHHIDDFSL QLQQWPPPAL LIALALPPGG
             KKNDDDAAAA YYSSKKFTQQ ILLLLTTESS SNHHVDDFGI QIQQWPPPAL LLALATTTSG
             KKNDDDAAAA YYSSKKFTQQ ILLLLTTESS SNHHVDDFGI QIQQWPPPAL LLALATTTSG

             GPP-RIIITT PPTG-----L DDVLADLAAR RRRFFSADDE VPWWMLLIIA PGEEAAFFNS
             GKPQRVVVSS PLGEPPSSSL AATNRDFAAK KKDFFDTHHL LGSSSFFVVD PDEEVAVVNF
             GKPQRVVVSS PLGEPPSSSL AATNRDFAAK KKDFFDTHHL LGSSSFFVVD PDEEVAVVNF

             SSVLLLHRLL LLGGPPAAAA DQAPP----- IIALCCAASS VRKKIFIIED NNTTTGFDDR
             FFMLLLYKLL LL-------- DETPPTTTII VVTLLLKKSS LNRRVVGGES NNVVVGFNNR
             FFMLLLYKLL LL-------- DETPPTTTII VVTLLLKKSS LNRRVVGGES NNVVVGFNNR

             FTEAFYSSAA VFLDDDASSA SSSSGGGAGN AAEAYLLLIC DDVVGGGEA- -RRERPPPRR
             VKNAQFSSAA VFLEEENLLG RRRRDDSEER VVEELFFFIS GGIIPPPETI IHHEREEEQQ
             VKNAQFSSAA VFLEEENLLG RRRRDDSEER VVEELFFFIS GGIIPPPETI IHHEREEEQQ

             WRDRRRGGLA PPLGNAALRR QARMVVGLLL FFGEGG-HHS EEAADDDDDD GCTHFSSSAW
             WRVLLNGGFS KKLSYAAVSS QAKILLWNNN YYYSNNYSSI ESKKPPPPPP GFSNLTTSSW
             WRVLLNGGFS KKLSYAAVSS QAKILLWNNN YYYSNNYSSI ESKKPPPPPP GFSNLTTSSW

             WWGDGGGNNN NNSGGSSNNS SGSSSSSGGG GGDSSSVCL
             WW-------- ---------- ---------- ---------
             WW-------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- -----MTTPP ------WWWP MMDPALLDDD AAGFFFFPPA
t5g66770.1   AAYMCTGGGG NLLIIQQKQQ EQHHDHIIGG LLNNPPWWWP --NTSLLSSS GGSAAAAPPQ
t5g66770.2   AAYMCTGGGG NLLIIKKKQQ EQHHDHIIGG LLNNPPWWWP --NTSLLSSS GGSAAAAPPQ

             ---------- ---------- -----ADDDD GV-------- ---------- ----------
             GGGGGSNPFP PPFFFFFPDH HHHHHATTTT GGRRLLDDFG GGGGGFEESD EWWEEEELLL
             GGGGGSNPFP PPFFFFFPDH HHHHHATTTT GGRRLLDDFG GGGGGFEESD EWWEEEELLL

             ---------- --YYYYPPP- --GADDD--- ---------- ---------D LPFFFFFFPP
             IGGVVADGPT WWHHHHNPPY VVGPDDDPFD TYPSSRLSVQ SSDNNRRRVD PLPPPPWWPP
             IGGVVADGPT WWHHHHNPPY VVGPDDDPFD TYPSSRLSVQ SSDNNRRRVD PLPPPPWWPP

             PPAAAAAAAA AVVVLL-ARE EEEAAAGIR- ---HHLLLMC AAGGAIIIEA GDASAAQLDD
             PPSSSIIPPP PLLLTTHSTE EDPNNNDSED DDDKKIIIYC AA--RIIISD SDASKKTLQQ
             PPSSSIIPPP PLLLTTHSTE EDPNNNDSED DDDKKIIIYC AA--RIIISD SDASKKTLQQ

             DHALAAAGII IGGGRRAAHF TAALLFPP-V VVATTTTDAA AEEAAFLHHH HHYYCPPKAH
             QREVSEP--- -EEERRAAYF EAALLSPPNA AATSSSSSSS STTDDLIKKK TTNNCPPKAH
             QREVSEP--- -EEERRAAYF EAALLSPPNA AATSSSSSSS STTDDLIKKK TTNNCPPKAH

             FTTQIIIILE EEEAAFHHGG DDDHHVVIIF LLMMMMGLQQ PALIIQQALL LAARGPPPPP
             LTTQIIIILE EEEAATEEKK NNNHHIIVVF IIVVVVGIQQ PALLLQQALL LAARGKKPPP
             LTTQIIIILE EEEAATEEKK NNNHHIIVVF IIVVVVGIQQ PALLLQQALL LAARGKKPPP

             FF--TGIIGG GGPPPSGRRD DEE-DGGLLL SRVRSSGGVA AAASEVRPWM QQPPGEEVAA
             TTQQSGIIPP PPAPPSESSP PEEPAGGNLF VDLNDDPPIL LTTPLLNGSS RRPPDEELAA
             TTQQSGIIPP PPAPPSESSP PEEPAGGNLF VDLNDDPPIL LTTPLLNGSS RRPPDEELAA

             FNNSVLLQQQ RLLGPPDDAP PP---IILVV VASSSSRPKI FFVIQQEAAH NKTTTTGFFL
             VNNFMLLQQQ KLL---DDTP PPTTIVVLAA AKSSSSNPRV VVLGYYEVVL NRVVVVGFFA
             VNNFMLLQQQ KLL---DDTP PPTTIVVLAA AKSSSSNPRV VVLGYYEVVL NRVVVVGFFA

             LLDDRFFTTE AFFYYSSAAS GGGNME---A YQEIICDIIV VCAAARRREH HHELSRRRRR
             AANNRVVKKN AQQFFSSNGR DDERRERRRE LGRIISGLLI IGTTGHRREM MMEKEQQRLL
             AANNRVVKKN AQQFFSSNGR DDERRERRRE LGRIISGLLI IGTTGHRREM MMEKEQQRLL

             LRALLLLLSS AAVLLLGSSN LARRRMMLLV GLFGGHHEEE GCCTLGGGWW GRRPPPLAAW
             MNAFFFFFEE SSVLLLSNNY VAKKKIILLL WNYYYSSEEE GFFSLAAAWW DLLPPPLSSW
             MNAFFFFFEE SSVLLLSNNY VAKKKIILLL WNYYYSSEEE GFFSLAAAWW DLLPPPLSSW

             WEAGGGGGDN SNGSSSDNNN GSNNGGKKGG RRGSSSCCL
             WR-------- ---------- ---------- ---------
             WR-------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- MDDTTFFPF- -----WMPAA
t5g66770.1   MMYMCSSSSG GGNNNLAQVI IKQEEEEEQQ QQQQQHHHHD HQQIIFFGIL SNPPPW-TSL
t5g66770.2   MMYMCSSSSG GGNNNLAQVI IKQEEEEEQQ QQQQQHHHHD HQQIIFFGIL SNPPPW-TSL

             SSGGFLLPPA VV-------- ---------- APDDY----- ---------- ----------
             GFSSAFFPPQ VVTPPGGGGP PFFPPNDDHH ATTTFRRLLL SDFGGTTGED EETTGGDADC
             GFSSAFFPPQ VVTPPGGGGP PFFPPNDDHH ATTTFRRLLL SDFGGTTGED EETTGGDADC

             --YDPPPPAA A----AA--- ---------- ---------- ----VDAAPP EEFFAAAAFF
             WWHDPPPPDD DYYYIPPYSS RRRRSSSVVV QQQSDDLLNN RRVVIDTTLL PPPPPPTLWW
             WWHDPPPPDD DYYYIPPYSS RRRRSSSVVV QQQSDDLLNN RRVVIDTTLL PPPPPPTLWW

             FFCAPPDAAA AAAAAVLLLL RRREEEEEEG IRR------L LSCAAGGAAI IIEGHLLAAQ
             WWSSPPLSSS IIPPPLTTTT TTKEDPPPED SEEDDFDLPA IDCAA--RRI IISSPEEAKT
             WWSSPPLSSS IIPPPLTTTT TTKEDPPPED SEEDDFDLPA IDCAA--RRI IISSPEEAKT

             SSAAAAVASG GVVAVFTLRR RFPPSPPPPD DAHALL-HFE ECPFAFNNNN AIFFFHCCCH
             IIESSELPTE EVVAFFELNN RSPPSPPSSS SSEDIILKLD DCPFALNNNN AITTTESSSK
             IIESSELPTE EVVAFFELNN RSPPSPPSSS SSEDIILKLD DCPFALNNNN AITTTESSSK

             VVIDDDDFSL LQGLQQWPPP AAALLLIQQL LLLLRRGGGP PPF-LRRITP PPGDEEEE--
             IIVDDDDFGI IQGIQQWPPP AAALLLLQQL LTTTRRSSGK PPTQIRRVSA PPEPEEEEPP
             IIVDDDDFGI IQGIQQWPPP AAALLLLQQL LTTTRRSSGK PPTQIRRVSA PPEPEEEEPP

             DDDDDVGGLL LLLLAADAAS SRRRVVVFFF GGAAANNSLL DRMLLQQGGG GEVVAANLHH
             AAAAATGGNL LLLLRRDAAV VDDDLLLFFF PPLLL--PII HNSFFRRDDD DELLAANLYY
             AAAAATGGNL LLLLRRDAAV VDDDLLLFFF PPLLL--PII HNSFFRRDDD DELLAANLYY

             RLDQQA---I DDAVVVDASI TTVVIIIEQE EADNNNTGLL FFTTEFYYYS SSAAAVLLAA
             KLDEETTIIV DDTAAARKSV TTLLGGGEYE EVSNNNVGAA VVKKNQFFYS SSAAAVLLPN
             KLDEETTIIV DDTAAARKSV TTLLGGGEYE EVSNNNVGAA VVKKNQFFYS SSAAAVLLPN

             AAAASAAAAA AE-AYLLLQQ RREEEICDDI VCGGEAAARH HPPLRRRRDR AAAGGGGLSA
             NGGGREEVVV VERELFFFGG RRRRRISGGL IGPPETGGRM MEEKQQRRVL AAAGGGGFES
             NGGGREEVVV VERELFFFGG RRRRRISGGL IGPPETGGRM MEEKQQRRVL AAAGGGGFES

             PLGNAALLLR RMLVVVGSG- --HSSVEEEA ADDGGCTTLW WWWHHGRRPP LSGGDGGGGG
             KLSYAAVVVS KILLLLWNYL YYSIIVEESK KPPGGFSSLW WWWNNDLLPP LT--------
             KLSYAAVVVS KILLLLWNYL YYSIIVEESK KPPGGFSSLW WWWNNDLLPP LT--------

             GGGNNSSVSS GGGSDSNSSS SSSNGGGKKK SADGGGSLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- ---DDDDQ-- --WWPPMMMD
t5g66770.1   AMCCCTTSSG NLMMMAAQQQ VIIIIKQQKK QQQQEQQQQH QQQQQQQNPL LPWWPP---N
t5g66770.2   AMCCCTTSSG NLMMMAAQQQ VIIIIKKKKK QQQQEQQQQH QQQQQQQNPL LPWWPP---N

             PASGGLDDDD GLLPPPPP-- ---------- ---------- ---PPPPDGG VGYYY-----
             TSGGGLSSSS SFFDDDDPTT GGGGDDDNDP GFPFPPNNNN HHHTTTTTGG GGFFFRRSDG
             TSGGGLSSSS SFFPPPPPTT GGGGDDDNDP GFPFPPNNNN HHHTTTTTGG GGFFFRRSDG

             ---------- ---------- ---------- ----YYYDPP P----AAAD- ----------
             GTTGGGGEFE ESDMTTLLII GGGSVDGPPD DDDWHHHDNP PYYIYPPPDY PSVQPPPSDN
             GTTGGGGEFE ESDMTTLLII GGGSVDGPPD DDDWHHHDNP PYYIYPPPDY PSVQPPPSDN

             -----VDAAA LAAAFFPPCA APDDAAAAAA VVVLMRRREE VR------LL VVLLLMCGAI
             RRRRVIDSSS PPTTWWPPSS SPLLSIIPPP LLLTPTKKPE TEDFDLPPLL LLAIIYC-RI
             RRRRVIDSSS PPTTWWPPSS SPLLSIIPPP LLLTPTKKPE TEDFDLPPLL LLAIIYC-RI

             AAGHLAAASL LHAAALLAAS SSGIGGRVVA HFTSSRRLLF PAAPPPTDEH F------YFF
             DDSPEAAASL LRESSVVEET TT--EERVVA YFESSNRLLS PTTSSSSSTE LLLSSSSYLL
             DDSPEAAASL LRESSVVEET TT--EERVVA YFESSNRLLS PTTSSSSSTE LLLSSSSYLL

             YEACCLLLFF TTAQEFFFGD HVHVVIIFFS SMQLQPLLII QAALLRRRPG GGGPPPRRII
             NDACCSSSFF TTAQETTTKN KIHIIVVFFG GVQIQPLLLL QAATTRRRTG GGGKPPRRVV
             NDACCSSSFF TTAQETTTKN KIHIIVVFFG GVQIQPLLLL QAATTRRRTG GGGKPPRRVV

             TGIIISSPTT GGRLRVGLLD LAARSRVVFF SGAAAANSLL DEEEVPPMQQ AAAPEAANSV
             SGIIISSLGG EESLITGNLD FAAKVDLLFF DPLLTT-PII HLLLLGGSRR DDDPEAANFM
             SGIIISSLGG EESLITGNLD FAAKVDLLFF DPLLTT-PII HLLLLGGSRR DDDPEAANFM

             LLHLLGDDPD QAAIDALDVV SSPFVIQAAA DDDHNNKFLL DRRRRFALFY YSSAAAVFSL
             LLYLL----D ETTVDTLRAA SSPVLGYVVV SSSLNNRFAA NRRRRVALQY YSSAAAVFSL
             LLYLL----D ETTVDTLRAA SSPVLGYVVV SSSLNNRFAA NRRRRVALQY YSSAAAVFSL

             LLSGAANNAA AAAQQREIIC CDIGGAAARR REERHHHHEP LSRRRRLLLT LSAVVPLGSN
             LLLDEERRVV VEEGGRRIIS SGLPKTTGHR REERMMMMEE KERRLLMMME FESVVKLSNY
             LLLDEERRVV VEEGGRRIIS SGLPKTTGHR REERMMMMEE KERRLLMMME FESVVKLSNY

             NNNAALLRQA MLLLGGLLLG ---VEGGCLT TTTLGGWHGR FFSASAWEAA AAADDGGGGD
             YYYAAVVSQA ILLLWWNNNN LYYVEGGFIS SSSLAAWNDL LLTLSSWR-- ----------
             YYYAAVVSQA ILLLWWNNNN LYYVEGGFIS SSSLAAWNDL LLTLSSWR-- ----------

             DDDNNNNNNN NSNSSSNNVS SGGGSSGGSS SNNGSSSGV
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --DQQ----- MMDDASSSLL LAAFFFLPPP
t5g66770.1   MYYCTTDSGM AIIAVVVKQQ QKKKQQQQHH HQQNNPPPLL --NNSGFFLL LGGAAAFPPP
t5g66770.2   MYYCTTDSGM AIIAVVVKKK KKKKQQQQHH HQQNNPPPLL --NNSGFFLL LGGAAAFPPP

             VV-------- -------AVG ---------- ---------- ---------- ----------
             VVTGGPPFPP NNNDDHHAGG RRLLSDGGGG GGGGGEEEEE ESSEEMEEET LIIIGDSSAA
             VVTGGPPFPP NNNDDHHAGG RRLLSDGGGG GGGGGEEEEE ESSEEMEEET LIIIGDSSAA

             ---------Y YYYDDDPPA- ----GGDDD- ---------- ------VVAL LPEEFAAAPP
             DDPPPCCDDH HHHDDDPPDY YIIYGGDDDP FTYPVVVQSD DLNRRVIISP PLPPPTTLPP
             DDPPPCCDDH HHHDDDPPDY YIIYGGDDDP FTYPVVVQSD DLNRRVIISP PLPPPTTLPP

             DAAVLLLLL- --MMRREEEE VAI------- VHHLLLMSGA GDDAAASSSA ALLLDSSHHL
             LIPLTTTTTH EEPPTKDPEE TNSDFLEEPP LKKAAIYD-R SDDNNASSSK KLLLQIIRRV
             LIPLTTTTTH EEPPTKDPEE TNSDFLEEPP LKKAAIYD-R SDDNNASSSK KLLLQIIRRV

             LAAAAVVSAA ASGRRVAVVV HHFFTTLLFP PPVVVVPPPT TTDDAEEHFF LL--YYYACP
             VSSSELLGDD PTERRVAFFF YYFFTELLSP PPAAAASSSS SSSSSTTELL IISSYYYACP
             VSSSELLGDD PTERRVAFFF YYFFTELLSP PPAAAASSSS SSSSSTTELL IISSYYYACP

             PYLLLLFHNN IILEEEAFHH HHCHVISQQQ LQQAIQAALP PPPGGPPFLL LLLLIGIPPP
             PYSSSSFHNN IILEEEATEE EESKIVGQQQ IQQALQAATT TTTSGPPTII IIIIVGIAAP
             PYSSSSFHNN IILEEEATEE EESKIVGQQQ IQQALQAATT TTTSGPPTII IIIIVGIAAP

             PPPTGRRDEE ---RLLSRVV RFFSFFFFFG GAASLLEVPL LAAGEAVANN SSVQQLHLDD
             PLLGESSPEE PSSILLVDLL NFFDFFFFFP PLTPIILLGF FDDDEVLANN FFMQQLYL--
             PLLGESSPEE PSSILLVDLL NFFDFFFFFP PLTPIILLGF FDDDEVLANN FFMQQLYL--

             AAQPPPPP-- -IDDVVLLLC CVARKKKFFF TVIEEQEAAA HNNGGGGFLL LLLDRRRRFE
             --EPPPPPTI IVDDAALLLL LAKNRRRVVV TLGEEYEVVV LNNGGGGFAA AAANRRRRVN
             --EPPPPPTI IVDDAALLLL LAKNRRRVVV TLGEEYEVVV LNNGGGGFAA AAANRRRRVN

             ALFYDDSSLL DASSSAGAEE EYQRCCDIIE EGAAREEERR HHEPPPSRRW WRDRRRRRTT
             ALQFEESSLL ENRRREEVEE ELGRSSGLLE EKTGREEERR MMEEEEEQQW WRVLLLLLEE
             ALQFEESSLL ENRRREEVEE ELGRSSGLLE EKTGREEERR MMEEEEEQQW WRVLLLLLEE

             TRRAGGGGLS AVPLGSNRMV VGGGFE--HS ADCTTGPPSA SSAEEAGGDD GGGGGDNNNN
             ENNAGGGGFE SVKLSNYKIL LWWWYSYYSI KPFSSDPPTL SSSRR----- ----------
             ENNAGGGGFE SVKLSNYKIL LWWWYSYYSI KPFSSDPPTL SSSRR----- ----------

             SSVSSSGGSS SSGDDSSNNN SSSSSGGSGS AAADDSVVC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- MMTTTFPPQ- ------WWWW
t5g66770.1   AYYMMMMMDD SSGGNNLMMM MMAAIIIAAQ QQVQQQEEHH HHIIIFGGNL SNNPPPWWWW
t5g66770.2   AYYMMMMMDD SSGGNNLMMM MMAAIIIAAQ QQVKQQEEHH HHIIIFGGNL SNNPPPWWWW

             PAASSGGLLA AAGFFPPPPP PAAV------ ---------A AAPPPDDGGG Y---------
             PSLFFGGLLG GGSAAPPDDP PQQVTGGGGD PPNHHHHHHA AATTTTTGGG FLSSFGGGGG
             PSLFFGGLLG GGSAAPPPPP PQQVTGGGGD PPNHHHHHHA AATTTTTGGG FLSSFGGGGG

             ---------- ---------- ---------- ---------- -DPPA----G ADD-------
             TGGGEESDEE WTTTLLISGG DSSSVAADDG GGGPPDDDWW WDNPDVVIIG PDDPFFSRLV
             TGGGEESDEE WTTTLLISGG DSSSVAADDG GGGPPDDDWW WDNPDVVIIG PDDPFFSRLV

             -----VDAAL LPEEFAFPPP PPAAAAAVV- AAMRRRRREE EEVVGGGR-- ----------
             QPSDNIDTTP PLPPPTWPPP PPSSPPPLLE SSPTTTKKEE PETTDDDEDD DDDFLLEEPP
             QPSDNIDTTP PLPPPTWPPP PPSSPPPLLE SSPTTTKKEE PETTDDDEDD DDDFLLEEPP

             LLVVVLMCCA GGAIIEEAAL ASSLAHHAAL LASSASGGRA AVHFTTASRF PPVAATTTAA
             LLLLLIYCCA --RIISSNNE ASSLLRREEV VEGGPTEERA AFYFTTASRS PPATTSSSSS
             LLLLLIYCCA --RIISSNNE ASSLLRREEV VEGGPTEERA AFYFTTASRS PPATTSSSSS

             EAAFFFFLL- --HHHHHHYE AACLLKAAHF TNAILAAFHG GGCCCHHHVH HVIDDFFSLG
             TDDLLLLIIL SSKTTTTTND AACSSKAAHL TNAILAATEK KKSSSKKKIH HIVDDFFGIG
             TDDLLLLIIL SSKTTTTTND AACSSKAAHL TNAILAATEK KKSSSKKKIH HIVDDFFGIG

             LAAIAAPPGG PPF-LIIITG IIIPSPTTRR -RVGGLDRRV SRGASDDEEP PQAAAASVLQ
             IAALAATTSG PPTQIVVVSG IIIPSLGGSS PITGGLDDDL DIPLPHHLLG GRDDDVFMLQ
             IAALAATTSG PPTQIVVVSG IIIPSLGGSS PITGGLDDDL DIPLPHHLLG GRDDDVFMLQ

             QHRLLPPDQQ APP--IVVLL DDAAAVVVRR KKFFFTEQQA KTTGGGFLDT TALYYYYYAA
             QYKLL--DEE TPPTTVAALL RRKKKLLLNN RRVVVTEYYV RVVGGGFANK KALFFFFYAA
             QYKLL--DEE TPPTTVAALL RRKKKLLLNN RRVVVTEYYV RVVGGGFANK KALFFFFYAA

             AAVFDSSDGA AMAAAYYLLQ QRRIIDVVVC CEAA--RREE RHHHEEPPLS RRRDDDRLTG
             AAVFESSEDE VRVVELLFFG GRRIIGIIIG GEGGIIHREE RMMMEEEEKE QQRVVVLMEG
             AAVFESSEDE VRVVELLFFG GRRIIGIIIG GEGGIIHREE RMMMEEEEKE QQRVVVLMEG

             AVLGNLRQQM MLVGLFSSEG --HHVAADDL TTLRFAAASS AWEEEAGGGG GGDDDNNNNN
             SVLSYVSQQI ILLWNYNNSN YYSSVKKPPI SSLLLLLLSS SWRRR----- ----------
             SVLSYVSQQI ILLWNYNNSN YYSSVKKPPI SSLLLLLLSS SWRRR----- ----------

             NSVSGSSSSG SSDNSGSSSN NGSGAADDGS SSVVCCCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- MTTP------ --------WP DDASSLAAFL
t5g66770.1   MAMDDNNNLL MMIIAQQVVK QQQQQQQQQD HIIGPLLSLL NNPPPPPPWP NNSGFLGGAF
t5g66770.2   MAMDDNNNLL MMIIAQQVVK QQQQQQQQQD HIIGPLLSLL NNPPPPPPWP NNSGFLGGAF

             LPPPA----- ---------- --------PD DDVVGGG--- ---------- ----------
             FPDPFGGGDS SSSDDGFFFP PPPPDHHHTT TTGGGGGRRR LLSSGGGTTT GGGEEESEWE
             FPPPFGGGDS SSSDDGFFFP PPPPDHHHTT TTGGGGGRRR LLSSGGGTTT GGGEEESEWE

             ---------- ---------- ------YPA- ----AD---- ---------- --VVAAALLL
             TLIISGGGGD SSVVAADGGG DDDDDTHNDY YVYYPDPPFT YPPRLVQQPD DLIITSSPPP
             TLIISGGGGD SSVVAADGGG DDDDDTHNDY YVYYPDPPFT YPPRLVQQPD DLIITSSPPP

             LAAAAFFPPC CPAL-AMMEE GGIR------ -----LLLHH HHLLLMMSSS CGAIEAGAAS
             PPTTLWWPPS SPSTESPPDE DDSEDDDFFD DDEEPLLLKK KKIIIYYDDD C-RISDSNNS
             PPTTLWWPPS SPSTESPPDE DDSEDDDFFD DDEEPLLLKK KKIIIYYDDD C-RISDSNNS

             AQQDHHASAA ASRRAAVHFF FFTRLFPPAA PPPPTTTDAH AFF-HFFYEE ECPYYYLKFA
             KTTQRREGPP PTRRAAFYFF FFENLSPPTT SSSSSSSSSE DLLSKLLNDD DCPYYYSKFA
             KTTQRREGPP PTRRAAFYFF FFENLSPPTT SSSSSSSSSE DLLSKLLNDD DCPYYYSKFA

             AHHHFFFTTA AANQQQILLA AAFFHCDDHH VIIIFSMQQQ QLLQIQLLLL LRPGPFF-RR
             AHHHLLLTTA AANQQQILLA AATTESNNKK IVVVFGVQQQ QIIQLQLLLL TRTSKTTQRR
             AHHHLLLTTA AANQQQILLA AATTESNNKK IVVVFGVQQQ QIIQLQLLLL TRTSKTTQRR

             ITIGPTGRRD -LRDGLLAAA DDLRSSRRRV RFFSRRAAAN SSLDEEPWWW WMLQIIIIAA
             VSIPAGESSP SLIAGNNRRR DDFKVVDDDL NFFDIILTT- PPIHLLGSSS SSFRVVVVDD
             VSIPAGESSP SLIAGNNRRR DDFKVVDDDL NFFDIILTT- PPIHLLGSSS SSFRVVVVDD

             AEVFFNNVQL RLLLLGDDD- --IIIIIDAA AADAASSSVP KTQEEEAADD DNTTFDDDRT
             DELVVNNMQL KLLLL--DDT TIVVVVVDTT TTRKKSSSLP RTYEEEVVSS SNVVFNNNRK
             DELVVNNMQL KLLLL--DDT TIVVVVVDTT TTRKKSSSLP RTYEEEVVSS SNVVFNNNRK

             TEFFYSSSVF SLAAAASASG AGGNNEEE-- AAAYYLQRRE ICCIVVCGGA -RERRHEEPP
             KNQQFSSSVF SLPNNNLGRS EEERREEERR EEELLFGRRR ISSLIIGPKG IHERRMEEEE
             KNQQFSSSVF SLPNNNLGRS EEERREEERR EEELLFGRRR ISSLIIGPKG IHERRMEEEE

             RWWRDDRAGS AVVVLGGSLR RQAAMGGLLL FSSSGG--HV VEEDDDDDGL LLWHHGGGPP
             QWWRVVLAGE SVVVLSSNVS SQAAIWWNNN YNNNYYYYSV VEEPPPPPGI LLWNNDDDPP
             QWWRVVLAGE SVVVLSSNVS SQAAIWWNNN YNNNYYYYSV VEEPPPPPGI LLWNNDDDPP

             PLSAAAGDGG DNNNNGGSSD DDSNSSSNGS AADDGGSSL
             PLTLS----- ---------- ---------- ---------
             PLTLS----- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- ------MFFF -----WWPPP
t5g66770.1   MYYYYMMMMM MCSGGMMMAI AAQQQQIKQQ QEEEQQQQQQ QQQQDDHFFF PLNNPWWPPP
t5g66770.2   MYYYYMMMMM MCSGGMMMAI AAQQQQIKQQ QEEEQQQQQQ QQQQDDHFFF PLNNPWWPPP

             DDAAASSSSL DAAGGLPPPP PAVVV----- ---------- --PPPDDG-- ----------
             NNSSSGGFFL SGGSSFPDPP PFVVVGGGGG DNDPFFPFPP HHTTTTTGRR LSSSSSFFFG
             NNSSSGGFFL SGGSSFPPPP PFVVVGGGGG DNDPFFPFPP HHTTTTTGRR LSSSSSFFFG

             ---------- ---------- ---------- --YPPP---A D--------- --------VV
             GGGTGGGFSE MMIISSGGGS SSSSVAPPDC TTHNNNYIIP DPPDDTYRLL LSVQSLLNII
             GGGTGGGFSE MMIISSGGGS SSSSVAPPDC TTHNNNYIIP DPPDDTYRLL LSVQSLLNII

             DAAALEEFAA FFPPPCAAPA AA----AMEE EEGIR----- -----VVVVL MSSCIEGGDH
             DTSSPPPPPL WWPPPSSSPS PPEEEESPED DPDSEDDDFF FLEPPLLLLI YDDCISSSDP
             DTSSPPPPPL WWPPPSSSPS PPEEEESPED DPDSEDDDFF FLEPPLLLLI YDDCISSSDP

             AAAQQLLSHA AALAAAAAGG VAAFALSRRR RFF-PAPTDA AHHHAF--YH HFEACCPPYY
             AKKTTLLIRS SSVDDDPPEE VAAFALSNNR RSSNPTSSSS SEEEDLLSYK TLDACCPPYY
             AKKTTLLIRS SSVDDDPPEE VAAFALSNNR RSSNPTSSSS SEEEDLLSYK TLDACCPPYY

             YYLFFAAANQ AAILLEFFHH HCCCDHHHHV VVVIDFSGQQ WPLILLRPPG GPFF--LLRI
             YYSFLAAANQ AAILLETTEE ESSSNKKKKI IIIVDFGGQQ WPLLTTRTTS GKTTQQIIRV
             YYSFLAAANQ AAILLETTEE ESSSNKKKKI IIIVDFGGQQ WPLLTTRTTS GKTTQQIIRV

             TGGIGSSSPP PTTGRD-LRV VVGRADDDDL LLARSVVRRR RFFFRGVVAA ASLLDDEVRR
             SGGIPSSSLL LGGESPSLIT TTGRRDDDDF FFAKVLLDDD NFFFIPIILT TPIIHHLLNN
             SGGIPSSSLL LGGESPSLIT TTGRRDDDDF FFAKVLLDDD NFFFIPIILT TPIIHHLLNN

             PWWAPPVVFF FSLQLHRLLL LLDAAIAAAV CVSSPKKITV IIIHHNNKTT DTALLFYYYY
             GSSDPPLLVV VFLQLYKLLL LL-TTVTTTA LASSPRRVTL GGGLLNNRVV NKALLQFFYY
             GSSDPPLLVV VFLQLYKLLL LL-TTVTTTA LASSPRRVTL GGGLLNNRVV NKALLQFFYY

             AVFFDDDSLL LDASAASGGA MAE-AYYLQR RREEEECDII VCCGGEEGAA --RREERRHE
             AVFFEEESLL LENLGGRDSE RVERELLFGR RRRRRRSGLL IGGPPEEKTG IIRREERRME
             AVFFEEESLL LENLGGRDSE RVERELLFGR RRRRRRSGLL IGGPPEEKTG IIRREERRME

             PWWRRRDLTA SAAAVGGSNA RARRMMLGGG LFFFSGGHVV VVEADDGCCL LLWHPSAAAA
             EWWRRRVMEA ESSSVSSNYA SAKKIILWWW NYYYNNNSVV VVEKPPGFFI LLWNPTLLLS
             EWWRRRVMEA ESSSVSSNYA SAKKIILWWW NYYYNNNSVV VVEKPPGFFI LLWNPTLLLS

             AWWWAGGNNN NSSNSSSNGS SSSSSSGGSG SSSARGSSL
             SWWW------ ---------- ---------- ---------
             SWWW------ ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- MDTFP----- --WMMMMAAD
t5g66770.1   MMAYTDDSGA AAIAQVVVII KQKQQEEQQQ QQQQQHHHDD HQIFGPPSSL NPW----LLS
t5g66770.2   MMAYTDDSGA AAIAQVVVII KKKQQEEQQQ QQQQQHHHDD HQIFGPPSSL NPW----LLS

             DAGGFPPPAA ---------- ---------- ----AAADDD GVG------- ----------
             SGSSAPPPQQ TTTGGGSSDP GGPFPPNLLD DHHHAAATTT GGGRSFFFGG TTGEEEWMLL
             SGSSAPPPQQ TTTGGGSSDP GGPFPPNLLD DHHHAAATTT GGGRSFFFGG TTGEEEWMLL

             ---------- DPA-GDDD-- ---------- ---------- VDALLPEAFC AAAAPAAALM
             SGGDSADDDD DPDYGDDDPD DTYPPRRRSV VPSDDDNVVV IDSPPLPTWS SSSSPSSPTP
             SGGDSADDDD DPDYGDDDPD DTYPPRRRSV VPSDDDNVVV IDSPPLPTWS SSSSPSSPTP

             RRREEEEEEE EGIRR----- -----LVVVL LMCCCAEAGG DDDDHALAAS AAQQQHAAAV
             TTKEEEDPEE EDSEEDDDDD DEPPPLLLLA IYCCCRSDSS DDDDPNEAAS KKTTTRESSL
             TTKEEEDPEE EDSEEDDDDD DEPPPLLLLA IYCCCRSDSS DDDDPNEAAS KKTTTRESSL

             SSSSASIRVV VFTTTARRRR LLPPPVPPTT HAAFLL---- YHYAAPYYYL LLKFHHFTNN
             GGGGDT-RVF FFTTTANNNR LLPPPASSSS EDDLIILSSS YTNAAPYYYS SSKFHHLTNN
             GGGGDT-RVF FFTTTANNNR LLPPPASSSS EDDLIILSSS YTNAAPYYYS SSKFHHLTNN

             QQAAILLLEE AAAFFFHGDH VVVHHHHIII DDDFFFMGGL LPPAALAARG GGPPPFLLRR
             QQAAILLLEE AAATTTEKNK IIIHHHHVVV DDDFFFVGGI IPPAALAARS SGKKPTIIRR
             QQAAILLLEE AAATTTEKNK IIIHHHHVVV DDDFFFVGGI IPPAALAARS SGKKPTIIRR

             IGGIIGPPPP SPTDE--LRR DDLSSVRRVV RSFFGVAADR RPPWMAAEAV NSVQQRLLLL
             VGGIIPPPPP SLGPEPPLII AAFVVLDDLL NDFFPILTHN NGGSSDDEVL NFMQQKLLLL
             VGGIIPPPPP SLGPEPPLII AAFVVLDDLL NDFFPILTHN NGGSSDDEVL NFMQQKLLLL

             LGPAAAADDQ AP---DDAVL LLLVVSSPKI FFFTTVIIEE EEQQEEAAAA DDHHHNNNTT
             L------DDE TPTIIDDTAL LLLAASSPRV VVVTTLGGEE EEYYEEVVVV SSLLLNNNVV
             L------DDE TPTIIDDTAL LLLAASSPRV VVVTTLGGEE EEYYEEVVVV SSLLLNNNVV

             GFRFEEAAYY YSFDDSSSDA AASGGGGGNA AMMEAAQEEE ICDVVVVCEE AA----RRHP
             GFRVNNAAFY YSFEESSSEP GGRDDSSERV VRREEEGRRR ISGIIIIGEE TTIIIIHRME
             GFRVNNAAFY YSFEESSSEP GGRDDSSERV VRREEEGRRR ISGIIIIGEE TTIIIIHRME

             LSWWRRDRLR GLLAVVPNNN AAQLLVGLLF SG--SVEEAG LLTRPSAAAA AAGDGGGGDD
             KEWWRRVLMN GFFSVVKYYY AAQLLLWNNY NYLLIVSSKG IISLPTSS-- ----------
             KEWWRRVLMN GFFSVVKYYY AAQLLLWNNY NYLLIVSSKG IISLPTSS-- ----------

             NNNNNNNSSS SVDDSNSSSS SNGGKKSSGA DSSVCCCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -----MMMDD TFF----PPD PLLFLLAAVV
t5g66770.1   AYYYYMCTTD SGGGMIIVVI KQKKQQQQQQ QQQQDHHHQQ IFIPLNPPPN TLLAFFQQVV
t5g66770.2   AYYYYMCTTD SGGGMIIVVI KKKKQQQQQQ QQQQDHHHQQ IFIPLNPPPN TLLAFFQQVV

             ---------- ---------- -------PDD DDVY------ ---------- ----------
             TTGGGGDDSN NGFFFPFLLL DDHHHHHTTT TTGFRLSSDD FFGGGTGGEE FEEDDEEELL
             TTGGGGDDSN NGFFFPFLLL DDHHHHHTTT TTGFRLSSDD FFGGGTGGEE FEEDDEEELL

             ---------- ---------- --YPPP---- ---------- ---------- VAAAAALLPF
             LLIGGDSSSV DDDGGDDDDD DDHNPPYYVV YYFDDTYSSR SQQQPPNNRV ITSSSSPPLP
             LLIGGDSSSV DDDGGDDDDD DDHNPPYYVV YYFDDTYSSR SQQQPPNNRV ITSSSSPPLP

             AAAPPDDAAA ALL-AAMMMR RREEEVAG-- ---------L HHHLGIIAGD HHAAAASSQL
             PPLPPLLIIP PTTHSSPPPT KKEEDTNDDD DDFFEPPPPL KKKI-IIDSD PPNNNNSSTL
             PPLPPLLIIP PTTHSSPPPT KKEEDTNDDD DDFFEPPPPL KKKI-IIDSD PPNNNNSSTL

             ASSSHAAAAA SSIVVVAAHF FFTTAALSR- PPPVVAAPPT TTDDAEHHF- YFFYYCPYYL
             LIIIRESSED TT-VVVAAYF FFTEAALSNN PPPAATTSSS SSSSSTEELL YLLNNCPYYS
             LIIIRESSED TT-VVVAAYF FFTEAALSNN PPPAATTSSS SSSSSTEELL YLLNNCPYYS

             LKAAHFNQAA IILLLAFHGH VHIIIDFSSL LQGQWWPPPL LQALLGGGPP PFLLRGPPPP
             SKAAHLNQAA IILLLATEKK IHVVVDFGGI IQGQWWPPPL LQATTSSGKP PTIIRGAAAP
             SKAAHLNQAA IILLLATEKK IHVVVDFGGI IQGQWWPPPL LQATTSSGKP PTIIRGAAAP

             PPTRDERDDV VLLRAAADDL LAAAAVRVRF FFSFRRGVVV AAASSVVWWW LLQIIGGGEV
             PLGSPEIAAT TNNRRRRDDF FAAAALDLNF FFDFIIPIII TTTPPLLSSS FFRVVDDDEL
             PLGSPEIAAT TNNRRRRDDF FAAAALDLNF FFDFIIPIII TTTPPLLSSS FFRVVDDDEL

             VVVQHLGDPP PAADQAP--D DAAAVLLCCC VSSVRRPVII EEQADHKKKR RFTTTALLFY
             MMMQYL---- ---DETPIID DTTTALLLLL ASSLNNPLGG EEYVSLRRRR RVKKKALLQF
             MMMQYL---- ---DETPIID DTTTALLLLL ASSLNNPLGG EEYVSLRRRR RVKKKALLQF

             YAAFDLLLLD DAAASGGAGN AAAMAAA--A YLLLQEEEEI CDVEGAAAAA RREEERRHEP
             YAAFELLLLE EPPPRDDEER VVVRVVVRRE LFFFGRRRRI SGIEKTTGGG HHEEERRMEE
             YAAFELLLLE EPPPRDDEER VVVRVVVRRE LFFFGRRRRI SGIEKTTGGG HHEEERRMEE

             LSRWRDAGGL LSPLLLSSNR QARMLVGGGF SGGGHSSVVE GGCLTGWWWH HGLAWWGDDG
             KEQWRVAGGF FEKLLLNNYS QAKILLWWWY NYYNSIIVVS GGFISAWWWN NDLSWW----
             KEQWRVAGGF FEKLLLNNYS QAKILLWWWY NYYNSIIVVS GGFISAWWWN NDLSWW----

             GGNSSNNNSS NGSSGGDDNN SSSNGSSSSA ADDGGSCCC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --TQ----WP PMMASLDDDA FFAA------
t5g66770.1   MMAMCCGGMA AIAAAQQVQK QQQQQQHHQD DDINLLSSWP P--SFLSSSG AAFFTTGSND
t5g66770.2   MMAMCCGGMA AIAAAQQVKK QQQQQQHHQD DDINLLSSWP P--SFLSSSG AAFFTTGSND

             ---------- -----APDDD DVY------- ---------- ---------- ----------
             PGGGPNNNNN LHHHHATTTT TGFRRRSSDF GGGTGEFFSD DEWEETLLSV DDDGPPPDDD
             PGGGPNNNNN LHHHHATTTT TGFRRRSSDF GGGTGEFFSD DEWEETLLSV DDDGPPPDDD

             -YDPPPA--- --AD------ ---------- --VDDDAAAL PPPEFFAAAA APDAAAV-AR
             THDNNPDVVI IYPDPPDTTY YYSRRVPSDD RVIDDDTTSP LLLPPPPPLL LPLSIPLEST
             THDNNPDVVI IYPDPPDTTY YYSRRVPSDD RVIDDDTTSP LLLPPPPPLL LPLSIPLEST

             RREEEEGIRR ----LHHLMM SCCAGAEHHA SSSAQQLADS SHHHAASSSS SIIGGGRRVH
             TKEEDPDSEE DDLPLKKIYY DCCA-RSPPN SSSKTTLLQI IRRRSSGGTT T--EEERRVY
             TKEEDPDSEE DDLPLKKIYY DCCA-RSPPN SSSKTTLLQI IRRRSSGGTT T--EEERRVY

             FFFTAALSRR RPSSPPATTA E--YHFYYYE EEACPFAFQA ILEAFFHGCC DHVSSSLLQL
             FFFTAALSRR RPSSPPTSSS TLSYKLNNND DDACPFALQA ILEATTEKSS NHIGGGIIQI
             FFFTAALSRR RPSSPPTSSS TLSYKLNNND DDACPFALQA ILEATTEKSS NHIGGGIIQI

             LQQWWPALII ALRRPPPPFF LIPSPPGRRD DEE-LLVGGG LLLLLLLARV RRSSSRRGVA
             IQQWWPALLL ATRRTPPPTT IVPSLLESSP PEEPLLTGGG NNLLFFFAKL DNDDDIIPIL
             IQQWWPALLL ATRRTPPPTT IVPSLLESSP PEEPLLTGGG NNLLFFFAKL DNDDDIIPIL

             ALLDDEERRP PWMMMQAAPP EAAFNLLLQQ HHRLLGGDDD PDQAAA-IID AVLLCCCCCV
             TIIHHLLNNG GSSSSRDDPP EVAVNLLLQQ YYKLL----- -DETTTTVVD TALLLLLLLA
             TIIHHLLNNG GSSSSRDDPP EVAVNLLLQQ YYKLL----- -DETTTTVVD TALLLLLLLA

             SSVVKKIIIF TTVIQEAADD NNTTTFDRFA ALLFFFFYYY SAFFFDSSSL LLAAAAASSG
             SSLLRRVVVV TTLGYEVVSS NNVVVFNRVA ALLQQQQYYY SAFFFESSSL LLPPNNNLRD
             SSLLRRVVVV TTLGYEVVSS NNVVVFNRVA ALLQQQQYYY SAFFFESSSL LLPPNNNLRD

             GGNNMMEE-Y LLQRICCIIG EA-RREERHE EPPPPLLSWW RRDRRRLTRR GLLLPLSNLL
             SERRRREERL FFGRISSLLP ETIRREERME EEEEEKKEWW RRVLLLMENN GFFFKLNYVV
             SERRRREERL FFGRISSLLP ETIRREERME EEEEEKKEWW RRVLLLMENN GFFFKLNYVV

             RRRQMMLVVG LLLFSSGE-- VADCCLLGGW HHHRRRPLLL FSAAGGDDGG GGNNNNNNNS
             SSSQIILLLW NNNYNNYSLL VKPFFIIAAW NNNLLLPLLL LTSS------ ----------
             SSSQIILLLW NNNYNNYSLL VKPFFIIAAW NNNLLLPLLL LTSS------ ----------

             NNVSGGSSSD DSSNSSSNGG KSGGGADDDG SSSSVCCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ----MMDDTT FFFFF---PP MPAAASSSLA
t5g66770.1   MMYYMCTTDG GNLLAIQQQV VIIIKEQQQQ HQQDHHQQII FFFIILNPPP -TSLLGFFLG
t5g66770.2   MMYYMCTTDG GNLLAIQQQV VIIIKEQQQQ HQQDHHQQII FFFIILNPPP -TSLLGFFLG

             FLPPAA---- ---------- ---------- -AAPDGGYYY ---------- ----------
             AFPDFFTTGG GGDDDDDDDP PFNLLDDHHH HAATTGGFFF RLDFGGGGEF FESDDWWETT
             AFPPFFTTGG GGDDDDDDDP PFNLLDDHHH HAATTGGFFF RLDFGGGGEF FESDDWWETT

             ---------- ---------- -----YPPP- ---GD----- ---------- --VVVDAAAE
             LIIGGGGSVV DDDGGPPPPD CCCCTHNNNY VIIGDFDTTY SRRLLSPPDL NRIIIDTTSP
             LIIGGGGSVV DDDGGPPPPD CCCCTHNNNY VIIGDFDTTY SRRLLSPPDL NRIIIDTTSP

             FFAAAFFFPP AAAA-ARRRR REEEEGIIR- ---LVHLLMS SCAAAEAAGD AAALAADHHA
             PPPTLWWWPP SSIPESTTTT KDEEEDSSED DFPLLKAIYD DCAARSDDSD NNALLLQRRS
             PPPTLWWWPP SSIPESTTTT KDEEEDSSED DFPLLKAIYD DCAARSDDSD NNALLLQRRS

             LAAAAVVAGI RVVVAAHFFF FTTLLSRRRL PPSVPPTAEH F--PPYLKKF AHHFFAILLE
             VSSSELLP-- RVVVAAYFFF FTELLSRRRL PPSASSSSTE LLSPPYSKKF AHHLLAILLE
             VSSSELLP-- RVVVAAYFFF FTELLSRRRL PPSASSSSTE LLSPPYSKKF AHHLLAILLE

             AFFHDDFSMM LLQWPAALLI QLLAALPGGG GPIIPPPTGR E-----LLRR VVGRLLVVRR
             ATTKDDFGVV IIQWPAALLL QLLAATTSSG GPVVAALGES EPPPSSLLII TTGRLFLLDD
             ATTKDDFGVV IIQWPAALLL QLLAATTSSG GPVVAALGES EPPPSSLLII TTGRLFLLDD

             RRRSSSFRRR GGAANSSSLD DEVWMQPPGA NVVVLQQHLD AADQQA---- IIIAAVVLDD
             DNNDDDFIII PPTT-PPPIH HLLSSRPPDA NMMMLQQYL- --DEETTTTI VVVTTAALRR
             DNNDDDFIII PPTT-PPPIH HLLSSRPPDA NMMMLQQYL- --DEETTTTI VVVTTAALRR

             CCVVVAASSS VVVRPPFFTV IEEQEAHKKK TGGFDTTAYY YYAVVVDDDD SLAAAAAASG
             LLAAAKKSSS LLLNPPVVTL GEEYEVLRRR VGGFNKKAFY YYAVVVEEEE SLPPPPNNRD
             LLAAAKKSSS LLLNPPVVTL GEEYEVLRRR VGGFNKKAFY YYAVVVEEEE SLPPPPNNRD

             GGGGGAGGGA AMMMAEAYRC CDIIIVGEAA -RRRREEEPP LLLSWRTGGS PPPLGNNAAA
             DDDSSEEEEV VRRRVEELRS SGLLLIPEGG IHRRREEEEE KKKEWREGGE KKKLSYYAAA
             DDDSSEEEEV VRRRVEELRS SGLLLIPEGG IHRRREEEEE KKKEWREGGE KKKLSYYAAA

             LLQMMVVVVG LLSGEG---- HHSSVEADGG CCLTLGHHHG GRPAGGGGGD DNNNNNNNNN
             VVQIILLLLW NNNYSNLLLY SSIIVEKPGG FFISLANNND DLP------- ----------
             VVQIILLLLW NNNYSNLLLY SSIIVEKPGG FFISLANNND DLP------- ----------

             SNSSVVSSGS SSGSSDNNNS SNGSGGAARD DGSSVVCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---MDDTPPF ----WPMPAA SSGFFLPPPP
t5g66770.1   MMCCTGMAII QQQQKKKQKQ QQQQQQQQHH DDDHQQIGGI PLLLWP-TSS GFSAAFPDDD
t5g66770.2   MMCCTGMAII QQQQKKKKKQ QQQQQQQQHH DDDHQQIGGI PLLLWP-TSS GFSAAFPPPP

             PAAVV----- ---------- -PGGYY---- ---------- ---------- ----------
             PFQVVTGGGD DSDPFFFPNN DTGGFFLLSS SSSDDFGGTT GGGGESSSSS SDDEEEMETS
             PFQVVTGGGD DSDPFFFPNN DTGGFFLLSS SSSDDFGGTT GGGGESSSSS SDDEEEMETS

             ---------- --------YY YDDPPPP-AA ---------- ------VDDA AALLPEEEFF
             GGGDSSVADG GGPDDCTWHH HDDNNNPIPP FDDTSVVQQP DDLLVVIDDT SSPPLPPPPP
             GGGDSSVADG GGPDDCTWHH HDDNNNPIPP FDDTSVVQQP DDLLVVIDDT SSPPLPPPPP

             AAAFPPCCPD AAAAAVL-AA RRREEEEVAG --------LV LMAGGIEGDD DDALLASSLA
             TTLWPPSSPL SSIPPLTHSS TKKEDDPTND DDDDDLEPLL AYA--ISSDD DDNEEASSLL
             TTLWPPSSPL SSIPPLTHSS TKKEDDPTND DDDDDLEPLL AYA--ISSDD DDNEEASSLL

             DDHALAAAAA ASGIGGRRRR VVVTTTTAAA SRRRLPAPPP PTTDDAAFFL ---YYHHYCC
             QQREVSDDDD PT--EERRRR FFFTEEEAAA SNRRLPTSSS SSSSSDDLLI LSSYYTTNCC
             QQREVSDDDD PT--EERRRR FFFTEEEAAA SNRRLPTSSS SSSSSDDLLI LSSYYTTNCC

             CCPYYFHHFT TTANNNNNQA IFCDDHVIDD DFQQQQQGGL QWWQAGGLRT GGIIPSPRRD
             CCPYYFHHLT TTANNNNNQA ITSNNKIVDD DFQQQQQGGI QWWQASGIRS GGIIASLSSP
             CCPYYFHHLT TTANNNNNQA ITSNNKIVDD DFQQQQQGGI QWWQASGIRS GGIIASLSSP

             ---RDVGRRL LLADRRSSVF SRGGGGVVAN NNNDEEVVPP WLQAAPEEAF LHRRLLPADD
             SSSIATGRRL LLRDKKVVLF DIPPPPIIT- ---HLLLLGG SFRDDPEEAV LYKKLL--DD
             SSSIATGRRL LLRDKKVVLF DIPPPPIIT- ---HLLLLGG SFRDDPEEAV LYKKLL--DD

             DQQQPPP--D ALLLDDCCVV ASRPPFFFFT IIIIEEQEAH KKGFFRFTEL LFFYYYYSSS
             DEEEPPPTID TLLLRRLLAA KSNPPVVVVT GGGGEEYEVL RRGFFRVKNL LQQFYYYSSS
             DEEEPPPTID TLLLRRLLAA KSNPPVVVVT GGGGEEYEVL RRGFFRVKNL LQQFYYYSSS

             VVDDLDAAAS SGGGGAAMAA E--AYLLLQQ RRREIIICVV VGGGEGGAAR RRHHPPLLWR
             VVEELEPNNL LDDSSVVRVV ERRELFFFGG RRRRIIISII IPPPEKKTGH RRMMEEKKWR
             VVEELEPNNL LDDSSVVRVV ERRELFFFGG RRRRIIISII IPPPEKKTGH RRMMEEKKWR

             DRLLLTTRRR RGLSVPLGSS ALQAAALLLL FFS--VEADD GTLGRPFAAW WWAAADDDGG
             VLMMMEENNN NGFEVKLSNN AVQAAANNNN YYNYYVEKPP GSLDLPLLLW WW--------
             VLMMMEENNN NGFEVKLSNN AVQAAANNNN YYNYYVEKPP GSLDLPLLLW WW--------

             DDDNNNNNSS SSVGSGDNNN GKSSGGRDDS SSVCLLLLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---DTTFPF- ----PPDDPP AAAASLLGGG
t5g66770.1   YTTDGLMAII AAQQQQQQKK QKKKQQQQQQ QHDQIIFGIP NNNPPPNNTT SSLLGLLSSS
t5g66770.2   YTTDGLMAII AAQQQQQQKK KKKKQQQQQQ QHDQIIFGIP NNNPPPNNTT SSLLGLLSSS

             GFFPPPAAAA AA-------- ----APPDDG GVYY------ ---------- ----------
             SAAPPPFFQQ QQGSPGGPPF FFDDATTTTG GGFFRLSDFF GGGGGGGGGG GGESSDMTLL
             SAAPPPFFQQ QQGSPGGPPF FFDDATTTTG GGFFRLSDFF GGGGGGGGGG GGESSDMTLL

             ---------- ---------- -YDDPP--GG GAAA------ ---------- ---------V
             LISGGGVVVA AAPDCCDTTT THDDNPYYGG GPPPPPPFTT TPPSRLVQQQ QPSDDDLLNI
             LISGGGVVVA AAPDCCDTTT THDDNPYYGG GPPPPPPFTT TPPSRLVQQQ QPSDDDLLNI

             APPAFFPAAD AAAA--AAMM MREAAA---- ---------- VVVLLMSCII IEAAADDHLL
             SLLLWWPSSL SSPPHHSSPP PKPNNNDDDD DDFDDDEEPP LLLAIYDCII ISDDDDDPEE
             SLLLWWPSSL SSPPHHSSPP PKPNNNDDDD DDFDDDEEPP LLLAIYDCII ISDDDDDPEE

             AQQLDSHAAA LLAVSAASSI IRRVAVFTLS RRRLLFP--- PVVVAAPTDE HHLL--YHFF
             ATTLQIREES VVELGDDTT- -RRVAFFTLS NRRLLSPNNN PAAATTSSST EEIILLYTLL
             ATTLQIREES VVELGDDTT- -RRVAFFTLS NRRLLSPNNN PAAATTSSST EEIILLYTLL

             YEACCLKFFF HHFAAANQLE AAAAAFHHHG DDVFSQLLLW PPAIQQAAGG GGGGGGF-LR
             NDACCSKFFF HHLAAANQLE AAAAATEEEK NNIFGQIIIW PPALQQAASS SSGGGGTQIR
             NDACCSKFFF HHLAAANQLE AAAAATEEEK NNIFGQIIIW PPALQQAASS SSGGGGTQIR

             RRIGIIPPTT TGRRRDDDEL RRRDDGLRRA ADLLRVRRSF RRGGVAAAAN NNDDEERLLL
             RRVGIIALGG GESSSPPPEL IIIAAGNRRR RDFFKLDNDF IIPPILLLL- --HHLLNFFF
             RRVGIIALGG GESSSPPPEL IIIAAGNRRR RDFFKLDNDF IIPPILLLL- --HHLLNFFF

             LQAAAVVAAF FNSSVRGDAD A---DVDCCV VAASSVVPKI ITTEHHNFRF TEALLFYSVF
             FRDDVLLAAV VNFFMK---D TTTIDARLLA AKKSSLLPRV VTTELLNFRV KNALLQFSVF
             FRDDVLLAAV VNFFMK---D TTTIDARLLA AKKSSLLPRV VTTELLNFRV KNALLQFSVF

             DSLLLDAASA ASGGNNAAEE AYRECDVCGA AAAA---REH HEEESSSRRR RTAGGLLAPP
             ESLLLEPPLG GRDSRRVVEE ELRRSGIGKT TTTGIIIREM MEEEEEEQQR LEAGGFFSKK
             ESLLLEPPLG GRDSRRVVEE ELRRSGIGKT TTTGIIIREM MEEEEEEQQR LEAGGFFSKK

             LGSAALLRML LVGGLLFFFS GEEGG-HHSS VVVEEEDDGC LTGGHGPLLL FFSAASWWEA
             LSNAAVVKIL LLWWNNYYYN YSSNNLSSII VVVEEEPPGF ISAANDPLLL LLTLLSWWR-
             LSNAAVVKIL LLWWNNYYYN YSSNNLSSII VVVEEEPPGF ISAANDPLLL LLTLLSWWR-

             ADGGGGDNNN NNNSSSNSSG NSSSSSSGGG AARDGSSSL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- TQ-------- -WWWMMDASS SSLFFLPPPV
t5g66770.1   YYTTDDSSLM IIIQQQKKQE QQQQHHHHQQ INPPLLLNNP PWWW--NSGG FFLAAFPPPV
t5g66770.2   YYTTDDSSLM IIIQQKKKQE QQQQHHHHQQ INPPLLLNNP PWWW--NSGG FFLAAFPPPV

             V--------- ---------- AAPPDDGGVY ---------- ---------- ----------
             VGGGGGSNDD PPFPPPNLHH AATTTTGGGF LSDFFGGTGE EESSSDDEME TTTLISSGSD
             VGGGGGSNDD PPFPPPNLHH AATTTTGGGF LSDFFGGTGE EESSSDDEME TTTLISSGSD

             ---------D PP-----GGA D--------- ---------- ------VVVD AALPFFAAAP
             DDDGGDDDDD PPVVIYYGGP DPDDDTYYPP LLSSQQPPPP DDLNNRIIID SSPLPPTTLP
             DDDGGDDDDD PPVVIYYGGP DPDDDTYYPP LLSSQQPPPP DDLNNRIIID SSPLPPTTLP

             PCADDAAAAA VVVLL---AM MMRREEVVAG ---------V LLSCAAEAAA DDALAASAAA
             PSSLLSSIIP LLLTTHHESP PPTTEETTND DDDDFFFFPL IIDCRRSDDD DDNEAASKKL
             PSSLLSSIIP LLLTTHHESP PPTTEETTND DDDDFFFFPL IIDCRRSDDD DDNEAASKKL

             HAAALAAAVV SGGGRVVAVH FFFASSFFPP -AAPPTEEHH HHLL---HHH ACCPPLLKKF
             REESVSEELL T-EERVVAFY FFFASSSSPP NTTSSSTTEE EEIILLLKKK ACCPPSSKKF
             REESVSEELL T-EERVVAFY FFFASSSSPP NTTSSSTTEE EEIILLLKKK ACCPPSSKKF

             AAAAHHHFTA NNAIILLEEA HGCHHHVHHH VDFFMQQPAA LIIQQQAAAA LRRGPPPPPF
             AAAAHHHLTA NNAIILLEEA EKSKKKIHHH IDFFVQQPAA LLLQQQAAAA TRRGKKPPPT
             AAAAHHHLTA NNAIILLEEA EKSKKKIHHH IDFFVQQPAA LLLQQQAAAA TRRGKKPPPT

             FF-ITTIGPG RRDDDE---R RRDVVVLLDD DRRRSSSVVV FRGVAAANSD EVPMMMQQQI
             TTQVSSIPPE SSPPPEPPSI IIATTTNLDD DKKKVVVLLL FIPILLL-PH LLGSSSRRRV
             TTQVSSIPPE SSPPPEPPSI IIATTTNLDD DKKKVVVLLL FIPILLL-PH LLGSSSRRRV

             APPGEVANNN SVVQLHRLLL GDDPPAA--- AAVRPPPPTT VVEQEHHFLD DDRTALFFYY
             DPPDELANNN FMMQLYKLLL ------TIII TKLNPPPPTT LLEYELLFAN NNRKALQQFY
             DPPDELANNN FMMQLYKLLL ------TIII TKLNPPPPTT LLEYELLFAN NNRKALQQFY

             SAVDSLLDAA ASSGGGAMEA AQRREEICDD VVCCCCCGGG GAA-RRRRRR RRRHEPPLSR
             SAVESLLEPN GRRDEEVREE EGRRRRISGG IIGGGGGPPP KTTIHHHRRR RRRMEEEKEL
             SAVESLLEPN GRRDEEVREE EGRRRRISGG IIGGGGGPPP KTTIHHHRRR RRRMEEEKEL

             LLTTRRGLPP PPLNNAALRR QRMLLVVLFE EG---HSVEE DDLRRLFFSA SWEEAGDGGG
             MMEENNGFKK KKLYYAAVSS QKILLLLNYS SNLLYSIVES PPLLLLLLTL SWRR------
             MMEENNGFKK KKLYYAAVSS QKILLLLNYS SNLLYSIVES PPLLLLLLTL SWRR------

             GNNNNNSNSS GSDDNNSSSS NNNSSGAARR SSSSCCLLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ------MMMM DDTTTFQ--- -----PMMDD
t5g66770.1   MYMMDDMMMA AAAQVIKKKQ QQQQQQEEEQ QQQHDDHHHH QQIIIFNPPL LSSLPP--NN
t5g66770.2   MYMMDDMMMA AAAQVIKKKK QQQQQQEEEQ QQQHDDHHHH QQIIIFNPPL LSSLPP--NN

             AASSGGLFLL PPPAAVV--- ---------- ------DDGV GYY------- ----------
             SSGGGGLAFF PPDFQVVTTG GDNNDDDPGG FPNLDHTTGG GFFLSGGGGG GGTGEEEEWW
             SSGGGGLAFF PPPFQVVTTG GDNNDDDPGG FPNLDHTTGG GFFLSGGGGG GGTGEEEEWW

             ---------- ----DPP--A ADD------- ---------- ----VDAAPP EFAAFFPPPC
             EISSGGGSAP CCDDDNPYIP PDDPPFDPSS LSVVQPSSSD DLRRIDSSLL PPPTWWPPPS
             EISSGGGSAP CCDDDNPYIP PDDPPFDPSS LSVVQPSSSD DLRRIDSSLL PPPTWWPPPS

             PPDAAAL--- RRRREEVVGG IIR------H LMSGGAIEEG AAASSSLLDS HHHAAAALAA
             PPLSSPTHEE TTKKPETTDD SSEDDDDFPK IYD--RISSS NNASSSLLQI RRRESSSVSS
             PPLSSPTHEE TTKKPETTDD SSEDDDDFPK IYD--RISSS NNASSSLLQI RRRESSSVSS

             VSAAAASSGG VAVHTTAASS FP-SVVAPPT TTDAEHFLLL -YHFFFYEEP YLFHTAQALE
             LGDPPPTTEE VAFYTTAASS SPNSAATSSS SSSSTELIII LYKLLLNDDP YSFHTAQALE
             LGDPPPTTEE VAFYTTAASS SPNSAATSSS SSSSTELIII LYKLLLNDDP YSFHTAQALE

             AFGCDHDDDF SMGLQQWWAA LQAALLLLAL LLRRRPPPFF FLTTGGPPPT TTGGRRE---
             ATKSNHDDDF GVGIQQWWAA LQAALLLLAT TTRRRKKKTT TISSGGPLLG GGEESSESSS
             ATKSNHDDDF GVGIQQWWAA LQAALLLLAT TTRRRKKKTT TISSGGPLLG GGEESSESSS

             LDDDVVGRLL ADARVRSFVV AASLEEVVVR PWLIGEEAFN NNSSVLLQLL RRLLGDPAAA
             LAAATTGRLL RDAKLDDFII LTPILLLLLN GSFVDEEAVN NNFFMLLQLL KKLL-----T
             LAAATTGRLL RDAKLDDFII LTPILLLLLN GSFVDEEAVN NNFFMLLQLL KKLL-----T

             P--IIIALLL LLDVVVASVV VVRRPPPKFT VIEQEHKTTF RRTELSSSSS AFSSSDDAGG
             PIIVVVTLLL LLRAAAKSLL LLNNPPPRVT LGEYELRVVF RRKNLSSSSS AFSSSEEPDS
             PIIVVVTLLL LLRAAAKSLL LLNNPPPRVT LGEYELRVVF RRKNLSSSSS AFSSSEEPDS

             GAAGMAA-YY LLREEDVVVV GEGAAAAAAR EEEERHHEPL SSRRWWRRRD RRRRLTTRRG
             SEEERVVRLL FFRRRGIIII PEKTTTGGGR EEEERMMEEK EEQQWWRRRV LLLLMEENNG
             SEEERVVRLL FFRRRGIIII PEKTTTGGGR EEEERMMEEK EEQQWWRRRV LLLLMEENNG

             GLSSVVPPLL LSSSNNNNNA ARRRAMLLVV GLFFGGGGG- -HHSEGGGCT TGGSSEEEGG
             GFEEVVKKLL LNNNYYYYYA ASSSAILLLL WNYYYYNNNY YSSISGGGFS SDDSSRRR--
             GFEEVVKKLL LNNNYYYYYA ASSSAILLLL WNYYYYNNNY YSSISGGGFS SDDSSRRR--

             DGGDNNNSGS DSSSNGGSGK KSSAAAAARD DGGSSSCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- MDDTFPQQQ- ----WWPMAA AAAASLLLAG
t5g66770.1   MAYMCCCDSS GNMMIIKQQE QQQQQQHHQQ HQQIFGNNNS SLPPWWP-SL LLLLFLLLGS
t5g66770.2   MAYMCCCDSS GNMMIIKQQE QQQQQQHHQQ HQQIFGNNNS SLPPWWP-SL LLLLFLLLGS

             GFPPPAA--- ---------- ------APGV G--------- ---------- ----------
             SADPPQQTGG GGGDDNGPPP LDHHHHATGG GRRRLLDGGG GGGGGESEEW TIISSGGGDG
             SAPPPQQTGG GGGDDNGPPP LDHHHHATGG GRRRLLDGGG GGGGGESEEW TIISSGGGDG

             -------YDA ------GD-- ---------- -----VVAAA EAAAFPPPCA PPAAAAVVL-
             PPPCTWWHDD YYYYVYGDFF DTYSSRRLSS VVPDVIISSS PPPLWPPPSS PPPPPPLLTH
             PPPCTWWHDD YYYYVYGDFF DTYSSRRLSS VVPDVIISSS PPPLWPPPSS PPPPPPLLTH

             MMREEVGG-- ---------V HLMSCCCCCG GAAAIIIDHH AALAQQLADD SSHHHAAAAL
             PPKEDTDDDD DDLEPPPPPL KIYDCCCCC- -RRRIIIDPP NNEATTLLQQ IIRRREESSV
             PPKEDTDDDD DDLEPPPPPL KIYDCCCCC- -RRRIIIDPP NNEATTLLQQ IIRRREESSV

             LAAAVSSAAA SGGIVVAATL SRRRFFPPVP PPPTEHAAF- --YYHHFFYY AAAACCCYHH
             VSSSLGGDDD T---VVAAEL SNNRSSPPAS SSSSTEDDLL LLYYKTLLNN AAAACCCYHH
             VSSSLGGDDD T---VVAAEL SNNRSSPPAS SSSSTEDDLL LLYYKTLLNN AAAACCCYHH

             AQQILLEAAF FFGCDHHVVI FFFFSMQGGQ QQWWPIIQQQ LRPPGGGPP- LIGGPPPSSS
             AQQILLEAAT TTKSNHHIIV FFFFGVQGGQ QQWWPLLQQQ LRTTGGGKPQ IIPPAAPSSS
             AQQILLEAAT TTKSNHHIIV FFFFGVQGGQ QQWWPLLQQQ LRTTGGGKPQ IIPPAAPSSS

             PRE--LLDDV VGLAADDLLL SSVVRRSFAA NSSLVRPPWW MMLLLIGGEA VAFFFNVVLL
             LSEPSLLAAT TGLRRDDFFF VVLLNNDFTT -PPILNGGSS SSFFFVDDEV LAVVVNMMLL
             LSEPSLLAAT TGLRRDDFFF VVLLNNDFTT -PPILNGGSS SSFFFVDDEV LAVVVNMMLL

             HHRLGGDPPA ADQQQAAP-- --DADCCVAV RPKIFFTTTE EEQEDKKGFL DFFFEAYYYS
             YYKL------ -DEEETTPTI IIDTRLLAKL NPRVVVTTTE EEYESRRGFA NVVVNAFFYS
             YYKL------ -DEEETTPTI IIDTRLLAKL NPRVVVTTTE EEYESRRGFA NVVVNAFFYS

             AAAVFDDSDA ASAAGGGGAN AMALRRCVVG --RRRRHPPP SRRWWRLLTR GGLSPSSLLR
             AAAVFEESEP PLGGDDSSER VREFRRSIIK IIHRRRMEEE EQQWWRMMEN GGFEKNNVVS
             AAAVFEESEP PLGGDDSSER VREFRRSIIK IIHRRRMEEE EQQWWRMMEN GGFEKNNVVS

             QARMLLVVFF SSGGG---HE GCCCLLLGWW HHGGPPFAAA ASAEEEAAAA GGGGDDDGDN
             QAKILLLLYY NNYNNLLLSE GFFFILLAWW NNDDPPLLLL LSSRRR---- ----------
             QAKILLLLYY NNYNNLLLSE GFFFILLAWW NNDDPPLLLL LSSRRR---- ----------

             NNNNNNNNNN NVVGSSSSDD SNNSGGGNGG GAAARSSVL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --TTQ--WDP PPAAAASSSL LDAALPPPPA
t5g66770.1   MAYMCCSGGG MAAAAQQQQV QKQEQQQQQD DDIINLLWNT TTSSSLGGGL LSGGFDDDPF
t5g66770.2   MAYMCCSGGG MAAAAQQQQV KKQEQQQQQD DDIINLLWNT TTSSSLGGGL LSGGFPPPPF

             A--------- ---------- AAAAPPPDGV ---------- ---------- ----------
             FGGGGGDDSD DPPGFPFNHH AAAATTTTGG RSDFFGGGGT TGGGEFESSD EEMMMTLISS
             FGGGGGDDSD DPPGFPFNHH AAAATTTTGG RSDFFGGGGT TGGGEFESSD EEMMMTLISS

             ---------- ------YYDP A-----GDD- ---------- ---------V DAEAAAAAFF
             DDADDGPDCC CWWWWWHHDP DVVIIYGDDP PFTTYPSSRR RRLPSLLRVI DSPTTTTLWW
             DDADDGPDCC CWWWWWHHDP DVVIIYGDDP PFTTYPSSRR RRLPSLLRVI DSPTTTTLWW

             CCPAAAAAVL ---MREEEEG GIR------- LLLHLLLLLA GGGAGLLLAA SAALADDHAA
             SSPSIPPPLT HEEPTDDPED DSEDDDDDDD LLLKAAAIIA ---DSEEEAA SKKLLQQRES
             SSPSIPPPLT HEEPTDDPED DSEDDDDDDD LLLKAAAIIA ---DSEEEAA SKKLLQQRES

             AAAVSSGGGI GRRVAAVHHT TALLSSRRRL LF-SSSPPPV VAPPPPTTTT DDAAHFF--H
             SSSLGT---- ERRVAAFYYT TALLSSNRRL LSNSSSPPPA ATSSSSSSSS SSSSELLLST
             SSSLGT---- ERRVAAFYYT TALLSSNRRL LSNSSSPPPA ATSSSSSSSS SSSSELLLST

             EEAACCCYYK KAAHHHFFTT AQQLEEAAFG CDDVHVVIDF LMQGGLLQQQ WWQQLLLLRR
             DDAACCCYYK KAAHHHLLTT AQQLEEAATK SNNIHIIVDF IVQGGIIQQQ WWQQLLTTRR
             DDAACCCYYK KAAHHHLLTT AQQLEEAATK SNNIHIIVDF IVQGGIIQQQ WWQQLLTTRR

             PPPPPP--RG GIGTGE--RR DVVGLLLLAA RVVVVVRFFF FGGGAANLML QQQAAPGEAF
             TTKKKPQQRG GIPGEEPSII ATTGLLLLRR KLLLLLNFFF FPPPLL-ISF RRRDDPDEVV
             TTKKKPQQRG GIPGEEPSII ATTGLLLLRR KLLLLLNFFF FPPPLL-ISF RRRDDPDEVV

             FFVLLLLLRL LGAAADDDPI DDACCVAKKI IFFTTIIQEA DHKFFLLLDD RRTTAAAAYY
             VVMLLLLLKL L----DDDPV DDTLLAKRRV VVVTTGGYEV SLRFFAAANN RRKKAAAAYY
             VVMLLLLLKL L----DDDPV DDTLLAKRRV VVVTTGGYEV SLRFFAAANN RRKKAAAAYY

             VVFDSLLAAS ASSGGAGGNA AAAMEAAYYL IICCCDIVCC GGEGAARRHE SRRRWWWDDD
             VVFESLLPPL GRRDSEEERV VVVREEELLF IISSSGLIGG PPEKTTHRME EQQQWWWVVV
             VVFESLLPPL GRRDSEEERV VVVREEELLF IISSSGLIGG PPEKTTHRME EQQQWWWVVV

             RALLSAAVVP LSNNNAALRQ QAAMMLVVVG LGGEE-HVEE EDDDLTGGGW PLFFWEAGGG
             NAFFESSVVK LNYYYAAVSQ QAAIILLLLW NYYSSYSVEE SPPPISAAAW PLLLWR----
             NAFFESSVVK LNYYYAAVSQ QAAIILLLLW NYYSSYSVEE SPPPISAAAW PLLLWR----

             GGGGGGGNNN SSNVVSSSSD NSSSSNNNNS RRRDGVVCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- MFFPPPQ--- ---WPPMMMP AASSGDAAAL
t5g66770.1   AATTGGMIQQ QQQVIKQQQE QQQHHHQDDD HFFGGGNPLL SLLWPP---T SLFFGSGGGF
t5g66770.2   AATTGGMIQQ QQQVIKKQQE QQQHHHQDDD HFFGGGNPLL SLLWPP---T SLFFGSGGGF

             PPAAAVVV-- ---------- ------PPPG GGGGY----- ---------- ----------
             PDFQQVVVTG GGGGDPGGFP PLLLHHTTTG GGGGFRDDDG GGGFESEEEM TLISSSVDDP
             PPFQQVVVTG GGGGDPGGFP PLLLHHTTTG GGGGFRDDDG GGGFESEEEM TLISSSVDDP

             -----YYDDD DPPPPAA--G ADD------- ---------- ----AALLPF FAAFPPAAAA
             DCCCDHHDDD DNNPPDDVYG PDDPFFYYSS LVQQDDDDDL NVVVSSPPLP PLLWPPSSII
             DCCCDHHDDD DNNPPDDVYG PDDPFFYYSS LVQQDDDDDL NVVVSSPPLP PLLWPPSSII

             VL--AAMREE EEEAIR---- ----LLVMMS CCCCCAEGGG DHHAALLASS AQQQQLADDS
             LTHHSSPKED PEENSEDDDF DDLPLLLYYD CCCCCRSSSS DPPNNEEASS KTTTTLLQQI
             LTHHSSPKED PEENSEDDDF DDLPLLLYYD CCCCCRSSSS DPPNNEEASS KTTTTLLQQI

             SALAAAVAAR VVAAAAVFTT SRRRLFFP-- VAAPPTTTAA L---YYHHHH YYAACPPPPY
             ISVSSELDDR VVAAAAFFEE SNNRLSSPNN ATTSSSSSSS ILLSYYKTTT NNAACPPPPY
             ISVSSELDDR VVAAAAFFEE SNNRLSSPNN ATTSSSSSSS ILLSYYKTTT NNAACPPPPY

             YLKKFFFNQA LLFFGCCCHV VHVIDMMQQL QWPLIQALAL RRPPPPF--L RIITGGIGPP
             YSKKFFLNQA LLTTKSSSKI IHIVDVVQQI QWPLLQALAT RRTKKPTQQI RVVSGGIPAA
             YSKKFFLNQA LLTTKSSSKI IHIVDVVQQI QWPLLQALAT RRTKKPTQQI RVVSGGIPAA

             SSRRD--RRG RLLLLADDAR RSSVVRFFFS SFRRVANSSD VRPPPPWMLQ IIVVAAFFSS
             SSSSPPSIIG RLLLLRDDAK KVVLLNFFFD DFIIIT-PPH LNGGGGSSFR VVLLAAVVFF
             SSSSPPSIIG RLLLLRDDAK KVVLLNFFFD DFIIIT-PPH LNGGGGSSFR VVLLAAVVFF

             LHRLLLGADD QP---IDAVV DDCCSRPKII TVVIIEHHHN KTTTGFLLDR FTTELYYSSA
             LYKLLL--DD EPTTIVDTAA RRLLSNPRVV TLLGGELLLN RVVVGFAANR VKKNLFYSSA
             LYKLLL--DD EPTTIVDTAA RRLLSNPRVV TLLGGELLLN RVVVGFAANR VKKNLFYSSA

             VLDAASSSAA GGAAAAAGNA AMAAEAAYLQ CDCEGAAARR RRRRRRRHPL SRRDDRTTTR
             VLENNLLLGG DDEEEEEERV VRVVEEELFG SGGEKGGGHH RRRRRRRMEK EQRVVLEEEN
             VLENNLLLGG DDEEEEEERV VRVVEEELFG SGGEKGGGHH RRRRRRRMEK EQRVVLEEEN

             AALLSPGSSN NQAMLLLLFF GG----HSEE ADDCCLLTTG WHGRRPPLSS SSASAWWWEA
             AAFFEKSNNY YQAINNNNYY YNLLLYSIES KPPFFIISSA WNDLLPPLTT TTLSSWWWR-
             AAFFEKSNNY YQAINNNNYY YNLLLYSIES KPPFFIISSA WNDLLPPLTT TTLSSWWWR-

             ADGGNNNNNS SSSSVSSSSD SNNSSSSNKS SGRDSCCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ----MDDDPQ ---------W WPMMMDDAAS
t5g66770.1   AAYDDDNAAI AQQVIIKQKK QQQQQQQQQQ HQQDHQQQGN LLLLNNNPPW WP---NNLLG
t5g66770.2   AAYDDDNAAI AQQVIIKKKK QQQQQQQQQQ HQQDHQQQGN LLLLNNNPPW WP---NNLLG

             GAGFFLLPPP PAVVV----- ---------- --AAPDGGGG YY-------- ----------
             GGSAAFFPPD PQVVVTGGDD DGFFPPNNHH HHAATTGGGG FFRLLSDGGG GGGESSSDEE
             GGSAAFFPPP PQVVVTGGDD DGFFPPNNHH HHAATTGGGG FFRLLSDGGG GGGESSSDEE

             ---------- ---------- -DDDP----- GGD------- ---------- DAAAEEFAAA
             WTTIGDSVVA GPDDDDCCCT WDDDNYYYYI GGDPPFTYYY SRLPDLNNNR DTSSPPPPTT
             WTTIGDSVVA GPDDDDCCCT WDDDNYYYYI GGDPPFTYYY SRLPDLNNNR DTSSPPPPTT

             AFPAAAAAAA AVLL----AA REEGGGIIRR ---------- HHMSSCCAGG AAIEEAGDDH
             LWPSSIIIPP PLTTHEEESS TDPDDDSSEE DDDFDLEPPP KKYDDCCA-- RRISSDSDDP
             LWPSSIIIPP PLTTHEEESS TDPDDDSSEE DDDFDLEPPP KKYDDCCA-- RRISSDSDDP

             AASADDSSHH AAAAAASSAS IGGVVAVVHH TTAALRLFFF ---SPVAAAA PTDAAFFFLH
             NNSLQQIIRR SSSSSSGGDT -EEVVAFFYY TEAALRLSSS NNNSPATTTT SSSSDLLLIK
             NNSLQQIIRR SSSSSSGGDT -EEVVAFFYY TEAALRLSSS NNNSPATTTT SSSSDLLLIK

             HHFEAYYYLL KFFHFTTAQI LLLEEEAFFH GGGHHIDDDS SLMGGGLLIA LAALLRPPGG
             KTLDAYYYSS KFFHLTTAQI LLLEEEATTE KKKKKVDDDG GIVGGGILLA LAATTRTTSS
             KTLDAYYYSS KFFHLTTAQI LLLEEEATTE KKKKKVDDDG GIVGGGILLA LAATTRTTSS

             GGPP-RTGII PPPPTTTGEE --RVVVLLRR AAAVVRRRRF SFFRVAAANN SDDEEVVVRP
             GGKKQRSGII APLLGGGEEE PSITTTNNRR RRALLDDNNF DFFIILTT-- PHHLLLLLNG
             GGKKQRSGII APLLGGGEEE PSITTTNNRR RRALLDDNNF DFFIILTT-- PHHLLLLLNG

             PWWMIPAVVF SQQQLHHLLL GPPQP---II DDASRPPKII ITIIEEQAAD DNNTTTDDRR
             GSSSVPVLLV FQQQLYYLLL ---EPTIIVV DDKSNPPRVV VTGGEEYVVS SNNVVVNNRR
             GSSSVPVLLV FQQQLYYLLL ---EPTIIVV DDKSNPPRVV VTGGEEYVVS SNNVVVNNRR

             RTAYSSAVSA AAASSSGGGG GAMME-AAYY QREIIDICEG GRERHPLWRR RDDRLRRGLL
             RKAFSSAVSP NNNLLRDSEE EVRREREELL GRRIIGLGEK KRERMEKWRR RVVLMNNGFF
             RKAFSSAVSP NNNLLRDSEE EVRREREELL GRRIIGLGEK KRERMEKWRR RVVLMNNGFF

             LSSAVVVVPS NNLLLRRARM VVGGLFFSS- HHHSVAAGCL LLLGGPFFAS AEAAAAGDGG
             FEESVVVVKN YYVVVSSAKI LLWWNYYNNL SSSIVKKGFI ILLDDPLLLS SR--------
             FEESVVVVKN YYVVVSSAKI LLWWNYYNNL SSSIVKKGFI ILLDDPLLLS SR--------

             GNNNNSSNNN GSSDSSGSSN GGKSSGGGAR DDSSSVVVV
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- MTFFFFF--- --WPMAASLD
t5g66770.1   YYYCCCTDSG GMMAAAQQVI IKKKKQEEQQ QQQQQQHDDD HIFFFIILSN PPWP-SLGLS
t5g66770.2   YYYCCCTDSG GMMAAAQQVI IKKKKQEEQQ QQQQQQHDDD HIFFFIILSN PPWP-SLGLS

             AFLLLPPPAA ---------- ---------- --APPDDGGV G--------- ----------
             GAFFFPDPFQ GGGDSNPFPF NNNNDHHHHH HHATTTTGGG GSDDFFGGGG FFEDDDWMII
             GAFFFPPPFQ GGGDSNPFPF NNNNDHHHHH HHATTTTGGG GSDDFFGGGG FFEDDDWMII

             ---------- ---YYY---- GGA------- -------VVA ALLPPEAAFP CAAAPDAAAA
             IGGSVGGGCC TWWHHHYVII GGPFDSSVQQ PSDLLNRIIS SPPLLPPPWP SSSSPLIIPP
             IGGSVGGGCC TWWHHHYVII GGPFDSSVQQ PSDLLNRIIS SPPLLPPPWP SSSSPLIIPP

             LLLL-MMMRR EEGIIR---- ---------- ----HHHHLL LMSSAAAGAE EAAALLASSA
             TTTTHPPPTT DPDSSEDDDD FFDDDLLLEE EPPPKKKKAA AYDDAAA-RS SDNNEEASSK
             TTTTHPPPTT DPDSSEDDDD FFDDDLLLEE EPPPKKKKAA AYDDAAA-RS SDNNEEASSK

             AQAHHAASAA ASIGVAAVFF TTAALSSRLS PVVPPPTTTA AL--HHFFYE EEAPYLLKKF
             KTLRRSEGPP PT-EVAAFFF TTAALSSNLS PAASSSSSSD DILSKKLLND DDAPYSSKKF
             KTLRRSEGPP PT-EVAAFFF TTAALSSNLS PAASSSSSSD DILSKKLLND DDAPYSSKKF

             HHHHFTAQQL HGGCDHVVHV IDLQGQQQQQ PIIQLARRRR PPRRIITTGP PSSPGDDE--
             HHHHLTAQQL EKKSNKIIHI VDIQGQQQQQ PLLQLARRRR TPRRVVSSGA PSSLEPPEPP
             HHHHLTAQQL EKKSNKIIHI VDIQGQQQQQ PLLQLARRRR TPRRVVSSGA PSSLEPPEPP

             LLRDDLLLAA DDDLLLASSV VRVVRSSSFF FRGGVAANLE ERLLLQQAPP GAVAFFSLHL
             LLIAANNLRR DDDFFFAVVL LDLLNDDDFF FIPPILL-IL LNFFFRRDPP DVLAVVFLYL
             LLIAANNLRR DDDFFFAVVL LDLLNDDDFF FIPPILL-IL LNFFFRRDPP DVLAVVFLYL

             GGPDDDQPAA LLDCCAVRPP KKFFFTIEEQ EAADHNNNGF LLDRFFEAFS SDSDASSSAA
             ---DDDEPTT LLRLLKLNPP RRVVVTGEEY EVVSLNNNGF AANRVVNAQS SESENLLLGG
             ---DDDEPTT LLRLLKLNPP RRVVVTGEEY EVVSLNNNGF AANRVVNAQS SESENLLLGG

             SSGGAMMEEE E--AAAQQRR RRREEIDIVC GEAA--RREH HLLSRRRRDR RRLLTTGLLS
             RRSEVRREEE ERREEEGGRR RRRRRIGLIG PETTIIHHEM MKKEQQQRVL LLMMEEGFFE
             RRSEVRREEE ERREEEGGRR RRRRRIGLIG PETTIIHHEM MKKEQQQRVL LLMMEEGFFE

             SSVPPLGGSS NQQRVGGFSS GEEESSVVVE EEADGGWHRR PPLLLFAASA WWWWEEADGG
             EEVKKLSSNN YQQKLWWYNN YSSSIIVVVE EEKPAAWNLL PPLLLLLLSS WWWWRR----
             EEVKKLSSNN YQQKLWWYNN YSSSIIVVVE EEKPAAWNLL PPLLLLLLSS WWWWRR----

             GGDDNNNNSS SNNVGSSGGD SSSGGSSNKS SGRRDSVLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------D FFFFFQ---- WPDDDPAASS
t5g66770.1   MAMCCTTTTD SGGNNNLLMA IIAAQVIKQQ QQQQQQHHHQ FIIIINPPLN WPNNNTLLGG
t5g66770.2   MAMCCTTTTD SGGNNNLLMA IIAAQVIKQQ QQQQQQHHHQ FIIIINPPLN WPNNNTLLGG

             DDDDDAGFPP PPPAVVVV-- ---------- ------PDGY ---------- ----------
             SSSSSGSAPP DDPFVVVVTT GGNDDDPFPP LDDHHHTTGF RLLLFFGGGG EEDMEETTLS
             SSSSSGSAPP PPPFVVVVTT GGNDDDPFPP LDDHHHTTGF RLLLFFGGGG EEDMEETTLS

             ---------- ----YYAA-- G--------- ---------- --VAAAAPPE EFAAAAAAAA
             SGGVVDDDPP PCWWHHDDIY GPTSSRLVVQ QPDLLNRRRV VVISSSSLLP PPPPTTTLLL
             SGGVVDDDPP PCWWHHDDIY GPTSSRLVVQ QPDLLNRRRV VVISSSSLLP PPPPTTTLLL

             FPPCCAAPDA AAAAAV-RRR EEEEEEEEVR ---------L VHHLMSCIDD SLLDDSHAAV
             WPPSSSSPLI IPPPPLHTTK EEEEPPPETE DFDDDLLPPL LKKIYDCIDD SLLQQIRESL
             WPPSSSSPLI IPPPPLHTTK EEEEPPPETE DFDDDLLPPL LKKIYDCIDD SLLQQIRESL

             VSAIIGGGRR VAAAHFFTTT AALRRLF-SV PTTAAEEHAA FLL--YHHHF EAYYLLKFHH
             LGD--EEERR VAAAYFFEEE AALNNLSNSA SSSSSTTEDD LIILLYKKKL DAYYSSKFHH
             LGD--EEERR VAAAYFFEEE AALNNLSNSA SSSSSTTEDD LIILLYKKKL DAYYSSKFHH

             FFTTAQQAIL FHHGDHHHVI IIIDFMLQWW ALIIQALRPG GPPF-RIIIG GIIIGPPPPP
             LLTTAQQAIL TEEKNKHHIV VVVDFVIQWW ALLLQALRTG GKKTQRVVVG GIIIPAPPPL
             LLTTAQQAIL TEEKNKHHIV VVVDFVIQWW ALLLQALRTG GKKTQRVVVG GIIIPAPPPL

             PTGDDDE--V GLLAADLASV VRRVVVRFFF GGVVVVAALD VVWWWWLIIE VVAFNLQQHH
             LGEPPPEPPT GNNRRDFAVL LDDLLLNFFF PPIIIILTIH LLSSSSFVVE LLAVNLQQYY
             LGEPPPEPPT GNNRRDFAVL LDDLLLNFFF PPIIIILTIH LLSSSSFVVE LLAVNLQQYY

             LLGDDQAAP- --IVLVAAAS SVRFTEQAAD NNTFFLLDDF ALLFFAAVVD DDSSLLLDAA
             LL--DETTPT IIVALAKKKS SLNVTEYVVS NNVFFAANNV ALLQQAAVVE EESSLLLEPP
             LL--DETTPT IIVALAKKKS SLNVTEYVVS NNVFFAANNV ALLQQAAVVE EESSLLLEPP

             AASAAGAGNN NMM-YQQREI IGGEGGGGAA A-RRRPPPLS SRWRRRRLLR LSAVPLLGSA
             NNLGGSEERR RRRRLGGRRI LPPEKKKKTG GIHRREEEKE EQWRRRLMMN FESVKLLSNA
             NNLGGSEERR RRRRLGGRRI LPPEKKKKTG GIHRREEEKE EQWRRRLMMN FESVKLLSNA

             AARVVVLFGG G---HHHHSS SVEEDDDDGC TTTGRRPPPL FSSSASAAGG GDDNNNNNSS
             AAKLLLNYYN NLLYSSSSII IVSSPPPPGF SSSDLLPPPL LTTTLS---- ----------
             AAKLLLNYYN NLLYSSSSII IVSSPPPPGF SSSDLLPPPL LTTTLS---- ----------

             NNNNVVSSSS SSGDDDSSSS SSNNGKSGGR RDSSSSCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- -MDT------ WWPPMMDDPP
t5g66770.1   MYTDSSNLMM IQQQQQVVVK KKKKQQQQQE QQQQQQQQHH QHQILSLLNN WWPP--NNTT
t5g66770.2   MYTDSSNLMM IQQQQQVVVK KKKKQQQQQE QQQQQQQQHH QHQILSLLNN WWPP--NNTT

             PAAAAAGLLD GGFFLLPPPP AAAAA----- ---------- ---APPPDDG GY--------
             TSSSLLGLLS SSAAFFPPDP FFFQQTTGGS NNNGFPLLLD HHHATTTTTG GFRLSDGGTG
             TSSSLLGLLS SSAAFFPPPP FFFQQTTGGS NNNGFPLLLD HHHATTTTTG GFRLSDGGTG

             ---------- ---------- -----DPAAA ---AAD---- ---------- -----VVDAA
             GGGEEDEMEI DSSSVADDPD DDDTWDNDDD YVIPPDPFFD YPPPLLSVQQ PLNNRIIDTS
             GGGEEDEMEI DSSSVADDPD DDDTWDNDDD YVIPPDPFFD YPPPLLSVQQ PLNNRIIDTS

             PPFFAPPPCP DAAAAVLL-- --MMREEVII IRRR------ ---LVHLLLM MSSCAIIIIE
             LLPPTPPPSP LSPPPLTTHE EEPPKEDTSS SEEEDDDFLL LEPLLKAAAY YDDCAIIIIS
             LLPPTPPPSP LSPPPLTTHE EEPPKEDTSS SEEEDDDFLL LEPLLKAAAY YDDCAIIIIS

             EGGGDALSSS QLLDDSSAAV SASSIVAAVH HFFTLLLSRR SSVPTTTTDD AAF-HHHYAA
             SSSSDNESSS TLLQQIIEEL GPTT-VAAFY YFFELLLSNN SSASSSSSSS SDLSKKTNAA
             SSSSDNESSS TLLQQIIEEL GPTT-VAAFY YFFELLLSNN SSASSSSSSS SDLSKKTNAA

             PFAAANQAII LLEAAFHHCH VHHVVIIDFL MQGQQWPPAL LALLLLRRRP GGGPPPPFFR
             PFAAANQAII LLEAATEESK IHHIIVVDFI VQGQQWPPAL LATTTTRRRT SSGPPPPTTR
             PFAAANQAII LLEAATEESK IHHIIVVDFI VQGQQWPPAL LATTTTRRRT SSGPPPPTTR

             ITGIIGPPPP PPPTGGRRDE -LRDDDVGGL RAADLLLAAS VVRRFSSSFA AANSSSSDEE
             VSGIIPAAAP PPLGEESSPE PLIAAATGGN RRRDFFFAAV LLNNFDDDFL LT-PPPPHLL
             VSGIIPAAAP PPLGEESSPE PLIAAATGGN RRRDFFFAAV LLNNFDDDFL LT-PPPPHLL

             RPWWWMMQQQ IPSSVLLLQH HGGGDDDPPD QAPP---DAA VVLDDCAAAR RRIFIEQADD
             NGSSSSSRRR VPFFMLLLQY Y--------D ETPPTIIDTT AALRRLKKKN NNVVGEYVSS
             NGSSSSSRRR VPFFMLLLQY Y--------D ETPPTIIDTT AALRRLKKKN NNVVGEYVSS

             HNNKTTGLLD FTTLYYSAAV FFDSAASSSA ANAMMAEE-Y LLLQEIICDC GGEGGAAERR
             LNNRVVGAAN VKKLYYSAAV FFESPPLLLG GRVRRVEERL FFFGRIISGG PPEKKTTERR
             LNNRVVGAAN VKKLYYSAAV FFESPPLLLG GRVRRVEERL FFFGRIISGG PPEKKTTERR

             EPLLRRRRRG GGLSSAVVPP LGGNNRAAAM GLSSG----- HHEEGCLLTT GHHRRPLSSA
             EEKKQRRRNG GGFEESVVKK LSSYYSAAAI WNNNYLLYYY SSEEGFIISS ANNLLPLTTL
             EEKKQRRRNG GGFEESVVKK LSSYYSAAAI WNNNYLLYYY SSEEGFIISS ANNLLPLTTL

             SAGGDGGGGG DDNNSSNSGG SSSNNGGGSN SSSRSVVVC
             S--------- ---------- ---------- ---------
             S--------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---MTPP--- -WMMAAAASS SGGGLLDGLP
t5g66770.1   AACSGLLMMA AIQKKQQQQK QQEQQQQQHQ DDDHIGGPSL NW--SSSLGG FGGGLLSSFP
t5g66770.2   AACSGLLMMA AIQKKKKKKK QQEQQQQQHQ DDDHIGGPSL NW--SSSLGG FGGGLLSSFP

             PPAAAAV--- ---------- ---------- -AADGVGY-- ---------- ----------
             PDQQQQVTGG GGGGGGSSNP PNLHHHHHHH HAATGGGFLS DFGGGTGFFF FEESSSDWME
             PPQQQQVTGG GGGGGGSSNP PNLHHHHHHH HAATGGGFLS DFGGGTGFFF FEESSSDWME

             ---------- ---------- -------D-- ----GD---- ---------- -DDAAAAEAA
             EELSGGDDSS SSSVAADDGG DCDTTTWDYY VVVIGDPPPT PSRLLSQPNR VDDSSSSPPT
             EELSGGDDSS SSSVAADDGG DCDTTTWDYY VVVIGDPPPT PSRLLSQPNR VDDSSSSPPT

             AFPPPDALL- -REEVGIR-- -----HSSCC AAGIEAGDHH HALAAAQQAA DSSSHHLLLA
             TWPPPLSTTH EKDDTDSEDD FDLEPKDDCC AA-ISDSDPP PNEAKKTTLL QIIIRRVVVS
             TWPPPLSTTH EKDDTDSEDD FDLEPKDDCC AA-ISDSDPP PNEAKKTTLL QIIIRRVVVS

             AAVSAAAGIR VVVTTTTAAL LSRRRLF-SS PAPPDAEHAA FFLLL-YHHY YEEAPPYLLK
             EELGPPP--R FFFTTTEAAL LSNRRLSNSS PTSSSSTEDD LLIIISYKTN NDDAPPYSSK
             EELGPPP--R FFFTTTEAAL LSNRRLSNSS PTSSSSTEDD LLIIISYKTN NDDAPPYSSK

             KFAHHFFTAQ AAHCDHHHVV IIDDDFFMQQ LQWPAAIIIQ LLLALRRGGG GLRTGGGGPP
             KFAHHLLTAQ AAESNKKKII VVDDDFFVQQ IQWPAALLLQ LLLATRRSSG GIRSGGPPAP
             KFAHHLLTAQ AAESNKKKII VVDDDFFVQQ IQWPAALLLQ LLLATRRSSG GIRSGGPPAP

             PPTGDDDEE- ---RVVVLRR RAAAASSVVV RRRRGGGGAA AAAANSSLEE RRPLLQIIIA
             PLGEPPPEEP PPPITTTNRR RRRAAVVLLL NIIIPPPPLL LTTT-PPILL NNGFFRVVVD
             PLGEPPPEEP PPPITTTNRR RRRAAVVLLL NIIIPPPPLL LTTT-PPILL NNGFFRVVVD

             AAEAAAVAFF NNSVLLLLLR LPPPAAQP-- IIIDDVVLDC VVARPPKITT TVVEEADHKT
             DDEVVVLAVV NNFMLLLLLK L-----EPTI VVVDDAALRL AAKNPPRVTT TLLEEVSLRV
             DDEVVVLAVV NNFMLLLLLK L-----EPTI VVVDDAALRL AAKNPPRVTT TLLEEVSLRV

             FFLDFTLFFY YYYFDSLLDD DAAASGNNAA -QQRREEIDD VVCEA-ERHR WDDRRRRAAG
             FFANVKLQQF FFYFESLLEE EPNNLDRRVV RGGRRRRIGG IIGEGIERMQ WVVLNNNAAG
             FFANVKLQQF FFYFESLLEE EPNNLDRRVV RGGRRRRIGG IIGEGIERMQ WVVLNNNAAG

             GSAVLNALAA RMLLLFSSEG ----SEAADD CCLLLTTLLG WGRPPLLSAA AWWEAGGGDD
             GESVLYAVAA KILLNYNNSN LLLLISKKPP FFIIISSLLA WDLPPLLTLL SWWR------
             GESVLYAVAA KILLNYNNSN LLLLISKKPP FFIIISSLLA WDLPPLLTLL SWWR------

             DSSSNSSSSG SSGSGSSGGS SSSSSARRGG GGGSSCCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- -----DDTFF FFF------W WPMDDDDDDP SSSSGGAGGF
t5g66770.1   MMCCCTDNAA AIIAQVQQQQ QQQHDQQIFF FFISSLNNPW WP-NNNNNNT GFFFGGGSSA
t5g66770.2   MMCCCTDNAA AIIAQVQQQQ QQQHDQQIFF FFISSLNNPW WP-NNNNNNT GFFFGGGSSA

             FFPA------ ---------- APDDGVGY-- ---------- ---------- ----------
             AAPFTTTSDP FFFFLDDHHH ATTTGGGFRR SDDFGGTTGG EEFFSDWWWE ETIISGSSVA
             AAPFTTTSDP FFFFLDDHHH ATTTGGGFRR SDDFGGTTGG EEFFSDWWWE ETIISGSSVA

             ---------- -YPP---D-- ---------- ---------- ---VVVDAAA LLFAAAAAAC
             DDDDGPDDTT WHPPYVIDPP FDTTYSRLSS VVQQQPPSDL LLNIIIDTSS PPPPPTTLLS
             DDDDGPDDTT WHPPYVIDPP FDTTYSRLSS VVQQQPPSDL LLNIIIDTSS PPPPPTTLLS

             CDAAAAAAV- --AMRREEEV VGGGIRR--- -------HLL MSEEAAAAAL LLLADDAAAA
             SLSSIIIPLH HESPTKDPPT TDDDSEEDDD DFDDDDPKII YDSSDDNAAL LLLLQQESEE
             SLSSIIIPLH HESPTKDPPT TDDDSEEDDD DFDDDDPKII YDSSDDNAAL LLLLQQESEE

             VSSAAGIIGG VAAVVHFTAL RLLFSPAPPT TTTEEEA--H FACPPYLLKF ATTANQIILL
             LGGDP---EE VAAFFYFEAL RLLSSPTSSS SSSTTTDLSK LACPPYSSKF ATTANQIILL
             LGGDP---EE VAAFFYFEAL RLLSSPTSSS SSSTTTDLSK LACPPYSSKF ATTANQIILL

             LAGHHVVVII IFSLLMLQAA ALLLPPPGGG PPPF-LGIPP PPPSPRDDDE EE---RRGGG
             LAKKKIIIVV VFGIIVIQAA ALLTTTTSGG KPPTQIGIAA APPSLSPPPE EEPPSIIGGG
             LAKKKIIIVV VFGIIVIQAA ALLTTTTSGG KPPTQIGIAA APPSLSPPPE EEPPSIIGGG

             LLRLDDLAAR RSSVVRVVVR FFFFFRRRAA NSDDVRPPWW MLIAAAPPPG AAVVFVLLQL
             NNRLDDFAAK KVVLLDLLLN FFFFFIIILL -PHHLNGGSS SFVDDDPPPD VVLLVMLLQL
             NNRLDDFAAK KVVLLDLLLN FFFFFIIILL -PHHLNGGSS SFVDDDPPPD VVLLVMLLQL

             LHLLLGGGGD PPAVCCAASR RIFTVIIADH NNTTTGGGLL LDTTTEAFYY SAFFFFDDSL
             LYLLL----- --TALLKKSN NVVTLGGVSL NNVVVGGGAA ANKKKNAQFY SAFFFFEESL
             LYLLL----- --TALLKKSN NVVTLGGVSL NNVVVGGGAA ANKKKNAQFY SAFFFFEESL

             LLDASGGGGG NAMEAAYQRR EIIIICCCCV VVCGAAAA-- -RRREEEEEP PPPLLRRRWR
             LLENLDDDEE RVREEELGRR RIIIISSSSI IIGPTGGGII IHHHEEEEEE EEEKKQQQWR
             LLENLDDDEE RVREEELGRR RIIIISSSSI IIGPTGGGII IHHHEEEEEE EEEKKQQQWR

             DRLRGSSSAP LLGGNNARQA AMVGSGEGG- -HSSVCCLGG HGGRLLFFSS AWEADDDDGG
             VLMNGEEESK LLSSYYASQA AILWNYSNNL YSIIVFFLAA NDDLLLLLTT SWR-------
             VLMNGEEESK LLSSYYASQA AILWNYSNNL YSIIVFFLAA NDDLLLLLTT SWR-------

             DDNNNNSSNV VSGSSDNNSG SGKSAARDGS SSSSSVVLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- -------MTT PPFFFFFQQ- ---WWMMDPP ASSGDAFFPP
t5g66770.1   AMCCCTTSSN MIQIQQQQQQ QQQQQHDHII GGIIIIINNL NNPWW--NTT SFFGSGAAPP
t5g66770.2   AMCCCTTSSN MIQIKQQQQQ QQQQQHDHII GGIIIIINNL NNPWW--NTT SFFGSGAAPP

             PPPPPAAA-- ---------- -------PPP DDDDGGG--- ---------- ----------
             PDPPPFFQTT GGGDDSNNPG FFFNHHHTTT TTTTGGGRRL LSFGGGGGGG GGEFESDDDD
             PPPPPFFQTT GGGDDSNNPG FFFNHHHTTT TTTTGGGRRL LSFGGGGGGG GGEFESDDDD

             ---------- ---------- ----YYDDPP A---D----- ---------- -------VDD
             WWWMTLLLIS GGGSVAAAAD DGCWHHDDNP DVYYDPPPPP FFFDTYPPRS VVVSDRVIDD
             WWWMTLLLIS GGGSVAAAAD DGCWHHDDNP DVYYDPPPPP FFFDTYPPRS VVVSDRVIDD

             AALLLPAAAA AFPPCPAAAA AAAALLL-AM MMREEAAG-- ---------L VHLLSSSAAI
             TTPPPLPLLL LWPPSPSSSI IPPPTTTESP PPKPPNNDDF LLEEEPPPPL LKAIDDDARI
             TTPPPLPLLL LWPPSPSSSI IPPPTTTESP PPKPPNNDDF LLEEEPPPPL LKAIDDDARI

             IIEEAGDHAS SSQQQLAASH HHHAALLASA GIIGRVVFTS SRLLLF-PPP PDAEEAAFL-
             IISSDSDPAS SSTTTLLLIR RRREEVVSGD ---ERFFFES SRLLLSNPPS SSSTTDDLIL
             IISSDSDPAS SSTTTLLLIR RRREEVVSGD ---ERFFFES SRLLLSNPPS SSSTTDDLIL

             -YHHFYYAAC CPYLKKFHHT TTQLLAAHGD DHVIDFSLMM QQPPALLPGP F-LLITTGGI
             SYKTLNNAAC CPYSKKFHHT TTQLLAAEKN NKIVDFGIVV QQPPALLTGP TQIIVSSGGI
             SYKTLNNAAC CPYSKKFHHT TTQLLAAEKN NKIVDFGIVV QQPPALLTGP TQIIVSSGGI

             IGGPSPGGDD ----LDDVGG GLLLDLARVV RRRVVVRFFF FRRRGGNLVP PPPPMLLQAP
             IPPPSLEEPP SSSSLAATGG GLLLDFAKLL DDDLLLNFFF FIIIPP-ILG GGGGSFFRDP
             IPPPSLEEPP SSSSLAATGG GLLLDFAKLL DDDLLLNFFF FIIIPP-ILG GGGGSFFRDP

             AVFFFNNNNS SLHHRLDADQ QIAALDCCVA SVVVRRPPPK KKFTTTVVEQ QEEDDHNNFF
             VLVVVNNNNF FLYYKL--DE EVTTLRLLAK SLLLNNPPPR RRVTTTLLEY YEESSLNNFF
             VLVVVNNNNF FLYYKL--DE EVTTLRLLAK SLLLNNPPPR RRVTTTLLEY YEESSLNNFF

             DFALYYAADL DDAAASSGAG AE--AQIIIC GGGA-REERE PPPLRRLALV VVVGSAQQQQ
             NVALFFAAEL EEPNGRRDEE VERREGLLLG PPKTIREERE EEEKLLMAFV VVVSNAQQQQ
             NVALFFAAEL EEPNGRRDEE VERREGLLLG PPKTIREERE EEEKLLMAFV VVVSNAQQQQ

             QRMMFFSSGE EGG-HHHHHS SSEEEADGCL LTGGWHHGGG RSAASSSAAA AEAGGDGGGG
             QKIIYYNNYS SNNLSSSSSI IISSSKPGFI ISAAWNNDDD LTLLSSSSSS SR--------
             QKIIYYNNYS SNNLSSSSSI IISSSKPGFI ISAAWNNDDD LTLLSSSSSS SR--------

             GGNNSNNSSN SSSGDSGGSS SSNSGARRRD DGSSSSVVL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -----MTFF- -----WPPDP AAASGDAAGG
t5g66770.1   ASGNNNLLMA AIIIAAAQQQ QQVVVQQQEQ HHHHDHIFIP PLSNPWPPNT SSSFGSGGSS
t5g66770.2   ASGNNNLLMA AIIIAAAQQQ QQVVVQQQEQ HHHHDHIFIP PLSNPWPPNT SSSFGSGGSS

             FFFPPPPPAA AAAV------ ---------- -----ADDGV GGGYY----- ----------
             AAAPPDDDFF FQQVTTGNDP GPFFPPNLDD HHHHHATTGG GGGFFRLSDD GGTTGGGGGW
             AAAPPPPPFF FQQVTTGNDP GPFFPPNLDD HHHHHATTGG GGGFFRLSDD GGTTGGGGGW

             ---------- ---------- ---------- -PPA-AD--- ---------- ----VVAAAA
             WMMMMTTTLI IISGDDDDAA DPPDCCCCDW WPPDIPDFFD DPSRSSSVVQ DDVVIISSSS
             WMMMMTTTLI IISGDDDDAA DPPDCCCCDW WPPDIPDFFD DPSRSSSVVQ DDVVIISSSS

             ALLLPFFAAA APCPDDAAAA AA----MMRR RRREVVAGGG GRR----VLL MGAIEAHALL
             SPPPLPPPPL LPSPLLSSII PPHHEEPPTK KKKETTNDDD DEEDFFELAA Y-RISDPNEE
             SPPPLPPPPL LPSPLLSSII PPHHEEPPTK KKKETTNDDD DEEDFFELAA Y-RISDPNEE

             SSSAAQLHAA ASAASSGRRV VVHHFTLLSR LFFFFPSSSP AAAPTTDAAA FFFLL--HHE
             SSSKKTLRES SGDDTTERRV VFYYFTLLSN LSSSSPSSSP TTTSSSSSSD LLLIILLKKD
             SSSKKTLRES SGDDTTERRV VFYYFTLLSN LSSSSPSSSP TTTSSSSSSD LLLIILLKKD

             EAACCYYKAA ILEAHGCCDD VVVHVVIFSL MMGLQWPIQL LRRRRRGGGF LLITGIPPPP
             DAACCYYKAA ILEAEKSSNN IIIHIIVFGI VVGIQWPLQT TRRRRRSSGT IIVSGIAPPP
             DAACCYYKAA ILEAEKSSNN IIIHIIVFGI VVGIQWPLQT TRRRRRSSGT IIVSGIAPPP

             SPTDD--RRR VGGLRRLLLL ARRVRVRSSR GGVVAASDEE RRLLQQIAPP GEEEAAVVVL
             SLGPPPSIII TGGNRRFFFF AKKLDLNDDI PPIITTPHLL NNFFRRVDPP DEEEVVLLML
             SLGPPPSIII TGGNRRFFFF AKKLDLNDDI PPIITTPHLL NNFFRRVDPP DEEEVVLLML

             HLLLLGDDDP PPD-----IV VVLSVVRPPK TTTVIEQADD NKKTTGFFLL LDRRRTELLF
             YLLLL----- --DIIIIIVA AALSLLNPPR TTTLGEYVSS NRRVVGFFAA ANRRRKNLLQ
             YLLLL----- --DIIIIIVA AALSLLNPPR TTTLGEYVSS NRRVVGFFAA ANRRRKNLLQ

             YYYAADSSSG GGGAGAAMMM AEE--YYYCC IICGGEEGRR EEHEPPLSRW WRLLRAALSP
             FYYAAELRRD DDSEEVVRRR VEERRLLLSS LLGPPEEKRR EEMEEEKEQW WRMMNAAFEK
             FYYAAELRRD DDSEEVVRRR VEERRLLLSS LLGPPEEKRR EEMEEEKEQW WRMMNAAFEK

             LLGGNNNALL LLRRQAAAAM MMLGFGG-HH VEEADDGGGL LLLWHRRRLL LLLFFSASSA
             LLSSYYYAVV VVSSQAAAAI IILWYYNYSS VEEKPPGGGI LLLWNLLLLL LLLLLTLSSS
             LLSSYYYAVV VVSSQAAAAI IILWYYNYSS VEEKPPGGGI LLLWNLLLLL LLLLLTLSSS

             WWAGDGNNNN SSSNVSGSSG SNNSSSSNNN NSAGSSVLL
             WW-------- ---------- ---------- ---------
             WW-------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ------MDDD DDTPQQ---- --WWWPDALL
t5g66770.1   MAAAAYYCTT DDSGNLLAAV KKQQQQQQQQ QQQQQQHQQQ QQIGNNPPLL NNWWWPNLLL
t5g66770.2   MAAAAYYCTT DDSGNLLAAV KKQQQQQQQQ QQQQQQHQQQ QQIGNNPPLL NNWWWPNLLL

             LDGFFLLPPP PPPPPAAAAA AVV------- ---------G GY-------- ----------
             LSSAAFFPPP DDPPPFFFQQ QVVTGDDNDP GGGPNNHHHG GFRRRLSDFG GGEEWWMMET
             LSSAAFFPPP PPPPPFFFQQ QVVTGDDNDP GGGPNNHHHG GFRRRLSDFG GGEEWWMMET

             ---------- ---------- DPA-----AD ---------- -----AALLL PFAAAAPCAA
             LLLLLISGGG AGGPDCCTWW DPDYYIYYPD PPPFFTTTYS LNRRVTSPPP LPPPTLPSSS
             LLLLLISGGG AGGPDCCTWW DPDYYIYYPD PPPFFTTTYS LNRRVTSPPP LPPPTLPSSS

             AAAAV---RR RREVAGIIIR ---LVVLLSC CGAIIIIEAA DALLAAQQLD HHHAAALLLA
             SIPPLHEETT KKPTNDSSSE DDPLLLAADC C-RIIIISDD DNEEAATTLQ RRRSSSVVVS
             SIPPLHEETT KKPTNDSSSE DDPLLLAADC C-RIIIISDD DNEEAATTLQ RRRSSSVVVS

             AVVSAAAASI IAVHHTTTLL SRRPSSSSVV PPTDAAEHLL ----YFYEAP LFAHAQQALL
             ELLGDPPPT- -AFYYTTTLL SNNPSSSSAA SSSSSSTEII LSSSYLNDAP SFAHAQQALL
             ELLGDPPPT- -AFYYTTTLL SNNPSSSSAA SSSSSSTEII LSSSYLNDAP SFAHAQQALL

             EAFHGCCHHV VSLLQQQQQL LLQQPAAAAL IIIQAAALLP GP-LRRRRRI ITGGGGSSTG
             EATEKSSKHI IGIIQQQQQI IIQQPAAAAL LLLQAAALLT GKQIRRRRRV VSGGPPSSGE
             EATEKSSKHI IGIIQQQQQI IIQQPAAAAL LLLQAAALLT GKQIRRRRRV VSGGPPSSGE

             GE---LLDDV VGLLLADDDL LRRVRVFFAA NNLDEEVVRR RRRPPPPMML QQAAAPPAVA
             EEPSSLLAAT TGNNNRDDDF FKKLDLFFTT --IHLLLLNN NNNGGGGSSF RRDDDPPVLA
             EEPSSLLAAT TGNNNRDDDF FKKLDLFFTT --IHLLLLNN NNNGGGGSSF RRDDDPPVLA

             AFFSVVLQQQ QHLLLLDPPA QA--DAVVVV ASVRRRKTTI EEEEEHNNNK GFFDDFFTEE
             AVVFMMLQQQ QYLLLL---- ETIIDTAAAA KSLNNNRTTG EEEEELNNNR GFFNNVVKNN
             AVVFMMLQQQ QYLLLL---- ETIIDTAAAA KSLNNNRTTG EEEEELNNNR GFFNNVVKNN

             LLFYYYYSVS LLDDDAAAAS GGGGANNAAA A--LRREEEI ICCIVERRRE RRHEEEELSS
             LLQFFYYSVS LLEEEPPPGR DDSSERRVVV VRRFRRRRRI ISSLIEHHRE RRMEEEEKEE
             LLQFFYYSVS LLEEEPPPGR DDSSERRVVV VRRFRRRRRI ISSLIEHHRE RRMEEEEKEE

             RRRRTGLLSA VLGSNAAQQR MVVGLLLGG- --HHSVVVAD DGCCCTLLHH GRPFSSAWWE
             QQRLEGFFES VLSNYAAQQK ILLWNNNYNY YYSSIVVVKP PGFFFSLLNN DLPLTSSWWR
             QQRLEGFFES VLSNYAAQQK ILLWNNNYNY YYSSIVVVKP PGFFFSLLNN DLPLTSSWWR

             AAGGDDGGGN NSNNNSSGGS DNNSSSSNKS GGGDGVVCC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------M TFFPPQ---- ---WMDPPAA ASSSSGGDFF
t5g66770.1   MMAYYMMCSG GAQQQIIIIQ KQQQQHDDDH IFFGGNPPPS LLLW-NTTSL LFFFFGGSAA
t5g66770.2   MMAYYMMCSG GAQQQIIIIK KQQQQHDDDH IFFGGNPPPS LLLW-NTTSL LFFFFGGSAA

             LLLPPPPPP- ---------- ---------- ------AAGV VVVGY----- ----------
             FFFDPPPPPT TTGGGSDDPG GGFFPPFFPN NLLHHHAAGG GGGGFRDGGG GEEEDEWEET
             FFFPPPPPPT TTGGGSDDPG GGFFPPFFPN NLLHHHAAGG GGGGFRDGGG GEEEDEWEET

             ---------- ---YYDPPA- ---AAAD--- ---------- VAAAAAFAAA FPPPCAPPPA
             LSSGGDDAAD DDWHHDNNDV VYYPPPDFYP RRSSVQPSNV ITTSSSPPTL WPPPSSPPPS
             LSSGGDDAAD DDWHHDNNDV VYYPPPDFYP RRSSVQPSNV ITTSSSPPTL WPPPSSPPPS

             AAAL-AMRRR EEEVVVAGIR ------LVLL SAAGAIIIEE AAAGGHAAAA AAAHAAAAAA
             IPPTESPTTT DPETTTNDSE DFLEEPLLAA DAA-RIIISS DDDSSPNNNA AKKREEEESS
             IPPTESPTTT DPETTTNDSE DFLEEPLLAA DAA-RIIISS DDDSSPNNNA AKKREEEESS

             AAVVSSSAAS IIGGRVHLLS LLLLLFFP-- VVVTEEEHAL LL------YY HHHEAPPLLF
             SSLLGGGDPT --EERVYLLS LLLLLSSPNN AAASTTTEDI IILLSSSSYY KKTDAPPSSF
             SSLLGGGDPT --EERVYLLS LLLLLSSPNN AAASTTTEDI IILLSSSSYY KKTDAPPSSF

             FAHANNLEEA HGCCDVVDFF FLMQWWWWIQ QAAALLRRPG GGPF----LL RIIPPPTRRE
             FAHANNLEEA EKSSNIIDFF FIVQWWWWLQ QAAALLRRTS SSKTQQQQII RVIALLGSSE
             FAHANNLEEA EKSSNIIDFF FIVQWWWWLQ QAAALLRRTS SSKTQQQQII RVIALLGSSE

             RDDVGRAALA AARSVVVRVV SFFGVAAAVR RMMLQQQGEE VFNSVLLHRL GDAQAAA-II
             IAATGRRRFA AAKVLLLDLL DFFPILLLLN NSSFRRRDEE LVNFMLLYKL ---ETTTIVV
             IAATGRRRFA AAKVLLLDLL DFFPILLLLN NSSFRRRDEE LVNFMLLYKL ---ETTTIVV

             DAVVVLLLDC CCVPKIITTV IIQEADDDHH HNNKKTFFLL DDRELLLFYS VVFDSSSLDD
             DTAAALLLRL LLAPRVVTTL GGYEVSSSLL LNNRRVFFAA NNRNLLLQYS VVFESSSLEE
             DTAAALLLRL LLAPRVVTTL GGYEVSSSLL LNNRRVFFAA NNRNLLLQYS VVFESSSLEE

             AASAASGGGG NNAEAALLQE CVGGGEGAAR REEERRHEPP LLLLLRRRDD DDLLRAAAGG
             PNLGGRDDSS RRVEEEFFGR SIPPPEKTGR REEERRMEEE KKKKKQRRVV VVMMNAAAGG
             PNLGGRDDSS RRVEEEFFGR SIPPPEKTGR REEERRMEEE KKKKKQRRVV VVMMNAAAGG

             GLSPPPLGGG NNNARQLLLG FSSSGE--HH SVVVEEEEEA ADGCCTGWHH HPPLSASSWA
             GFEKKKLSSS YYYASQLLLW YNNNYSYYSS IVVVESSSSK KPGFFSAWNN NPPLTLSSW-
             GFEKKKLSSS YYYASQLLLW YNNNYSYYSS IVVVESSSSK KPGFFSAWNN NPPLTLSSW-

             AAAGGGGNNN SSSNVGGSSS SDSSSSNNGG GKSRGSSLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- ---MMFFQ-- -----PDPPA
t5g66770.1   YMDSSGNNLL MAIAQQQVVI KQKQQQQQQE EQQQQQQHHH HDDHHIINPP LLLLNPNTTL
t5g66770.2   YMDSSGNNLL MAIAQQQVVI KKKQQQQQQE EQQQQQQHHH HDDHHIINPP LLLLNPNTTL

             GGDGLLPPPP AAVV------ ---------P PDDDVVY--- ---------- ----------
             GGSSFFPPPP FQVVGGDNDP GPNLDHHHHT TTTTGGFRFF GGGTGGGGEF EMMTTISGAA
             GGSSFFPPPP FQVVGGDNDP GPNLDHHHHT TTTTGGFRFF GGGTGGGGEF EMMTTISGAA

             -------Y-- -AA------- ---------- ---DALLLAA AAAPDDAAAA AAVV-AMMMM
             ADGDCWWHVI YPPDDDDTYP PRLLVPDLNR RRVDTPPPPP TTTPLLSSII IPLLESPPPP
             ADGDCWWHVI YPPDDDDTYP PRLLVPDLNR RRVDTPPPPP TTTPLLSSII IPLLESPPPP

             MRRREEEEEA GGIII----- -----HHGGI GGGGAAASSS QADDALAAAV ASSGIGRRRV
             PTTKDPEEEN DDSSSDDFFL EPPPPKK--I SSSSNAASSS TLQQEVEEEL PTT--ERRRV
             PTTKDPEEEN DDSSSDDFFL EPPPPKK--I SSSSNAASSS TLQQEVEEEL PTT--ERRRV

             VAAAHHTTAA SLP--PPPTT TTTDAH-YYH FYPLKKAAHH HTANQQQQAA ILLEHHGVVV
             VAAAYYTEAA SLPNNSSSSS SSSSSELYYT LNPSKKAAHH HTANQQQQAA ILLEEEKIII
             VAAAYYTEAA SLPNNSSSSS SSSSSELYYT LNPSKKAAHH HTANQQQQAA ILLEEEKIII

             IDFSSSLQGG LLQAAIIIAA LLRGGGGGGP FFF-TGIIGP PPPGGRE-LL RDVVVLLRRR
             VDFGGGIQGG IIQAALLLAA LTRSSSSGGP TTTQSGIIPA APLEESESLL IATTTNNRRR
             VDFGGGIQGG IIQAALLLAA LTRSSSSGGP TTTQSGIIPA APLEESESLL IATTTNNRRR

             LADLRRSSVR RFFGVAAANN NSSLEPQQQI IPGVASSVVL LLHRLGGDPP AAADDDDDQQ
             LRDFKKVVLN NFFPILTT-- -PPILGRRRV VPDLAFFMML LLYKL----- ---DDDDDEE
             LRDFKKVVLN NFFPILTT-- -PPILGRRRV VPDLAFFMML LLYKL----- ---DDDDDEE

             AAPPPP---- IDAVLCVARP PIFFTVIEQQ AADNKTTFFT TAAFFFFYAV FSLLDDAAAA
             TTPPPPTTII VDTALLAKNP PVVVTLGEYY VVSNRVVFVK KAAQQQQFAV FSLLEEPPNG
             TTPPPPTTII VDTALLAKNP PVVVTLGEYY VVSNRVVFVK KAAQQQQFAV FSLLEEPPNG

             ASSGGGAGNN A-AAYYQQQE CDIIIVVCCC GEGAAAA--- RREHEEPPSS RWRLLTRRAG
             GRRDSSEERR VREELLGGGR SGLLLIIGGG PEKTTGGIII HHEMEEEEEE QWLMMENNAG
             GRRDSSEERR VREELLGGGR SGLLLIIGGG PEKTTGGIII HHEMEEEEEE QWLMMENNAG

             AAVVLGGNLL QAMMVGLLFS SGGE-SVVAD GGLLTWWHPL LFSASSWWEA GGGGGGGDDD
             SSVVLSSYVV QAIILWNNYN NYYSLIVVKP GGIISWWNPL LLTLSSWWR- ----------
             SSVVLSSYVV QAIILWNNYN NYYSLIVVKP GGIISWWNPL LLTLSSWWR- ----------

             NNNNNNSNSS NSGGSSSSSS DDDDDSSNNS SSNGGSSVC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ----PQ---- --WWPAAASS GGGGLDDAAG
t5g66770.1   AAACTTDDSS GGLMAQQQQV VIIKKKQQQQ QHQDGNPPPL SLWWTSSLGG GGGGLSSGGS
t5g66770.2   AAACTTDDSS GGLMAQQQQV VIIKKKQQQQ QHQDGNPPPL SLWWTSSLGG GGGGLSSGGS

             GGFPAVV--- ---------- ------AAAP DDDGY----- ---------- ----------
             SSADFVVGGG GGSSNDPFFF PNLDHHAAAT TTTGFRLLDD FFGGGTGGGG GGEESDDWEE
             SSAPFVVGGG GGSSNDPFFF PNLDHHAAAT TTTGFRLLDD FFGGGTGGGG GGEESDDWEE

             ---------- -------YDP ---------- G--------- ---------- ----VVDDAA
             LIIISGGSVA ADGDDTWHDP YYYVVIIYYY GPPFDTTTPS SLQQQQQPPS DNNRIIDDTT
             LIIISGGSVA ADGDDTWHDP YYYVVIIYYY GPPFDTTTPS SLQQQQQPPS DNNRIIDDTT

             PPPEAAAAAF FPCADAAA-- MRRREEEEEA GII------- -LLHHLLMAG IEAGGDDHHH
             LLLPPLLLLW WPSSLSIPHH PTTKEPEEEN DSSDDDLLEP PLLKKAIYA- ISDSSDDPPP
             LLLPPLLLLW WPSSLSIPHH PTTKEPEEEN DSSDDDLLEP PLLKKAIYA- ISDSSDDPPP

             ALLQLADDHH ALAAVSASSI IGRVVVVAVV HFFTRRLLPP SPVAPTTAAA FL-YHHEAPP
             NEETLLQQRR SVSSLGPTT- -ERVVVVAFF YFFENNLLPP SPATSSSDDD LILYTTDAPP
             NEETLLQQRR SVSSLGPTT- -ERVVVVAFF YFFENNLLPP SPATSSSDDD LILYTTDAPP

             PPYYLLLLFA AAHAANALEE AFFHGCDVDF LLLMQGLLQA LLLIALAAAR RRGGLLRRGP
             PPYYSSSSFA AAHAANALEE ATTEKSNIDF IIIVQGIIQA LLLLALAAAR RRSGIIRRPA
             PPYYSSSSFA AAHAANALEE ATTEKSNIDF IIIVQGIIQA LLLLALAAAR RRSGIIRRPA

             SPPTRDDE-- RDGGLADLVV RVRFFSSFRG GVSLLDVVVR RRQQIIIAGA ASQHRLLGDD
             SLLGSPPESS IAGGNRDFLL DLNFFDDFIP PIPIIHLLLN NNRRVVVDDA AFQYKLL---
             SLLGSPPESS IAGGNRDFLL DLNFFDDFIP PIPIIHLLLN NNRRVVVDDA AFQYKLL---

             PAADA-IIID ALLDDVVAAS VVRPIFFFTQ QEAAADHKTG FDFTTEEAAL LFYYYSSAVF
             ---DTIVVVD TLLRRAAKKS LLNPVVVVTY YEVVVSLRVG FNVKKNNAAL LQYYYSSAVF
             ---DTIVVVD TLLRRAAKKS LLNPVVVVTY YEVVVSLRVG FNVKKNNAAL LQYYYSSAVF

             DDDLGGGNNA MA--AAYLQR RREIDIVCGG EEAA--RRRR EHHEEPSSRD TTALLAPLLL
             EEELDDERRV RVRREELFGR RRRIGLIGPP EETTIIHRRR EMMEEEEEQV EEAFFSKLLL
             EEELDDERRV RVRREELFGR RRRIGLIGPP EETTIIHRRR EMMEEEEEQV EEAFFSKLLL

             LLGLRRRRAR RMMMLVVFFS EEG--HSVVE EAGCGGPLFS WEAGGDDDGG GGGGDDDNNN
             LLSVSSSSAK KIIILLLYYN SSNYYSIVVE SKGFAAPLLT WR-------- ----------
             LLSVSSSSAK KIIILLLYYN SSNYYSIVVE SKGFAAPLLT WR-------- ----------

             NSNSSSGGGG SSSDDSSSNG SSGSSSSGRD SSSSSSCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -DTFPPFQQ- ------WWMD AAASLAGFLP
t5g66770.1   MMTDSGMAAI AAAAQQVIQQ QEQQQQQQQQ QQIFGGINNP PSSNNPWW-N SLLFLGSAFP
t5g66770.2   MMTDSGMAAI AAAAQQVIQQ QEQQQQQQQQ QQIFGGINNP PSSNNPWW-N SLLFLGSAFP

             PA-------- ---------- ---AAPDDGG GY-------- ---------- ----------
             DQTGGGGSNN DPPPFPLDHH HHHAATTTGG GFLSDGGGGG TTGGGGFESS SDDDEIGDDV
             PQTGGGGSNN DPPPFPLDHH HHHAATTTGG GFLSDGGGGG TTGGGGFESS SDDDEIGDDV

             ------PPPP -----AD--- ---------- ----DAALEE FFAAAAFFPC CPDDAAAAL-
             AADCDDNNPP VIIYYPDPPD TYSSLLSVPP NNRVDTSPPP PPPPTLWWPS SPLLIIPPTH
             AADCDDNNPP VIIYYPDPPD TYSSLLSVPP NNRVDTSPPP PPPPTLWWPS SPLLIIPPTH

             ---RRRREEE VGGGIIR--- --------LV HLLLMMCAGA ADDHAQLDHH AAAAAAAAAV
             HEETTTKEDE TDDDSSEDDD DDDFDEEPLL KAIIYYCA-D DDDPNTLQRR EEESSEEEEL
             HEETTTKEDE TDDDSSEDDD DDDFDEEPLL KAIIYYCA-D DDDPNTLQRR EEESSEEEEL

             VVASRVVVVF TTALSSSSSL PP-SPVAAPT AEHFFYHHHF FFYYEECPKK KAHHFFTAAN
             LLDTRVVVFF EEALSSSSSL PPNSPATTSS STELLYKTTL LLNNDDCPKK KAHHLLTAAN
             LLDTRVVVFF EEALSSSSSL PPNSPATTSS STELLYKTTL LLNNDDCPKK KAHHLLTAAN

             NQQAIEFFCH HHDSLLMMML QWPLAAALRR PFLIGPSPGG REEELRDVRL DLAVVVRRFF
             NQQAIETTSH HHDGIIVVVI QWPLAAATRR PTIIPPSLEE SEEELIATRL DFALLLNNFF
             NQQAIETTSH HHDGIIVVVI QWPLAAATRR PTIIPPSLEE SEEELIATRL DFALLLNNFF

             FFFFFGAANN SSSDEEVVRP PWWMMLLQIG GEAAVVNSVV LLHHLLLDDP PAQQP-IACV
             FFFFFPLL-- PPPHLLLLNG GSSSSFFRVD DEVVLLNFMM LLYYLLL--- --EEPIVTLA
             FFFFFPLL-- PPPHLLLLNG GSSSSFFRVD DEVVLLNFMM LLYYLLL--- --EEPIVTLA

             VSSVVKKTVE QQEAADDLLD DAFFYSAVFF DLLLDAAASS GGGGAAAGGG NAMAE--AYQ
             ASSLLRRTLE YYEVVSSAAN NAQQYSAVFF ELLLENGGRR DSSSEEEEEE RVRVERRELG
             ASSLLRRTLE YYEVVSSAAN NAQQYSAVFF ELLLENGGRR DSSSEEEEEE RVRVERRELG

             ICDVVCGAAA A--REEEERE EEPLLRRRRR LRRAAGGSSS AAPLLGGSSS AQQQARRVVL
             ISGIIGPTTG GIIHEEEERE EEEKKRRRLL MNNAAGGEEE SSKLLSSNNN AQQQAKKLLN
             ISGIIGPTTG GIIHEEEERE EEEKKRRRLL MNNAAGGEEE SSKLLSSNNN AQQQAKKLLN

             LFFFFSGEG- -HSVVEAAAD GLLTLGHHRR PLFFSSSSWW EEAAAGGGDG GDDDNNNNSS
             NYYYYNYSNL YSIVVEKKKP GIISLANNLL PLLLTTTSWW RR-------- ----------
             NYYYYNYSNL YSIVVEKKKP GIISLANNLL PLLLTTTSWW RR-------- ----------

             NSSSNSSSSS SGGSSDNSGS SGKSSGGAAD DGSSSSVLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------M DT-------- -MDPPAASSS SSLDDDDALP
t5g66770.1   AYMCCTTDDS GNMMAQVVIQ QEQQQHHHQH QIPLLLNNPP P-NTTSLGGG GFLSSSSGFD
t5g66770.2   AYMCCTTDDS GNMMAQVVIK QEQQQHHHQH QIPLLLNNPP P-NTTSLGGG GFLSSSSGFP

             ---------- -------DVY Y--------- ---------- ---------- ----------
             GGGGSGFFFP NLDDHHHTGF FLSDFFFFGG GGGGEEEDDE EMTTISSGGD SSSSVAAAGG
             GGGGSGFFFP NLDDHHHTGF FLSDFFFFGG GGGGEEEDDE EMTTISSGGD SSSSVAAAGG

             ---------- ---YDPPA-- ----AADD-- ---------- ---------- -AAAAALPFF
             PPDDCCDDDD TWWHDNNDVV IIIYPPDDPF FFDDYYPSSR RSPPSLLNNR RTTTTSPLPP
             PPDDCCDDDD TWWHDNNDVV IIIYPPDDPF FFDDYYPSSR RSPPSLLNNR RTTTTSPLPP

             AAAAAFPPCA DAAAAVLL-- AMRRREEVAG GIRR------ ---HLLMMMA AGGAIEEADA
             TLLLLWPPSS LIPPPLTTHE SPTKKEDTND DSEEDDDDDL LPPKIIYYYA A--RISSDDN
             TLLLLWPPSS LIPPPLTTHE SPTKKEDTND DSEEDDDDDL LPPKIIYYYA A--RISSDDN

             LAAQQSSHLL AVVAAASSGR RVAAHLSRRR LLPSPPVAAP TDAEL--YYH HFFFFYACCC
             EKKTTIIRVV ELLDDPTTER RVAAYLSNNR LLPSPPATTS SSSTISSYYK TLLLLNACCC
             EKKTTIIRVV ELLDDPTTER RVAAYLSNNR LLPSPPATTS SSSTISSYYK TLLLLNACCC

             PPLKKHFTAN NNNEEFHGDH HVHHIDFLLM QLLLQPALIA AARRRRRRPP PPGFFFLRTT
             PPSKKHLTAN NNNEETEKNK KIHHVDFIIV QIIIQPALLA AARRRRRRTT TTGTTTIRSS
             PPSKKHLTAN NNNEETEKNK KIHHVDFIIV QIIIQPALLA AARRRRRRTT TTGTTTIRSS

             GGIIPSTTRR E---LLDVGL RLDDDRSVVV RFSFFVAADD EVVRPWMMML QIAPPGGEAN
             GGIIASGGSS EPPSLLATGN RLDDDKVLLL NFDFFILLHH LLLNGSSSSF RVDPPDDEVN
             GGIIASGGSS EPPSLLATGN RLDDDKVLLL NFDFFILLHH LLLNGSSSSF RVDPPDDEVN

             NLHLGGGDAA PP-DDDVDCV AAAVVRRPII FIQEADHHNK TTTDRFTEEE AFYYAVFAAA
             NLYL-----T PPTDDDARLA KKKLLNNPVV VGYEVSLLNR VVVNRVKNNN AQFFAVFNNN
             NLYL-----T PPTDDDARLA KKKLLNNPVV VGYEVSLLNR VVVNRVKNNN AQFFAVFNNN

             SSSGGGGGGN NAAAAAAAE- --AAYYLQEE IIVRRREHLL LSSRWRDDLL LTRRGLLLSA
             LRRDDDEEER RVVVVVVVER RREELLFGRR ILIHHHEMKK KEEQWRVVMM MENNGFFFES
             LRRDDDEEER RVVVVVVVER RREELLFGRR ILIHHHEMKK KEEQWRVVMM MENNGFFFES

             PNNLLQAARM LLGGLLFGG- --SVVEADCT TGWHGGRRPL LASAAWWEEA AGGGDDGGDD
             KYYVVQAAKI LLWWNNYNNL LYIVVSKPFS SAWNDDLLPL LLSSSWWRR- ----------
             KYYVVQAAKI LLWWNNYNNL LYIVVSKPFS SAWNDDLLPL LLSSSWWRR- ----------

             DNNSSSSNNS SSGSSGDDSN NNSGSSKKGA DDSSSSLLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------M MPPF------ --PMDDDDPS
t5g66770.1   AYCCTDDLLM MMAIAQQQQQ IKKKKQQQQQ QQQHHHQDDH HGGIPLLSLL LNP-NNNNTG
t5g66770.2   AYCCTDDLLM MMAIAQQQQQ IKKKKQQQQQ QQQHHHQDDH HGGIPLLSLL LNP-NNNNTG

             SLDDAGGPPV ---------- --------PD DDDD------ ---------- ----------
             FLSSGSSPPV TGGGGGDSND DPGFFPLDTT TTTTRDGGGG TGGGGGGGEE FFSDDMETIS
             FLSSGSSPPV TGGGGGDSND DPGFFPLDTT TTTTRDGGGG TGGGGGGGEE FFSDDMETIS

             ---------- ----DPAAA- --AD------ ---------- -VAAAAAEFF FFAAAFFPPP
             GGGGGGDDGD CDDTDPDDDY YYPDYYYPRL SVVPSSLNVV VITTSSSPPP PPTLLWWPPP
             GGGGGGDDGD CDDTDPDDDY YYPDYYYPRL SVVPSSLNVV VITTSSSPPP PPTLLWWPPP

             CDAAAAAAA- EEEEEEEEVA GIR------- -HLAAGGAAA AGLLASSSSS AQQSSHAAAA
             SLSSSSIPPE EDPPPPEETN DSEDDDFDDL PKIAA--RDD DSEEASSSSS KTTIIREESS
             SLSSSSIPPE EDPPPPEETN DSEDDDFDDL PKIAA--RDD DSEEASSSSS KTTIIREESS

             AAAVSAASGG RRRRVVVAHH FTALLSRRRF --PPVAAPTT TDAAHHF--Y YYHFFFACCC
             SEELGPPTEE RRRRVVVAYY FEALLSNNRS NNPPATTSSS SSSSEELLSY YYTLLLACCC
             SEELGPPTEE RRRRVVVAYY FEALLSNNRS NNPPATTSSS SSSSEELLSY YYTLLLACCC

             PYYYLLLFFF FFHHFTAAAL LLEGGCCCDV ISSSSGQPAA QLRGGPPPFF -LIIITTGPS
             PYYYSSSFFF FFHHLTAAAL LLEKKSSSNI VGGGGGQPAA QTRSSKKPTT QIVVVSSPAS
             PYYYSSSFFF FFHHLTAAAL LLEKKSSSNI VGGGGGQPAA QTRSSKKPTT QIVVVSSPAS

             PPTGRE-LLR DVGGGLLLAA ADLLAARRVF RVAAALLLPP PMMLLIAAPP PGGEANSLHH
             LLGESESLLI ATGGGNLLRR RDFFAAKKLF IILLLIIIGG GSSFFVDDPP PDDEVNFLYY
             LLGESESLLI ATGGGNLLRR RDFFAAKKLF IILLLIIIGG GSSFFVDDPP PDDEVNFLYY

             RRLLLLPPAQ QAP----DDD AAALDDCCVV SVRRRRPIFT TVIIEEENNN KTGGDRRFTE
             KKLLLL---E ETPIIIIDDD TTTLRRLLAA SLNNNNPVVT TLGGEEENNN RVGGNRRVKN
             KKLLLL---E ETPIIIIDDD TTTLRRLLAA SLNNNNPVVT TLGGEEENNN RVGGNRRVKN

             FYSSAFLSAS GGAAMAAAAA YLREECDDDI VVVGGGEGA- -RERRHELSW RRRRAGGLLL
             QYSSAFLLGR DDEVRVEEEE LFRRRSGGGL IIIPPPEKGI IHERRMEKEW LNNNAGGFFF
             QYSSAFLLGR DDEVRVEEEE LFRRRSGGGL IIIPPPEKGI IHERRMEKEW LNNNAGGFFF

             SSSAAPGGSA ALRMMMLLLF EE--EEEAAD GLLLLGGWRR LSAASSSWAG DDGDDNNNNN
             EEESSKSSNA AVSIIILNNY SSYYEESKKP GIIIIAAWLL LTLLSSSW-- ----------
             EEESSKSSNA AVSIIILNNY SSYYEESKKP GIIIIAAWLL LTLLSSSW-- ----------

             SNNNVSSGGS GDSNSSSSNN NNKKSSSSSG AADDGSVVV
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -MMMTQQ--M DPPALLLDAA FFPPPPPPPP
t5g66770.1   MAYMTTTDLL LIAQVIKQKQ QQQQQHHQQD DHHHINNLN- NTTSLLLSGG AAPDDDDDPP
t5g66770.2   MAYMTTTDLL LIAQVIKKKQ QQQQQHHQQD DHHHINNLN- NTTSLLLSGG AAPPPPPPPP

             PAAAV----- ---------- ---PPPDDGV VG-------- ---------- ----------
             PFFQVGGGGG SPPGGFPNNL DHHTTTTTGG GGRRRDFGGG GTTGESSSSD DEIGGGGDVD
             PFFQVGGGGG SPPGGFPNNL DHHTTTTTGG GGRRRDFGGG GTTGESSSSD DEIGGGGDVD

             ------DDPP A----GA--- ---------- -------VDA APFAFPCDAA AAVL-----M
             GDCCDWDDNP DVVIYGPPFF TTYSSRLLLV PDLLNRRIDS SLPTWPSLSI PPLTEEEEEP
             GDCCDWDDNP DVVIYGPPFF TTYSSRLLLV PDLLNRRIDS SLPTWPSLSI PPLTEEEEEP

             MRREEEV--- ------LLLM SCAAGAAIII EDDHHAAALQ LDSAAAAALA VVASGGRRVA
             PKKEPPTDDD DLEPPPAAIY DCAA-RRIII SDDPPNNNET LQIEESSSVS LLDT-ERRVA
             PKKEPPTDDD DLEPPPAAIY DCAA-RRIII SDDPPNNNET LQIEESSSVS LLDT-ERRVA

             VHTAAALSSL PP-SPATTDE HHA-YEACAA FFAAAANQAL LEAHGDHHHV VVFLLMQQQQ
             FYTAAALSSL PPNSPTSSST EEDLNDACAA LLAAAANQAL LEAEKNKKKI IIFIIVQQQQ
             FYTAAALSSL PPNSPTSSST EEDLNDACAA LLAAAANQAL LEAEKNKKKI IIFIIVQQQQ

             GLLWPALLII IIQPGGPPPF ---RIIITTG GGGGGGPPSS SPTTTGRRE- LRRGRRADLA
             GIIWPALLLL LLQTSGPPPT QQQRVVVSSG GGGPPPPPSS SLGGGESSEP LIIGRRRDFA
             GIIWPALLLL LLQTSGPPPT QQQRVVVSSG GGGPPPPPSS SLGGGESSEP LIIGRRRDFA

             RVRFFFFSSS RRRGASLLDV VVWWMMLAPA VVASSLHHRR LLLGDDDDQA APP----IDD
             KLNFFFFDDD IIIPTPIIHL LLSSSSFDPV LLAFFLYYKK LLL--DDDET TPPTTIIVDD
             KLNFFFFDDD IIIPTPIIHL LLSSSSFDPV LLAFFLYYKK LLL--DDDET TPPTTIIVDD

             VLDCASSSVP KFFVVEEQAD DDKTGFLLRF FTTTELLFYY SAFDDAAASS ASGAAAGGGN
             ALRLKSSSLP RVVLLEEYVS SSRVGFAARV VKKKNLLQFY SAFEEPNNLL GRSEEEEEER
             ALRLKSSSLP RVVLLEEYVS SSRVGFAARV VKKKNLLQFY SAFEEPNNLL GRSEEEEEER

             NAAAEE--YL QEICDIIGGG EAAAAA-RRR REHHPLSSRW WWWDTGGGLL SSAVVLLNNA
             RVVVEERRLF GRISGLLPPP ETTTTGIHHH REMMEKEEQW WWWVEGGGFF EESVVLLYYA
             RVVVEERRLF GRISGLLPPP ETTTTGIHHH REMMEKEEQW WWWVEGGGFF EESVVLLYYA

             ALRRQRMMLV GLFSGG---- VVVVEADDGG CLLHGRLFSA WEAGGDGGGG NNNNNNNNSN
             AVSSQKIILL WNYNYYLYYY VVVVEKPPGG FIINDLLLTL WR-------- ----------
             AVSSQKIILL WNYNYYLYYY VVVVEKPPGG FIINDLLLTL WR-------- ----------

             NSSNSSSGGG GSDSNNNSGS SSSSNSSGGA ARDDDSSSL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --DDTFPFFF Q-----WWPP PMPAAASSSS
t5g66770.1   MMMMAYYYMC TDNLMQVVQK QQQQQQQQQQ QDQQIFGIII NPLSPPWWPP P-TSLLGFFF
t5g66770.2   MMMMAYYYMC TDNLMQVVKK QQQQQQQQQQ QDQQIFGIII NPLSPPWWPP P-TSLLGFFF

             SSSSGLFFFL PPPAAV---- ---------- AAAADYYYY- ---------- ----------
             FFFFGLAAAF PPPFQVTTDD FPPPNNDDHH AAAATFFFFL LDDGGGGGGG GGEEWWLLLL
             FFFFGLAAAF PPPFQVTTDD FPPPNNDDHH AAAATFFFFL LDDGGGGGGG GGEEWWLLLL

             ---------- ---------- YYDDPPPPAA A-----GGGA AD-------- ----------
             IISSVVVAPP DCDDDTWWWW HHDDNPPPDD DYYIYYGGGP PDPDDDDYYS RLLSSSVQQP
             IISSVVVAPP DCDDDTWWWW HHDDNPPPDD DYYIYYGGGP PDPDDDDYYS RLLSSSVQQP

             -------VVV VDAAAAAAAL LEFAAAPPPP PAAAAVVVLL L--RRRREEE EEEEVGI---
             PPSDDDLIII IDTTTSSSSP PPPTTLPPPP PPPPPLLLTT THHTTTTDPP EEEETDSDDD
             PPSDDDLIII IDTTTSSSSP PPPTTLPPPP PPPPPLLLTT THHTTTTDPP EEEETDSDDD

             -----VLMSC AGAIAAGHHA ASSQSHHHAA AAAAASSIGR VAVTTALLLS SRRRRRLFPP
             DDDPPLAYDC A-RIDDSPPA ASSTIRRRES EEDDPTT-ER VAFTEALLLS SNNRRRLSPP
             DDDPPLAYDC A-RIDDSPPA ASSTIRRRES EEDDPTT-ER VAFTEALLLS SNNRRRLSPP

             P-SPPVAAPP TAAEAAAAL- YFEEECCPYL KFAHFFFTTI IAAAGCDVVI FFFLMLLWWP
             PNSPPATTSS SSSTDDDDIL YLDDDCCPYS KFAHLLLTTI IAAAKSNIIV FFFIVIIWWP
             PNSPPATTSS SSSTDDDDIL YLDDDCCPYS KFAHLLLTTI IAAAKSNIIV FFFIVIIWWP

             PLQAALPGGG PFF-LRIIPP PTGRDDEE-- -LDVGGLLLR DAASSVFFRG GVANNLEVVV
             PLQAATTSGG PTTQIRVIAP LGESPPEEPP SLATGGNNNR DAAVVLFFIP PIL--ILLLL
             PLQAATTSGG PTTQIRVIAP LGESPPEEPP SLATGGNNNR DAAVVLFFIP PIL--ILLLL

             PWMLLLQIAP AASQHHHRRL DAQQAPP-II IDLLLCVVAS SPPTIIIEEE EQQEEADDHK
             GSSFFFRVDP VAFQYYYKKL --EETPPTVV VDLLLLAAKS SPPTGGGEEE EYYEEVSSLR
             GSSFFFRVDP VAFQYYYKKL --EETPPTVV VDLLLLAAKS SPPTGGGEEE EYYEEVSSLR

             GFLRFEEAAF YYYAVVSSLA AAASAASAAN AMMAYYLRRR EEICIIVEGA AAA---REHE
             GFARVNNAAQ FFYAVVSSLP PNNLGGREER VRRVLLFRRR RRISLLIEKT TTGIIIHEME
             GFARVNNAAQ FFYAVVSSLP PNNLGGREER VRRVLLFRRR RRISLLIEKT TTGIIIHEME

             EEPPLLSRRD LAAGSAAAVP LGAAARQRML VGGGSGGG-- --HSSEEEEA GGCCLGWPLF
             EEEEKKERRV MAAGESSSVK LSAAASQKIL LWWWNYNNLY YYSIIEEESK GGFFLAWPLL
             EEEEKKERRV MAAGESSSVK LSAAASQKIL LWWWNYNNLY YYSIIEEESK GGFFLAWPLL

             AAAWAAGGDN NNNNNSSVSG SSGNSSSSSG KSGGRSVLL
             SSSW------ ---------- ---------- ---------
             SSSW------ ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --MMMMMDDD TTFFPFQ--- WPPPPMMPAS
t5g66770.1   YTDDNMMIQQ VVIIIQQQKK QQEQQQQQQQ DDHHHHHQQQ IIFFGINPLP WPPPP--TLF
t5g66770.2   YTDDNMMIQQ VVIIIKKKKK QQEQQQQQQQ DDHHHHHQQQ IIFFGINPLP WPPPP--TLF

             GGAAFFLLPP PPAAAAV--- ---------- -------AAP DDDGVY---- ----------
             GGGGAAFFPD PPFFFFVGGD DNNDDGFPNN NNNDHHHAAT TTTGGFLSDF FFFFGGGGGE
             GGGGAAFFPP PPFFFFVGGD DNNDDGFPNN NNNDHHHAAT TTTGGFLSDF FFFFGGGGGE

             ---------- ---------- ---------- ---YYYPA-D ---------- ----------
             EFFFDEWWWL IIISSGGGVA DDGPPDDDDD DTWHHHPDYD DTPPSSLLVQ QQPDDDDDLR
             EFFFDEWWWL IIISSGGGVA DDGPPDDDDD DTWHHHPDYD DTPPSSLLVQ QQPDDDDDLR

             VAAFAAFFFP PPAPDDDAAA AMREEEVVAG --------LL LHLSAGGGII QQLAAADHHL
             ISSPPLWWWP PPSPLLLSIP SPTEDETTND DDDFDLEELL LKIDA---II TTLLLLQRRV
             ISSPPLWWWP PPSPLLLSIP SPTEDETTND DDDFDLEELL LKIDA---II TTLLLLQRRV

             AAAAAAGIVA AHFTTARRLL LFPPPPPPVV VTTTDAAEEE EAAFFFAPPP YYKFNQILAA
             SSEEPP--VA AYFTEANNLL LSPPPPPPAA ASSSSSSTTT TDDLLLAPPP YYKLNQILAA
             SSEEPP--VA AYFTEANNLL LSPPPPPPAA ASSSSSSTTT TDDLLLAPPP YYKLNQILAA

             AHGHVVIDDF FSMMGGQWQQ LLAALLRPPG PPF------- LRRITTGIII IPPPSPT---
             AEKKIIVDDF FGVVGGQWQQ LLAATTRTTS KPTQQQQQQQ IRRVSSGIII IAPPSLGPPP
             AEKKIIVDDF FGVVGGQWQQ LLAATTRTTS KPTQQQQQQQ IRRVSSGIII IAPPSLGPPP

             --DDVVGLLL AALRRSRRRF FFRVAAANSS LLDDVVVRRW MLLIIIGGEA FFFSSVVVLL
             SSAATTGNNL RRFKKVDDNF FFIILLT-PP IIHHLLLNNS SFFVVVDDEV VVVFFMMMLL
             SSAATTGNNL RRFKKVDDNF FFIILLT-PP IIHHLLLNNS SFFVVVDDEV VVVFFMMMLL

             LLRLLGGDDP PAADDAIIIL LLVVVFTTVV IIEEQAADDK TGGFFLLRFE EEELYYYSSS
             LLKLL----- ---DDTVVVL LLAALVTTLL GGEEYVVSSR VGGFFAARVN NNNLFFYSSS
             LLKLL----- ---DDTVVVL LLAALVTTLL GGEEYVVSSR VGGFFAARVN NNNLFFYSSS

             AVVFDDSLDA AAASSSGGAA AMMAEYYYLR EICCDDDIIV GA-RRERHEE PSSSRDLRAG
             AVVFEESLEP PNNLRRSSEV VRRVELLLFR RISSGGGLLI PGIRRERMEE EEEEQVMNAG
             AVVFEESLEP PNNLRRSSEV VRRVELLLFR RISSGGGLLI PGIRRERMEE EEEEQVMNAG

             SSSSAAVVPN NRARMMMLVL FFSSSG---S VVVEEDTLGH HGPLLLFSSA WEEAGDDDDG
             EEEESSVVKY YSAKIIILLN YYNNNNLLYI VVVESPSLAN NDPLLLLTTL WRR-------
             EEEESSVVKY YSAKIIILLN YYNNNNLLYI VVVESPSLAN NDPLLLLTTL WRR-------

             GGDDDNNNSS NNVVSSGGNS SGSSSNKKSS SAGSCLLLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --MDDFPQQQ ----WWWPPM PASSLLAAGG
t5g66770.1   MSSNLLMAIA VVIKKQQQQQ QEEQQQQQHQ DDHQQFGNNN NNNPWWWPP- TLGFLLGGSS
t5g66770.2   MSSNLLMAIA VVIKKQQQQQ QEEQQQQQHQ DDHQQFGNNN NNNPWWWPP- TLGFLLGGSS

             FPPPAAAVV- ---------- -------PPP DDDDDGYY-- ---------- ----------
             APPDQQQVVT GGGDSDDPGG PPNNLHHTTT TTTTTGFFRR LSDDFGGTGG EEFDMMMELL
             APPPQQQVVT GGGDSDDPGG PPNNLHHTTT TTTTTGFFRR LSDDFGGTGG EEFDMMMELL

             ---------- ---------- ---DPPPPA- -----AA--- ---------- DDAAAAAAPE
             LIISSGDVAD GPPPDCCTTT TWWDNPPPDY VIIIYPPTYS SLSVPSSLLN DDTSSSSSLP
             LIISSGDVAD GPPPDCCTTT TWWDNPPPDY VIIIYPPTYS SLSVPSSLLN DDTSSSSSLP

             EEEEAPPPPC CAPAAAAV-- --EEEEEEEV AAGIR----- ---------- ----LLLLLM
             PPPPLPPPPS SSPSIPPLHE EEDDDPPEET NNDSEDDDDD DDDDDDDDDF DLPPLLAAIY
             PPPPLPPPPS SSPSIPPLHE EEDDDPPEET NNDSEDDDDD DDDDDDDDDF DLPPLLAAIY

             MGAAAAGHAL AQQLAAADSH LLAAVSAASG RRVAAVHFFF TAAALLRRRL LLF--SVVAP
             Y-RRDDSPNE ATTLLLLQIR VVSELGDPT- RRVAAFYFFF EAAALLNNRL LLSNNSAATS
             Y-RRDDSPNE ATTLLLLQIR VVSELGDPT- RRVAAFYFFF EAAALLNNRL LLSNNSAATS

             PEHLL--YHH FFYYEAPPPY LNQQQQAILE EHHCCDHHVI DDFLQLPPAA LALRPPPPLR
             STEIILSYKT LLNNDAPPPY SNQQQQAILE EEESSNKHIV DDFIQIPPAA LATRTTTPIR
             STEIILSYKT LLNNDAPPPY SNQQQQAILE EEESSNKHIV DDFIQIPPAA LATRTTTPIR

             GGGIIGPPPP PPPSPTTTGG DDDE-DRLLL LAARVRGGVV VAAANSDVRR RRRRWLIAPP
             GGGIIPAAAP PPPSLGGGEE PPPESARLLF FAAKLNPPII ILLL-PHLNN NNNNSFVDPP
             GGGIIPAAAP PPPSLGGGEE PPPESARLLF FAAKLNPPII ILLL-PHLNN NNNNSFVDPP

             AVNSSVLLLL LRLLLGDDAA DDDQAAP--- -VVDVASSVR PPIIFFTIIE EEADNKTGLL
             VLNFFMLLLL LKLLL----- DDDETTPTTT IAARAKSSLN PPVVVVTGGE EEVSNRVGAA
             VLNFFMLLLL LKLLL----- DDDETTPTTT IAARAKSSLN PPVVVVTGGE EEVSNRVGAA

             RTLYYYSSSD SAAASAGGGN NAME-AYYLL RCDIIGAA-R RRREEEERHH HPLSSSSRDR
             RKLFYYSSSE SPNNLGDSER RVRERELLFF RSGLLKTTIH HHREEEERMM MEKEEEERVL
             RKLFYYSSSE SPNNLGDSER RVRERELLFF RSGLLKTTIH HHREEEERMM MEKEEEERVL

             LLLRAAASAA VVGSLLRRRR RMVGGGLFEG SVEEAALLTT TLGHHGGPPP LSSSASSAAE
             MMMNAAAESS VVSNVVSSSK KILWWWNYSN IVEEKKIISS SLANNDDPPP LTTTLSSSSR
             MMMNAAAESS VVSNVVSSSK KILWWWNYSN IVEEKKIISS SLANNDDPPP LTTTLSSSSR

             GGGGGGDDDN SNNSSSSGGD SNNSGGGGGK GRSSSSVVC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- ------MMMD TPQ----MMM
t5g66770.1   AAYYMMMCTD DSGGNLLLMM AIQQVIKKKQ KKQEQQQQQQ QQHQDDHHHQ IGNSLLP---
t5g66770.2   AAYYMMMCTD DSGGNLLLMM AIQQVIKKKK KKQEQQQQQQ QQHQDDHHHQ IGNSLLP---

             AAASSSSSGD AFLLVV---- ---------- -------PDG VVGGY----- ----------
             SLLGGFFFGS GAFFVVDDSN DDDPFFPPPL LLLDHHHTTG GGGGFLLGGG GGGGGGEEEW
             SLLGGFFFGS GAFFVVDDSN DDDPFFPPPL LLLDHHHTTG GGGGFLLGGG GGGGGGEEEW

             ---------- ---------- ------DDDP PA-------- -GAD------ ----------
             MELLSSSGGG GGSAADGPPD CCDTWWDDDP PDYYYVVIII IGPDPFDDTT YYLSSDLLRV
             MELLSSSGGG GGSAADGPPD CCDTWWDDDP PDYYYVVIII IGPDPFDDTT YYLSSDLLRV

             VVVDAAAEEE FAAFFPPCAP DDAAALLL-A EEEEEEAGGG GII------- --VHHHLLSC
             IIIDSSSPPP PLLWWPPSSP LLPPPTTTES EEDDDPNDDD DSSDDDFLEE EPLKKKAIDC
             IIIDSSSPPP PLLWWPPSSP LLPPPTTTES EEDDDPNDDD DSSDDDFLEE EPLKKKAIDC

             AAAAAEEEEA GDDLASAAAS HHAAALLLAA AVSAAASGII RVVAVHFTAA LSL-PVPTAE
             ARRRRSSSSD SDDEASKKLI RREESVVVSS ELGDDPT--- RVVAFYFEAA LSLNPASSST
             ARRRRSSSSD SDDEASKKLI RREESVVVSS ELGDDPT--- RVVAFYFEAA LSLNPASSST

             HAAFL-HHHA CYYKKFFFTA ANNQQAAAIL AAAFHHGGDH HHVDFFSSSS MMQGGLLAAL
             EDDLISKTTA CYYKKFFFTA ANNQQAAAIL AAATEEKKNK KKIDFFGGGG VVQGGIIAAL
             EDDLISKTTA CYYKKFFFTA ANNQQAAAIL AAATEEKKNK KKIDFFGGGG VVQGGIIAAL

             IQLLLALPGG GPP-IITTGG IGPSPTTTGR RE--LRDDLA AADDLLAASV RVSRGNSLEP
             LQLLLATTSS GPPQVVSSGG IPPSLGGGES SEPSLIAALR RRDDFFAAVL DLDIP-PILG
             LQLLLATTSS GPPQVVSSGG IPPSLGGGES SEPSLIAALR RRDDFFAAVL DLDIP-PILG

             MMQQIIAAEA VAFNSSLLQH RLGDDPAAPP --DAVLDCVA SSVRKITTTT VIEEEAHNKT
             SSRRVVDDEV LAVNFFLLQY KL------PP TIDTALRLAK SSLNRVTTTT LGEEEVLNRV
             SSRRVVDDEV LAVNFFLLQY KL------PP TIDTALRLAK SSLNRVTTTT LGEEEVLNRV

             GFLLDDFAFF YAFFFAAASS GGGGGAAE-Y YYQQRRRRRE CIIICGAAAA -RRRRERRHE
             GFAANNVAQQ YAFFFPPNLL DDDDSVVERL LLGGRRRRRR SLLLGPTTGG IHHRRERRME
             GFAANNVAQQ YAFFFPPNLL DDDDSVVERL LLGGRRRRRR SLLLGPTTGG IHHRRERRME

             EPPPLRWWRR RTTRAAAALS SPLNAALLRQ AAMLVLLLFS GGGEG--VEA AGGCLLGGHG
             EEEEKQWWRR LEENAAAAFE EKLYAAVVSQ AAILLNNNYN YYYSNYYVEK KGGFILAAND
             EEEEKQWWRR LEENAAAAFE EKLYAAVVSQ AAILLNNNYN YYYSNYYVEK KGGFILAAND

             PLFSSAAWAA GGDNNNNNNV VDDNGSSNNS SGGVVCLLL
             PLLTTSSW-- ---------- ---------- ---------
             PLLTTSSW-- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------M MPPPFFQ--- -----PDAAS
t5g66770.1   MMAAYYMCDG NMAAAIIAQQ QVIIIQKQQQ QQHHHHDDDH HGGGIINPPP SLLNPPNSLG
t5g66770.2   MMAAYYMCDG NMAAAIIAQQ QVIIIKKQQQ QQHHHHDDDH HGGGIINPPP SLLNPPNSLG

             SSSAAAAGFF LLPPPPAV-- ---------- ----ADVVVG Y--------- ----------
             GFFGGGGSAA FFPPPPFVDS SDGGGGPPND HHHHATGGGG FRLLLLSDDD GEEESMELII
             GFFGGGGSAA FFPPPPFVDS SDGGGGPPND HHHHATGGGG FRLLLLSDDD GEEESMELII

             ---------- --------DD DDP------- ---------- ---VDAPFFF AAAACCCPDD
             SSSSGADGGD DDCDDDWWDD DDNVYYYYPP DTTRSSSSSL NVVIDSLPPP PPTLSSSPLL
             SSSSGADGGD DDCDDDWWDD DDNVYYYYPP DTTRSSSSSL NVVIDSLPPP PPTLSSSPLL

             AVL--AAARE EVVAGIIIIR R--------- LVVHHLLLMM CCAGGIEGAL AAASSSSQQL
             PLTHESSSKE ETTNDSSSSE EDDDDLLEEP LLLKKAIIYY CCA--ISSNE AAASSSSTTL
             PLTHESSSKE ETTNDSSSSE EDDDDLLEEP LLLKKAIIYY CCA--ISSNE AAASSSSTTL

             AADSHLLAAA VSASSGIGRF FFFTTLSSRR RRLLLFPPSP VVVAPTDEA- --HHFYYYEY
             LLQIRVVSSE LGDTT--ERF FFFEELSSNN NRLLLSPPSP AAATSSSTDL LLKKLNNNDY
             LLQIRVVSSE LGDTT--ERF FFFEELSSNN NRLLLSPPSP AAATSSSTDL LLKKLNNNDY

             LKFFFFFANN QAIILEEEAF FGGGHVVVVV DFSLMQLWPP PALIIQQQLR RRGP--TGGP
             SKFFLLLANN QAIILEEEAT TKKKKIIIII DFGIVQIWPP PALLLQQQLR RRSPQQSGPA
             SKFFLLLANN QAIILEEEAT TKKKKIIIII DFGIVQIWPP PALLLQQQLR RRSPQQSGPA

             SSTTRRRRDD E-LRRDVLLL LLAADAARSS SSRVFSFFVV NNNEVPPPLQ IAVAFNNSQQ
             SSGGSSSSPP ESLIIATNNN NNRRDAAKVV VVDLFDFFII ---LLGGGFR VVLAVNNFQQ
             SSGGSSSSPP ESLIIATNNN NNRRDAAKVV VVDLFDFFII ---LLGGGFR VVLAVNNFQQ

             HHHGGDP--A VDDVAAAVRK KITVIIIEEQ QEEAAHNNNK KGLDTTLLFF YSLDAAAAGG
             YYY----TIT ARRAKKKLNR RVTLGGGEEY YEEVVLNNNR RGANKKLLQQ FSLEPNNGDS
             YYY----TIT ARRAKKKLNR RVTLGGGEEY YEEVVLNNNR RGANKKLLQQ FSLEPNNGDS

             AAGGMAEAEI IICIICCGAA AA-----RRR RELLSSWRDD LLTTTRRAGL AVVPPPLGSN
             EEEERVEERI IISLLGGKTG GGIIIIIRRR REKKEEWRVV MMEEENNAGF SVVKKKLSNY
             EEEERVEERI IISLLGGKTG GGIIIIIRRR REKKEEWRVV MMEEENNAGF SVVKKKLSNY

             AALRAAMLGG LLG-SSVEEA DGLLLTLGGR RPLLFFSSAA SSAWEAAAAA AGGGGGGGGG
             AAVSAAILWW NNNLIIVSSK PGIIISLAAL LPLLLLTTLL SSSWR----- ----------
             AAVSAAILWW NNNLIIVSSK PGIIISLAAL LPLLLLTTLL SSSWR----- ----------

             GGDNNSSNNN GSSSGSDSSG SSGSSSSSGG RDDDSVCCC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --MDDTTFQ- -----PMDPA AAASSSGGDA
t5g66770.1   AAYMTNLLIA AAAQVQQQQQ QEEQQQQQHH HHHQQIIFNP LSNNPP-NTS LLLFFFGGSG
t5g66770.2   AAYMTNLLIA AAAQVKKKQQ QEEQQQQQHH HHHQQIIFNP LSNNPP-NTS LLLFFFGGSG

             GPPPPAAA-- ---------- ---------A APPPPDDG-- ---------- ----------
             SDPPPFQQGG GGGGDDSNFF FPNNNDDHHA ATTTTTTGRR RLSFFFGGGG GFFFEESWMM
             SPPPPFQQGG GGGGDDSNFF FPNNNDDHHA ATTTTTTGRR RLSFFFGGGG GFFFEESWMM

             ---------- DDDDPPA--- ---------- ---------- VVDAAAAAPE AAAAAFPPPP
             ETLGGVVGPW DDDDNPDYIY YPDPPLLSVQ SDDLNNNRVV IIDTTSSSLP PPPTTWPPPP
             ETLGGVVGPW DDDDNPDYIY YPDPPLLSVQ SDDLNNNRVV IIDTTSSSLP PPPTTWPPPP

             PPPPPAAAAD AAAAL-AMRE EEVIIRRR-- ---VVVHLLM MSCCAAIEEE DHAAQLLAAD
             PPPPPSSSSL SSIPTHSPTE EPTSSEEEDD LPPLLLKAAY YDCCAAISSS DPNNTLLLLQ
             PPPPPSSSSL SSIPTHSPTE EPTSSEEEDD LPPLLLKAAY YDCCAAISSS DPNNTLLLLQ

             SAAALLAVSA AASSGIGRRR VVVHTTTAAS SRRFP-PVAA AAPPPPPPDA EHHAFL--FF
             IESSVVELGP PPTT--ERRR VVFYEEEAAS SNRSPNPATT TTSSSSSSSS TEEDLILSLL
             IESSVVELGP PPTT--ERRR VVFYEEEAAS SNRSPNPATT TTSSSSSSSS TEEDLILSLL

             YECCPYYLLL FAHFFQAILA FGHHHIDDSQ GWWALAALLR GGPF-RIIIT GGGGGIGPPP
             NDCCPYYSSS FAHLLQAILA TKKHHVDDGQ GWWALAATTR SGPTQRVVVS GGGGGIPAPL
             NDCCPYYSSS FAHLLQAILA TKKHHVDDGQ GWWALAATTR SGPTQRVVVS GGGGGIPAPL

             TTG---LLRV RLRVVFSSSG GGGVAANLLD DEVVPWLLQQ IAGEAAFNSS VLQLLLLGDP
             GGEPSSLLIT RFKLLFDDDP PPPILT-IIH HLLLGSFFRR VDDEAAVNFF MLQLLLL---
             GGEPSSLLIT RFKLLFDDDP PPPILT-IIH HLLLGSFFRR VDDEAAVNFF MLQLLLL---

             AAD---DAAV VDDVVVVAAS RRIIIFTVVV IEEEAHHHKK TTTFFLLLDR RTTAAFAAVS
             --DTTIDTTA ARRAAAAKKS NNVVVVTLLL GEEEVLLLRR VVVFFAAANR RKKAAQAAVS
             --DTTIDTTA ARRAAAAKKS NNVVVVTLLL GEEEVLLLRR VVVFFAAANR RKKAAQAAVS

             SSLASSSSGG GAGGNAEAYL LQRRIICVVC CCEEEGAA-- REEREPPSRR WRDTRRGSSA
             SSLNLRRRSS SEEERVEELF FGRRIISIIG GGEEEKGGII REEREEEEQQ WRVENNGEES
             SSLNLRRRSS SEEERVEELF FGRRIISIIG GGEEEKGGII REEREEEEQQ WRVENNGEES

             ALGNNALRRQ AMMMMLVGLF SEEEG---SS EGGLLLGGHG PPPLLFFWEA AAAGDGGGGN
             SLSYYAVSSQ AIIIILLWNY NSSSNLYYII EGGIIIAAND PPPLLLLWR- ----------
             SLSYYAVSSQ AIIIILLWNY NSSSNLYYII EGGIIIAAND PPPLLLLWR- ----------

             SSSSNVGGGG SSSSGSSDDS NNSSSSKSSS SSAARDSVC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -----MDTFP QQQ-----MD DAAASSSGLG
t5g66770.1   AAYYYMMCCC CTGGLLLMAQ QIKQKKQQQQ QQQQQHQIFG NNNLLSLP-N NSSLGGFGLS
t5g66770.2   AAYYYMMCCC CTGGLLLMAQ QIKKKKQQQQ QQQQQHQIFG NNNLLSLP-N NSSLGGFGLS

             GFLPPPAVVV ---------- ---------A AADDGG---- ---------- ----------
             SAFDDPFVVV TGGGGDNGFP FFPPNNDHHA AATTGGLSSD DDFGGGGGEE EFEEEESSSD
             SAFPPPFVVV TGGGGDNGFP FFPPNNDHHA AATTGGLSSD DDFGGGGGEE EFEEEESSSD

             ---------- ---------- --------P- --GGAD---- ---------- -DDAAAFFPP
             DDEEWTLLLI ISSSDDGGGG PPDDTTTTNV IYGGPDPPFD DTPRRLQQSD VDDSSSWWPP
             DDEEWTLLLI ISSSDDGGGG PPDDTTTTNV IYGGPDPPFD DTPRRLQQSD VDDSSSWWPP

             AAAAAAVVVL ----AAMMRR EEEEEGIR-- --------LL HHHLLMMSAA EAALSAAHAA
             SSSSIPLLLT HHEESSPPTK EPPPEDSEDF DDDLLEPPLL KKKAAYYDRR SDNESKLRES
             SSSSIPLLLT HHEESSPPTK EPPPEDSEDF DDDLLEPPLL KKKAAYYDRR SDNESKLRES

             AAASASSIIG GVVVFTTALS LLLFSPVVVA PPPPPTTTDH FLL---YFYA LKFFAAHHFF
             SSSGPTT--E EVFFFTEALS LLLSSPAAAT SSSSSSSSSE LIILLSYLNA SKFFAAHHLL
             SSSGPTT--E EVFFFTEALS LLLSSPAAAT SSSSSSSSSE LIILLSYLNA SKFFAAHHLL

             TNAIIEAAFF FHHGCHHHVI DSSLLMQQGP ALIILALPPG PPPLIGGGPP SSSGDDDEER
             TNAIIEAATT TEEKSHHHIV DGGIIVQQGP ALLLLATTTS KPPIVGGPAP SSSEPPPEEI
             TNAIIEAATT TEEKSHHHIV DGGIIVQQGP ALLLLATTTS KPPIVGGPAP SSSEPPPEEI

             RRDDVVVRRA RVRFFFFRGG GGVVAADEVR WMMAAPEEAA ANNNSVLLLL HHLDPADQP-
             IIAATTTRRR KLNFFFFIPP PPIILTHLLN SSSDDPEEVV ANNNFMLLLL YYL---DEPI
             IIAATTTRRR KLNFFFFIPP PPIILTHLLN SSSDDPEEVV ANNNFMLLLL YYL---DEPI

             -IDVVLLLLL DVRITTTVVI QQEEADDDHN KTTFLLRFTA AFFYYYFFSS DDAAAAASGG
             IVDAALLLLL RANVTTTLLG YYEEVSSSLN RVVFAARVKA AQQFFFFFSS EEPPPNGRDS
             IVDAALLLLL RANVTTTLLG YYEEVSSSLN RVVFAARVKA AQQFFFFFSS EEPPPNGRDS

             GGAGNNAAAM M---AYLQRR IVCEGGA-RR EEEEEPPPLL LSSSWRDRLL LTRAGLSAGS
             SSEERRVVVR RRRRELFGRR IIGEKKTIHH EEEEEEEEKK KEEEWRVLMM MENAGFESSN
             SSEERRVVVR RRRRELFGRR IIGEKKTIHH EEEEEEEEKK KEEEWRVLMM MENAGFESSN

             SSNAARRQAR MMMVVGFFGG ------HHVV AADDGLTLLG GGWHHRPLLF SAAAAWEAGG
             NNYAASSQAK IIILLWYYYY LLLLYYSSVV KKPPGISLLA AAWNNLPLLL TLSSSWR---
             NNYAASSQAK IIILLWYYYY LLLLYYSSVV KKPPGISLLA AAWNNLPLLL TLSSSWR---

             GNNNSNGGGS SSSDDDDSNN GSSSKSSDDG GGGSSSCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ----MDF--- ---WAAAALL LDDDAGFPPP
t5g66770.1   YYTMIAAQVV VVKQEEEQQQ QQQQQHHHHH HHQQHQIPPL NPPWSSSLLL LSSSGSAPDP
t5g66770.2   YYTMIAAQVV VVKQEEEQQQ QQQQQHHHHH HHQQHQIPPL NPPWSSSLLL LSSSGSAPPP

             AVV------- ---------- ---------A PDGVV----- ---------- ----------
             QVVTGGGGGG SNNDDPPFFF PDDDHHHHHA TTGGGLLLSS FGGGGGGGEF FEESEETTTL
             QVVTGGGGGG SNNDDPPFFF PDDDHHHHHA TTGGGLLLSS FGGGGGGGEF FEESEETTTL

             ---------- ---------- ------YDDP A-DD------ ---------- ----VDDDAA
             IIIGGGGGGD DVADPPPDDC CDDWWWHDDP DIDDPYPPSR LSSVVSDLLN NNVVIDDDTS
             IIIGGGGGGD DVADPPPDDC CDDWWWHDDP DIDDPYPPSR LSSVVSDLLN NNVVIDDDTS

             LLPPEAAPPP PCAPDDAAAV VVL-AMMMEE EAGI------ ---------L VVHLLSCAGI
             PPLLPTLPPP PSSPLLSPPL LLTHSPPPED PNDSDDDDDF FDLEPPPPPL LLKAADCA-I
             PPLLPTLPPP PSSPLLSPPL LLTHSPPPED PNDSDDDDDF FDLEPPPPPL LLKAADCA-I

             EAGDDDHAAA AAADSHAALA AASSGGIIIG GRVVVVVVHF FTTTLSSRRL LLLFPSSPPP
             SDSDDDPNAK LLLQIRESVS EEGT-----E ERVVFFFFYF FTEELSSRRL LLLSPSSSSS
             SDSDDDPNAK LLLQIRESVS EEGT-----E ERVVFFFFYF FTEELSSRRL LLLSPSSSSS

             PTDAAAAEEH AFLL-YHHHH AACCKKFFTL EEEAHHDVVH VDFLLMMQQQ GLAALLLLAL
             SSSSSSSTTE DLIISYKKKT AACCKKLLTL EEEAEENIIH IDFIIVVQQQ GIAALLLLAT
             SSSSSSSTTE DLIISYKKKT AACCKKLLTL EEEAEENIIH IDFIIVVQQQ GIAALLLLAT

             RGGGGPRITT IIIIGPSPGR RDEEE-LRRD DVRALLLARV VRVFFSGGAA ANNNSSDDEV
             RSGGGKRVSS IIIIPASLES SPEEEPLIIA ATRRFFFAKL LDLFFDPPLT T---PPHHLL
             RSGGGKRVSS IIIIPASLES SPEEEPLIIA ATRRFFFAKL LDLFFDPPLT T---PPHHLL

             VVRPPWMMLL LQIASSVLLR RLGGDDPPP- --IIDAVLDV VAAAASVVVR PKTVVIIEEE
             LLNGGSSSFF FRVAFFMLLK KL------PT TTVVDTALRA AKKKKSLLLN PRTLLGGEEE
             LLNGGSSSFF FRVAFFMLLK KL------PT TTVVDTALRA AKKKKSLLLN PRTLLGGEEE

             QEAAADDDNK TFFFFLFTAL FFFYSAAFDD LASSASGGGG GNAAMMYYLD IVVGGAAAAR
             YEVVVSSSNR VFFFFAVKAL QQQFSAAFEE LPLLGRDDDS SRVVRRLLFG LIIKKTGGGR
             YEVVVSSSNR VFFFFAVKAL QQQFSAAFEE LPLLGRDDDS SRVVRRLLFG LIIKKTGGGR

             EPDRRTRRAL AAVVPPALRR QQQQLLVVFF FGG--HSEEA DLLLLLLLLG GGGRLSSSAW
             EEVLLENNAF SSVVKKAVSS QQQQLLLLYY YYNLLSIEEK PIIIILLLLA ADDLLTTSSW
             EEVLLENNAF SSVVKKAVSS QQQQLLLLYY YYNLLSIEEK PIIIILLLLA ADDLLTTSSW

             AAAGDGNNNS SNNNSSGGGS SSSNSGSGGS ARRDDGSLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------M MMDDTFQ--W MPAASSSSPP
t5g66770.1   AAYYMMMTTS SGNMAAAIAQ VVVKKQQQQQ QQHHQQQQQH HHQQIFNSPW -TSLGFFFPP
t5g66770.2   AAYYMMMTTS SGNMAAAIAQ VVVKKKKQQQ QQHHQQQQQH HHQQIFNSPW -TSLGFFFPP

             PPAAAVV--- ---------- ---------- --ADVGGYY- ---------- ----------
             PPFQQVVTTG GGGSNDDPPG GFPFFFNNLD HHATGGGFFR SSFGGGGEEE FFEWMMMEEL
             PPFQQVVTTG GGGSNDDPPG GFPFFFNNLD HHATGGGFFR SSFGGGGEEE FFEWMMMEEL

             ---------- --------YP AA----DD-- ---------- ---VDPPEAA AAFFFPPPDD
             ISGGGGDDDD DDSADPCWHP DDYVIIDDFF TTYPRRLVVV VSVIDLLPTT LLWWWPPPLL
             ISGGGGDDDD DDSADPCWHP DDYVIIDDFF TTYPRRLVVV VSVIDLLPTT LLWWWPPPLL

             AAAVLLAARR EEAGIRRR-- ---------- -LVLMMAAGA IEEEAGGGGH ALASHALLLA
             SIPLTTSSKK DENDSEEEDD DDDFFFDDLL ELLAYYAA-R ISSSDSSSSP NELIREVVVS
             SIPLTTSSKK DENDSEEEDD DDDFFFDDLL ELLAYYAA-R ISSSDSSSSP NELIREVVVS

             AVAAAGIIGV HTTTTLSRRL LLFSVVVAPP TTTDDDHHAA FFLL---YYY HHFECCCCYY
             SLDDP---EF YTTTELSNRL LLSSAAATSS SSSSSSEEDD LLIILLLYYY KTLDCCCCYY
             SLDDP---EF YTTTELSNRL LLSSAAATSS SSSSSSEEDD LLIILLLYYY KTLDCCCCYY

             LKKKKAHFTI EHDDHHHHVH VDFSLQGGLL WPAAALLALL PPGGPPPP-- RGGGIIPPSP
             SKKKKAHLTI EENNKKKKIH IDFGIQGGII WPAAALLATT TTSGKPPPQQ RGGGIIAPSL
             SKKKKAHLTI EENNKKKKIH IDFGIQGGII WPAAALLATT TTSGKPPPQQ RGGGIIAPSL

             GGE---LRLD ARRSVFSRVA NNNLLDEEEV VVRRWMLLQQ APEVVFNNSV VLLLQLRLLL
             EEEPPPLIND AKKVLFDIIL ---IIHLLLL LLNNSSFFRR DPELLVNNFM MLLLQLKLLL
             EEEPPPLIND AKKVLFDIIL ---IIHLLLL LLNNSSFFRR DPELLVNNFM MLLLQLKLLL

             GADDQ--IAV LDVVSRIIII FTTVEEAADD DNKTGFFLDF FTTELLLYYF FFDDDDSLLD
             --DDETIVTA LRAASNVVVV VTTLEEVVSS SNRVGFFANV VKKNLLLFFF FFEEEESLLE
             --DDETIVTA LRAASNVVVV VTTLEEVVSS SNRVGFFANV VKKNLLLFFF FFEEEESLLE

             AASSAAGGGN AAMMEAYLLL QQQEEECDIV GEAAAAA--R RRHHEEPPLW WDDRRRRAGA
             PNLLGGSSSR VVRREELFFF GGGRRRSGLI PETTTTTIIH RRMMEEEEKW WVVLLLNAGS
             PNLLGGSSSR VVRREELFFF GGGRRRSGLI PETTTTTIIH RRMMEEEEKW WVVLLLNAGS

             AVPPPLLGSN NRRQRMMFFS GGEG--HSVE EAGCLTGGRP LFFSAAAAWA AGGGDGGGDD
             SVKKKLLSNY YSSQKIIYYN YYSNLYSIVS SKGFISAALP LLLTLLLSW- ----------
             SVKKKLLSNY YSSQKIIYYN YYSNLYSIVS SKGFISAALP LLLTLLLSW- ----------

             NNSSSNNSGG SSSDDSSNNS SNNNGGGGKK SGGSSCCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---MDDDTFP PFFF------ ---WMMPASS LDAAGGFFLL
t5g66770.1   MAYCCCSSNL AAIIAVQKKQ EQDHQQQIFG GIIILLSNNP PPPW--TSFF LSGGSSAAFF
t5g66770.2   MAYCCCSSNL AAIIAVKKKQ EQDHQQQIFG GIIILLSNNP PPPW--TSFF LSGGSSAAFF

             PPPPAAAVVV ---------- -----AAPVG Y--------- ---------- ----------
             PPPPFQQVVV TTGGGDDDDP GFNLDAATGG FLSDDFTTGG GGGGGFEEDD DEEEEMEEES
             PPPPFQQVVV TTGGGDDDDP GFNLDAATGG FLSDDFTTGG GGGGGFEEDD DEEEEMEEES

             ---------- --YPPA--GA AD-------- ---------- -VVVAAALLA FCCAAPPPAA
             GGDVADGDDC CTHNPDYVGP PDPFDDPPPS LLSSVVSDDL NIIITSSPPL WSSSSPPPSP
             GGDVADGDDC CTHNPDYVGP PDPFDDPPPS LLSSVVSDDL NIIITSSPPL WSSSSPPPSP

             AVVAAMEEEE EEVGGGII-- -------LLL MSSSCAIEAA DDDDHASSAQ SHAAAVVAAS
             PLLSSPEDDP PETDDDSSDD DFFDEEPLLI YDDDCAISDD DDDDPNSSKT IRSSELLDPT
             PLLSSPEDDP PETDDDSSDD DFFDEEPLLI YDDDCAISDD DDDDPNSSKT IRSSELLDPT

             GIGGGAAVHH FFTASRRFSV VAPTDDAL-- -YFFECCLKK AFAAAANNQQ IIEEHGGGCD
             --EEEAAFYY FFTASNRSSA ATSSSSDILS SYLLDCCSKK ALAAAANNQQ IIEEEKKKSN
             --EEEAAFYY FFTASNRSSA ATSSSSDILS SYLLDCCSKK ALAAAANNQQ IIEEEKKKSN

             DDHVVIDFSS SSLLMMQGGG LAQQALALPG P---LIIGGP PPPPTTRREE --RVGLAAAA
             NNHIIVDFGG GGIIVVQGGG IAQQALATTS PQQQIVVGPA PLLLGGSSEE PSITGNRRRR
             NNHIIVDFGG GGIIVVQGGG IAQQALATTS PQQQIVVGPA PLLLGGSSEE PSITGNRRRR

             ADDDDAAVRF FFRGANNNSD DERRPLQQIA AAPPAVVAFF NQHHLLGDDD DAD---IIDA
             RDDDDAALNF FFIPL---PH HLNNGFRRVD DDPPVLLAVV NQYYLL---- --DTTIVVDT
             RDDDDAALNF FFIPL---PH HLNNGFRRVD DDPPVLLAVV NQYYLL---- --DTTIVVDT

             AVVVLDCCCV SVRPFTIIIQ DDHHKTTDRF TEFFYYYYAA DSSSSSAGGG NE--AYYYLR
             TAAALRLLLA SLNPVTGGGY SSLLRVVNRV KNQQFFFYAA ESSLLLGDSE RERRELLLFR
             TAAALRLLLA SLNPVTGGGY SSLLRVVNRV KNQQFFFYAA ESSLLLGDSE RERRELLLFR

             EEIICIVVGG GA--RRHEEP LLLRRRRRRD DRLTAAGSSA VVPLGSSALR QAARMVLG--
             RRIISLIIPK KTIIRRMEEE KKKQQQRRRV VLMEAAGEES VVKLSNNAVS QAAKILNNLL
             RRIISLIIPK KTIIRRMEEE KKKQQQRRRV VLMEAAGEES VVKLSNNAVS QAAKILNNLL

             ----HHHHHS VVEADCLLTT TLGGWHHHHG GGPLLLSSAA WWWWEAAAGD DDDDGGGGGG
             LLYYSSSSSI VVEKPFIISS SLAAWNNNND DDPLLLTTSS WWWWR----- ----------
             LLYYSSSSSI VVEKPFIISS SLAAWNNNND DDPLLLTTSS WWWWR----- ----------

             GGNNNNNSSS SSSNSSGSSS DSSGGKSSGG DDDDSSSLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- -----MDDTF FFQQQ--PPA
t5g66770.1   MMYYMCCCCC TTTGNLMMMA QIIKQQKKKQ QQQEEQQQQQ QQQDDHQQIF IINNNLNPTS
t5g66770.2   MMYYMCCCCC TTTGNLMMMA QIIKKKKKKQ QQQEEQQQQQ QQQDDHQQIF IINNNLNPTS

             AAAASGDAAA GFLPPPPPAA A--------- ---------- PDDDGG---- ----------
             SLLLFGSGGG SAFDDDDPFQ QTTGGGGDSD GGPNDDHHHH TTTTGGFGGT GGGEFEWELL
             SLLLFGSGGG SAFPPPPPFQ QTTGGGGDSD GGPNDDHHHH TTTTGGFGGT GGGEFEWELL

             ---------- ---------- -----YYPPP PPAA-----G AA-------- ----------
             IGGGGDVVVA DDDDGPDDCD DTTTWHHNNN NPDDYIIYYG PPFFDDTTPP RLLQQQPLLL
             IGGGGDVVVA DDDDGPDDCD DTTTWHHNNN NPDDYIIYYG PPFFDDTTPP RLLQQQPLLL

             --VDDAAPPE AAAAAAAPAP AAAAVLLLL- --RREERR-- ----LLVHHL LLAAAGAEAA
             NNIDDSSLLP PTTTLLLPSP PPPPLTTTTH EETTEEEEDD DFEPLLLKKI IIAAA-RSDD
             NNIDDSSLLP PTTTLLLPSP PPPPLTTTTH EETTEEEEDD DFEPLLLKKI IIAAA-RSDD

             GHLLLLQQAA DSSHHALAAS SSAAASGGGR RVVVAVHFTA LLRLFFPPVV AATAAAAFF-
             SPEEEETTLL QIIRREVSEG GGDPPT---R RVVVAFYFEA LLRLSSPPAA TTSSDDDLLS
             SPEEEETTLL QIIRREVSEG GGDPPT---R RVVVAFYFEA LLRLSSPPAA TTSSDDDLLS

             --HHEEAACP YLAHFFTAAN NAAEGGHHHV VDFSSLLQPL QQALLLLGGP PPF--LRIIG
             SSKTDDAACP YSAHLLTAAN NAAEKKKKHI IDFGGIIQPL QQALLTTSGK PPTQQIRVVG
             SSKTDDAACP YSAHLLTAAN NAAEKKKKHI IDFGGIIQPL QQALLTTSGK PPTQQIRVVG

             GIIIPSSSPT RDDE---LDG GLRLAADLAR RSVRRRRFFS FFRGAANLLV PPPMLQIIAA
             GIIIPSSSLG SPPEPPSLAG GNRLRRDFAK KVLDDNNFFD FFIPLL-IIL GGGSFRVVDD
             GIIIPSSSLG SPPEPPSLAG GNRLRRDFAK KVLDDNNFFD FFIPLL-IIL GGGSFRVVDD

             PPGEFFSVRR LLLAAAADAP PP--IAVLDC VVVASVVRRI FVIIEEQQEE EEAAFLDRFE
             PPDEVVFMKK LLL----DTP PPTIVTALRL AAAKSLLNNV VLGGEEYYEE EEVVFANRVN
             PPDEVVFMKK LLL----DTP PPTIVTALRL AAAKSLLNNV VLGGEEYYEE EEVVFANRVN

             AAAYVVVFFD DSLDDAAASS SSSSGGGGGG AAAMAAYYQE ICCDDVGEEE GGGA-RRHEP
             AAAYVVVFFE ESLEEPPNLL LLLRDDDSSS EVVRVVLLGR ISSGGIPEEE KKKTIRRMEE
             AAAYVVVFFE ESLEEPPNLL LLLRDDDSSS EVVRVVLLGR ISSGGIPEEE KKKTIRRMEE

             PLLWRDRTTL LVPLSSNNLL LAMLSSGESS DCCLLLGWHR RPSAAWWWWE EEAAAADDGG
             EKKWRVLEEF FVKLNNYYVV VAINNNYSII PFFLLLAWNL LPTSSWWWWR RR--------
             EKKWRVLEEF FVKLNNYYVV VAINNNYSII PFFLLLAWNL LPTSSWWWWR RR--------

             GDNNNNSSSV GSGSNNNGSS NGKSSSSGGG DGGGSSSVL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --TFPFFQ-- -------WDA ASLDDAAGFL
t5g66770.1   CTTSNLMMAQ VVQQQKQQQE EEEQHHHHHH QDIFGIINSL LLLLNPPWNS SFLSSGGSAF
t5g66770.2   CTTSNLMMAQ VVKKKKQQQE EEEQHHHHHH QDIFGIINSL LLLLNPPWNS SFLSSGGSAF

             PAAA------ ---------- -APDDGY--- ---------- ---------- ----------
             PFFQTGDDSD GGPPPPNNLH HATTTGFRRR LLSFFGGGEF SSEEWMMTTT TLIIISSGGG
             PFFQTGDDSD GGPPPPNNLH HATTTGFRRR LLSFFGGGEF SSEEWMMTTT TLIIISSGGG

             ---------- ---DPPPA-- --GGAADDD- ---------- -----VDDAL PPEFAAAFFP
             GDSSSVADPC DTTDPPPDYV VVGGPPDDDD DDYPSRLSVV PDDLNIDDTP LLPPPTLWWP
             GDSSSVADPC DTTDPPPDYV VVGGPPDDDD DDYPSRLSVV PDDLNIDDTP LLPPPTLWWP

             PPPPCAAAA- AAMMRREEEE EEIII----L HHLLMCGGEA DSSAAAAQQH HAAAAVSAAA
             PPPPSSPPPH SSPPKKEEDP EESSSFLEPL KKAAYC--SD DSSKKKKTTR RESSELGDDP
             PPPPSSPPPH SSPPKKEEDP EESSSFLEPL KKAAYC--SD DSSKKKKTTR RESSELGDDP

             GGIIRRRVVA AVHHHFLLLL FPPVPPTTAF -YYYHHHFEA CYYLLLKAAA HFFFTQQQQA
             ----RRRVVA AFYYYFLLLL SPPASSSSDL LYYYKKKLDA CYYSSSKAAA HLLLTQQQQA
             ----RRRVVA AFYYYFLLLL SPPASSSSDL LYYYKKKLDA CYYSSSKAAA HLLLTQQQQA

             AIFFHHHGGC DHHVDFFFSM GLWPAAAALL IIILLRPPPG PRTTGGGPPP PRDD---LLG
             AITTEEEKKS NKHIDFFFGV GIWPAAAALL LLLLTRTTTS PRSSPPPPPL LSPPSSSLLG
             AITTEEEKKS NKHIDFFFGV GIWPAAAALL LLLLTRTTTS PRSSPPPPPL LSPPSSSLLG

             RRLAAALLAA ARVRFSFFFF VNNNDDPPMM QAAAPEAAAV AAFFFFNVLQ LRLLGGPADQ
             RRLRRRFFAA AKLNFDFFFF I---HHGGSS RDDDPEVVVL AAVVVVNMLQ LKLL----DE
             RRLRRRFFAA AKLNFDFFFF I---HHGGSS RDDDPEVVVL AAVVVVNMLQ LKLL----DE

             QQQA---AAA LLDDCVVSSR KKIIIFTIEQ EEADHKTTTF DDTEALFYSS SSAAVFDDSL
             EEETTTITTT LLRRLAASSN RRVVVVTGEY EEVSLRVVVF NNKNALQYSS SSAAVFEESL
             EEETTTITTT LLRRLAASSN RRVVVVTGEY EEVSLRVVVF NNKNALQYSS SSAAVFEESL

             LDAAASSGGG NNAAMMMAAA AEEAAYYQRR RIICIVGGGA A-ERHHRWWR RRRLLTAAGG
             LEPNNRRSEE RRVVRRRVVV VEEEELLGRR RIISLIPPPT TIERMMQWWR RLLMMEAAGG
             LEPNNRRSEE RRVVRRRVVV VEEEELLGRR RIISLIPPPT TIERMMQWWR RLLMMEAAGG

             LSSSVPLGGS SAAAARMLLG SEE-HSSEEE EAGGCCCLGG WWHHRLLLLS ASSAAAWAAA
             FEEEVKLSSN NAAAAKILLW NSSLSIIESS SKGGFFFLAA WWNNLLLLLT LSSSSSW---
             FEEEVKLSSN NAAAAKILLW NSSLSIIESS SKGGFFFLAA WWNNLLLLLT LSSSSSW---

             AAGDGGGGDN NNNNNVSSGG SDSNNSSGKS GGRDGSVCC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --MMTFFPPF F--------- WMDPAAASSG
t5g66770.1   MYMTDDMIAQ QIIIIKKKQQ QQQEQQQQHH HDHHIFFGGI ILLSLLNNPP W-NTSLLGGG
t5g66770.2   MYMTDDMIAQ QIIIIKKKQQ QQQEQQQQHH HDHHIFFGGI ILLSLLNNPP W-NTSLLGGG

             DDDAAGPAAA ---------- ---------- APDGGGG--- ---------- ----------
             SSSGGSPFQQ GGGGGGGNDD PGFPFPLHHH ATTGGGGRLF GGGTTGGGGG FFEWWMMMTT
             SSSGGSPFQQ GGGGGGGNDD PGFPFPLHHH ATTGGGGRLF GGGTTGGGGG FFEWWMMMTT

             ---------- -------YPA -----G---- ---------- -------VAA LPPPEFAAAA
             TTLIGGVVAD PPCCDTWHPD VVVVIGDDYP SSRLLLSVQQ QPPDNRRITS PLLLPPTTTL
             TTLIGGVVAD PPCCDTWHPD VVVVIGDDYP SSRLLLSVQQ QPPDNRRITS PLLLPPTTTL

             AAPPAPAAAA AAAA--MREE EEEEEVV--- ---------- LCCGGGIIEG DDHHHALLSA
             LLPPSPSSSI PPPPHHPTEE DDPPETTDDD DDFDLLEEPP ICC---IISS DDPPPNEESK
             LLPPSPSSSI PPPPHHPTEE DDPPETTDDD DDFDLLEEPP ICC---IISS DDPPPNEESK

             LLADDSSHHA AAAVSSAAAA ASGRRAAAHR PSPPTTTTDE AFLL--FYYE EEACCKKFAF
             LLLQQIIRRS SEELGGDDDP PTERRAAAYN PSPSSSSSST DLIISSLNND DDACCKKFAL
             LLLQQIIRRS SEELGGDDDP PTERRAAAYN PSPSSSSSST DLIISSLNND DDACCKKFAL

             FTNQLLEAFF FCDVVVHHVV VIFMQGLLLP PALQALAG-- -LRGIIGPPP SSPTTR----
             LTNQLLEATT TSNIIIHHII IVFVQGIIIP PALQALAGQQ QIRGIIPAPP SSLGGSPPPP
             LTNQLLEATT TSNIIIHHII IVFVQGIIIP PALQALAGQQ QIRGIIPAPP SSLGGSPPPP

             LRDDDVVGRL DLSSSVVVVV RFFFSSSRRR GGANNNNDDD VPPWWMLQAA GGEAAAAFSS
             LIAAATTGRL DFVVVLLLLL NFFFDDDIII PPT----HHH LGGSSSFRDD DDEVVAAVFF
             LIAAATTGRL DFVVVLLLLL NFFFDDDIII PPT----HHH LGGSSSFRDD DDEVVAAVFF

             VVVLLHRRLD PPPAQAPPDD VVLLDDCVRR PPVVVVEDHN TDDRTTTAAF FAADDLAASA
             MMMLLYKKL- ----ETPPDD AALLRRLANN PPLLLLESLN VNNRKKKAAQ QAAEELNGRE
             MMMLLYKKL- ----ETPPDD AALLRRLANN PPLLLLESLN VNNRKKKAAQ QAAEELNGRE

             GGNAAEELQQ QEEICCIVCC GGGGGAA-RR ERHHHEPPLW RRDDRRRLTR GLLAVPPPGG
             EERVVEEFGG GRRISSLIGG PPPPKGGIHR ERMMMEEEKW RRVVLLLMEN GFFSVKKKSS
             EERVVEEFGG GRRISSLIGG PPPPKGGIHR ERMMMEEEKW RRVVLLLMEN GFFSVKKKSS

             SLLRRAARRR MMLVVGFGEE ---VEEEEEL TTLLLGHPPS AAAEEAAADG GGGGGGGGDN
             NVVSSAAKKK IILLLWYYSS YYYVEESSSI SSLLLANPPT LLLRR----- ----------
             NVVSSAAKKK IILLLWYYSS YYYVEESSSI SSLLLANPPT LLLRR----- ----------

             NNNSSNNNSN NNVGSSSGSN SSSSSNGSSS GARDDSSCC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------M DTTPQQ---- ---PPMDPAL
t5g66770.1   MAAAAMMCCT SGGNNNLLAI IAAAQVVKKK QQQQQQQQQH QIIGNNSSLN NNNPP-NTLL
t5g66770.2   MAAAAMMCCT SGGNNNLLAI IAAAQVVKKK QQQQQQQQQH QIIGNNSSLN NNNPP-NTLL

             LDAALLPPAA AA-------- ---------- AAADDDVVGG Y--------- ----------
             LSGGFFPPFQ QQTTTGGSSN DGFNDHHHHH AAATTTGGGG FRSFGTGGGG GGEEDDDEWM
             LSGGFFPPFQ QQTTTGGSSN DGFNDHHHHH AAATTTGGGG FRSFGTGGGG GGEEDDDEWM

             ---------- ----YPP--- GD-------- ---------- -----DDAAP EEEFAAAPPP
             LISDSVAAAD GPDDHNNYVY GDPPFFFTTY YPPLVQPSDL LNNVVDDTSL PPPPPPTPPP
             LISDSVAAAD GPDDHNNYVY GDPPFFFTTY YPPLVQPSDL LNNVVDDTSL PPPPPPTPPP

             PAVVLL-AMM MREEEEAIII R--------- -LVLLSCGAA EEGDAASAAL AHAAAVSSAA
             PSLLTTHSPP PTEDPPNSSS EDDDFFFDDL LLLAADC-RR SSSDNASKKL LRSSSLGGDP
             PSLLTTHSPP PTEDPPNSSS EDDDFFFDDL LLLAADC-RR SSSDNASKKL LRSSSLGGDP

             GRVVAAVHHT TSRRRLLFPP PVPPTTTDAA EAFF--HHFE CPYLLKFAHH AQLGGCDHVV
             ERVVAAFYYT TSNNRLLSPP PASSSSSSSS TDLLLLKKLD CPYSSKFAHH AQLKKSNKII
             ERVVAAFYYT TSNNRLLSPP PASSSSSSSS TDLLLLKKLD CPYSSKFAHH AQLKKSNKII

             HIIFSQGLPP AIIAARGGPP FFLRIITIGP SSPPPPTTRR RDDE--LRVV GGRLLDDLLA
             HVVFGQGIPP ALLAARSGKP TTIRVVSIPA SSLLLLGGSS SPPEPSLITT GGRLLDDFFA
             HVVFGQGIPP ALLAARSGKP TTIRVVSIPA SSLLLLGGSS SPPEPSLITT GGRLLDDFFA

             RRRRRRRVVF FFVVANEERR PPMLLIIPGE AAAVFFFVLH HRGGDDADDD DQAP---IDD
             KDDDDDDLLF FFIIT-LLNN GGSFFVVPDE VVVLVVVMLY YK-----DDD DETPTIIVDD
             KDDDDDDLLF FFIIT-LLNN GGSFFVVPDE VVVLVVVMLY YK-----DDD DETPTIIVDD

             AALDCCVAAS VRRPKKIFTV QEDHHNNKTL DDDRFTTELF FYYSVVFSLL DAAASAAAGG
             TTLRLLAKKS LNNPRRVVTL YESLLNNRVA NNNRVKKNLQ QFYSVVFSLL EPNNLGGGDS
             TTLRLLAKKS LNNPRRVVTL YESLLNNRVA NNNRVKKNLQ QFYSVVFSLL EPNNLGGGDS

             GGAAGNNAAM AE--YLREEE IICEEAAARR RHEELLSSRR RRRDDDTTRR AGGLSVVPLG
             SSEEERRVVR VERRLFRRRR ILGEETGGHR RMEEKKEEQQ QRRVVVEENN AGGFEVVKLS
             SSEEERRVVR VERRLFRRRR ILGEETGGHR RMEEKKEEQQ QRRVVVEENN AGGFEVVKLS

             NRRQQRMLGL LFSGGEE--H HSEADDGCTL GGWHRFAAAW WEAAAGDDGG GNNNNNNSNS
             YSSQQKILWN NYNYYSSLLS SIEKPPGFSL AAWNLLLSSW WR-------- ----------
             YSSQQKILWN NYNYYSSLLS SIEKPPGFSL AAWNLLLSSW WR-------- ----------

             SNNNSSSSNN NNSSGGSGGS SGGAAGGSSS VVVVVCCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- --------DF -------WWP MDDPAGLLFL PPAAAV----
t5g66770.1   MAAAAYDSLL MMAAVIKKQQ EEQQQQDDQI PPLSLLPWWP -NNTLGLLAF PPQQQVTGGG
t5g66770.2   MAAAAYDSLL MMAAVIKKQQ EEQQQQDDQI PPLSLLPWWP -NNTLGLLAF PPQQQVTGGG

             ---------- ----PPDDDG VGGG------ ---------- ---------- ----------
             GGNGFFFFNN DHHHTTTTTG GGGGRLLLLL SDFGGTFEDE WMMETLISGG GSVVADDDGP
             GGNGFFFFNN DHHHTTTTTG GGGGRLLLLL SDFGGTFEDE WMMETLISGG GSVVADDDGP

             ---YP----- --AD------ ---------- --VDAFFAAF FFPCCCAALL L--MMREEEE
             CDWHNYYYVV VVPDPPPFTY YSSRLQSSDD NVIDSPPTLW WWPSSSSITT THHPPKEDPP
             CDWHNYYYVV VVPDPPPFTY YSSRLQSSDD NVIDSPPTLW WWPSSSSITT THHPPKEDPP

             AGGGRR---- ------VHHM SAAGAAGGGD HALLAASAAA ADDDSHAAAL LLAAVVAAGG
             NDDDEEDDDD FFDEPPLKKY DAA-RDSSSD PNEEAASKKK LQQQIREEEV VVEELLDP--
             NDDDEEDDDD FFDEPPLKKY DAA-RDSSSD PNEEAASKKK LQQQIREEEV VVEELLDP--

             GGRRAFTTAL LLSRLLFPSP TTHHAFLLLY HHHHEECYLL LKFFFAAAHH HFFFFTAANQ
             -ERRAFTEAL LLSNLLSPSP SSEEDLIIIY KKTTDDCYSS SKFFFAAAHH HLLLLTAANQ
             -ERRAFTEAL LLSNLLSPSP SSEEDLIIIY KKTTDDCYSS SKFFFAAAHH HLLLLTAANQ

             AILLEEAFHH CCCHVVHHDF SMMMQGLQPL QQRPPP---L RITIIIGPPP SSPGGRDDEE
             AILLEEATEE SSSKIIHHDF GVVVQGIQPL QQRTPPQQQI RVSIIIPAAP SSLEESPPEE
             AILLEEATEE SSSKIIHHDF GVVVQGIQPL QQRTPPQQQI RVSIIIPAAP SSLEESPPEE

             ----DGRAAD DAASSVRRRS VNNSSLLDDV PPWWLIPPGG EEVVAAAAAF FQLLRRDDPP
             PPPSAGRRRD DAAVVLDNND I--PPIIHHL GGSSFVPPDD EELLAAAAAV VQLLKK----
             PPPSAGRRRD DAAVVLDNND I--PPIIHHL GGSSFVPPDD EELLAAAAAV VQLLKK----

             AADDQQA--I ILLCAASVRI FTVIEQQEAD DHNNGLLLRR TELFFYYYYS DSDASAASGG
             --DDEETTIV VLLLKKSLNV VTLGEYYEVS SLNNGAAARR KNLQQFFFYS ESENLGGRDS
             --DDEETTIV VLLLKKSLNV VTLGEYYEVS SLNNGAAARR KNLQQFFFYS ESENLGGRDS

             GGNMMEE-YL QRRREIDDII VVVVCAAA-R REERRRHPLL LLSRRRDDRR RRRAAGGLLS
             EERRREERLF GRRRRIGGLL IIIIGTTGIH REERRRMEKK KKEQQRVVLL LLNAAGGFFN
             EERRREERLF GRRRRIGGLL IIIIGTTGIH REERRRMEKK KKEQQRVVLL LLNAAGGFFN

             NALRQQAMLL LVVGGLFGEE EEG--HVVAG GGCCTLGWHR LFSWEEEEAA GGGGGNNNNN
             YAVSQQAILL LLLWWNYYSS SSNYYSVVKG GGFFSLAWNL LLTWRRRR-- ----------
             YAVSQQAILL LLLWWNYYSS SSNYYSVVKG GGFFSLAWNL LLTWRRRR-- ----------

             SSVGSSSSSS SSGSSDNNGG NNGGKSRGGG GSSSSSVVV
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ------DDDT TPQ--PDPAA ASSLLAPAAA
t5g66770.1   MAAYMTSSGN NNLLMMAAAQ QVQQQKQQQQ QQQQQHQQQI IGNLNPNTSS LFFLLGPFFF
t5g66770.2   MAAYMTSSGN NNLLMMAAAQ QVKKKKQQQQ QQQQQHQQQI IGNLNPNTSS LFFLLGPFFF

             AV-------- -------AAD DDDVVY---- ---------- ---------- ----------
             QVGGDDNGGF FPPFFPDAAT TTTGGFRLDG GGTGGGGEEE SSDDEWEETT LIIGGDVAAG
             QVGGDDNGGF FPPFFPDAAT TTTGGFRLDG GGTGGGGEEE SSDDEWEETT LIIGGDVAAG

             ------YPP- --GGA----- ---------- -----DDAAL AAFCCAADAA AAAAAVL-AE
             GDCTWWHPPV IYGGPPPFDY YPSRLLSSVQ QPDRVDDSSP PLWSSSSLSI IPPPPLTESE
             GDCTWWHPPV IYGGPPPFDY YPSRLLSSVQ QPDRVDDSSP PLWSSSSLSI IPPPPLTESE

             EEEAGGGII- ---------L LLLMMMMCCC CAGGAIIIAG LALAAHHLLA AVAAAAIIIG
             EEENDDDSSD DDFFDDDLEL AAIYYYYCCC CA--RIIIDS EKLLLRRVVS ELDDDP---E
             EEENDDDSSD DDFFDDDLEL AAIYYYYCCC CA--RIIIDS EKLLLRRVVS ELDDDP---E

             GRVVFTTTTT LSRRRLLFFP SPAAAAAPPP DDAEEHHHAL ----YFFECP YYKKFAAHHF
             ERVFFTEEEE LSNRRLLSSP SPTTTTTSSS SSSTTEEEDI LLLSYLLDCP YYKKFAAHHL
             ERVFFTEEEE LSNRRLLSSP SPTTTTTSSS SSSTTEEEDI LLLSYLLDCP YYKKFAAHHL

             AAQIILEAAA FHHCDHHHHV IIDDFFSMMQ GLQQAAAALI IIALLAGPRI GPTGGEE--L
             AAQIILEAAA TEESNKHHHI VVDDFFGVVQ GIQQAAAALL LLALLAGKRI PPGEEEEPSL
             AAQIILEAAA TEESNKHHHI VVDDFFGVVQ GIQQAAAALL LLALLAGKRI PPGEEEEPSL

             RVGRLLADAS VRSFRRRVVA AANSLLDEER RRPPLLLPPG GEAAAAFFNS VVLQQQQQLL
             ITGRLLRDAV LNDFIIIIIL TT-PIIHLLN NNGGFFFPPD DEVVAAVVNF MMLQQQQQLL
             ITGRLLRDAV LNDFIIIIIL TT-PIIHLLN NNGGFFFPPD DEVVAAVVNF MMLQQQQQLL

             HLLPADD-ID DDALDDCCCA AAAVRPPPPK KKKTVVVEEQ QEEAADDHHH NKKKFFLDDR
             YLL--DDIVD DDTLRRLLLK KKKLNPPPPR RRRTLLLEEY YEEVVSSLLL NRRRFFANNR
             YLL--DDIVD DDTLRRLLLK KKKLNPPPPR RRRTLLLEEY YEEVVSSLLL NRRRFFANNR

             FFELLYYYVD SSSLLLDASA GGGGGNNME- ---AYIDDDV VCGGEGA--R ERHHSRRWWR
             VVNLLFYYVE SSSLLLENLG DDDEERRRER RRRELIGGGI IGPPEKGIIR ERMMEQQWWR
             VVNLLFYYVE SSSLLLENLG DDDEERRRER RRRELIGGGI IGPPEKGIIR ERMMEQQWWR

             DRLTAGLLAV PPGGSALRAR RLFSG----- EAAGGCTLGW WWHHGGLSSA EAAAAGGGGD
             VLMEAGFFSV KKSSNAVSAK KNYNYLLLYY EKKGGFSLAW WWNNDDLTTS R---------
             VLMEAGFFSV KKSSNAVSAK KNYNYLLLYY EKKGGFSLAW WWNNDDLTTS R---------

             DDDNNNNSSS VVSSSDSNNS SGSGGKSSSS AARDDGVLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------F QQ------WM DPAASSGDAF FPPPAAAAV-
t5g66770.1   MAAYYTTDSS GGNNLLAQVV KQQQQQQDDF NNPSSNNPW- NTSLGFGSGA APPDFFQQVT
t5g66770.2   MAAYYTTDSS GGNNLLAQVV KQQQQQQDDF NNPSSNNPW- NTSLGFGSGA APPPFFQQVT

             ---------- ----DGGYY- ---------- ---------- ---------- -----YYDDA
             GGSSSSNDGP NLHHTGGFFD FGGGGTGEEE SSDEWWMETT ISGGGDSSVV GCCWWHHDDD
             GGSSSSNDGP NLHHTGGFFD FGGGGTGEEE SSDEWWMETT ISGGGDSSVV GCCWWHHDDD

             ---GA----- ---------- ------VVDD DAAAPEEEFF AAAPPPAPAA AVLLL--EEE
             YIYGPFFFDT SSLSSSSVQP PPSLNRIIDD DTSSLPPPPP PPLPPPSPSI PLTTTHEEDD
             YIYGPFFFDT SSLSSSSVQP PPSLNRIIDD DTSSLPPPPP PPLPPPSPSI PLTTTHEEDD

             EEGIR----- ---------- -LVHLLLMMS AAAAGGAAGA AAADDSHAAA AAAAAAAAAS
             PEDSEDDDDF DLLLEEPPPP PLLKAAIYYD AAAA--DDSA KLLQQIRESS EEDDDDDPPT
             PEDSEDDDDF DLLLEEPPPP PLLKAAIYYD AAAA--DDSA KLLQQIRESS EEDDDDDPPT

             GGGIIGRRVA VFTASSSRRL FFP--SSVAA AAPPPPPPDA EEEEHHF--- -HHYYACPPK
             -----ERRVA FFTASSSNRL SSPNNSSATT TTSSSSSSSS TTTTEELLLS SKTNNACPPK
             -----ERRVA FFTASSSNRL SSPNNSSATT TTSSSSSSSS TTTTEELLLS SKTNNACPPK

             KKFFAAANQQ QQAIIILLEA AAGCHHVVII DDDLLMQGQP AAIIQQALLP PF---RIITG
             KKFFAAANQQ QQAIIILLEA AAKSKHIIVV DDDIIVQGQP AALLQQATTT PTQQQRVVSG
             KKFFAAANQQ QQAIIILLEA AAKSKHIIVV DDDIIVQGQP AALLQQATTT PTQQQRVVSG

             GIGGPPPPPG GRDE---DDR LLRRRRRFFS SFGGAAASLD ERRPPWWMML IIIIAGAVNN
             GIPPAAAPPE ESPEPPPAAR FFKDDDDFFD DFPPLTTPIH LNNGGSSSSF VVVVDDVLNN
             GIPPAAAPPE ESPEPPPAAR FFKDDDDFFD DFPPLTTPIH LNNGGSSSSF VVVVDDVLNN

             SVLLQQHGGD PPAAQAP--I IVLCCVASSS SVVRRRPIFF TTVVVIADKK KTGFFRFFTE
             FMLLQQY--- ----ETPTTV VALLLAKSSS SLLNNNPVVV TTLLLGVSRR RVGFFRVVKN
             FMLLQQY--- ----ETPTTV VALLLAKSSS SLLNNNPVVV TTLLLGVSRR RVGFFRVVKN

             LYYSSSAVFF DDDAAAAAGG ANNAAMMAE- AAAYYLLQQR RRECDIIVVV CGEEGA-RRE
             LFYSSSAVFF EEENNNNGDD ERRVVRRVER EEELLFFGGR RRRSGLLIII GPEEKGIHRE
             LFYSSSAVFF EEENNNNGDD ERRVVRRVER EEELLFFGGR RRRSGLLIII GPEEKGIHRE

             EESSSRRRLT RALLAVVPGS SNNARRRRRL GGSVEEEADD GCLTLLGGGH GRRPFSSAAA
             EEEEEQQLME NAFFSVVKSN NYYASKKKKN YNIVSSSKPP GFISLLAAAN DLLPLTTLLL
             EEEEEQQLME NAFFSVVKSN NYYASKKKKN YNIVSSSKPP GFISLLAAAN DLLPLTTLLL

             WEEEAGDNNN NSSNSVVSGS SSGGSDDSSS NSGSDSSSC
             WRRR------ ---------- ---------- ---------
             WRRR------ ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ----MMFFFF Q--------W PDPAASSGGL DDDDAAAGGG
t5g66770.1   AMSGNLLMMM AAQKKKEEQQ QQQQHHFIII NPLLLLNNPW PNTLLGGGGL SSSSGGGSSS
t5g66770.2   AMSGNLLMMM AAQKKKEEQQ QQQQHHFIII NPLLLLNNPW PNTLLGGGGL SSSSGGGSSS

             FPPPAAAA-- ---------- ---AAAPPPP DGVVY----- ---------- ----------
             APDPFFQQTT GDDDGGFFNH HHHAAATTTT TGGGFRLLLS DDFGGGGGEE SSSDWWWMTL
             APPPFFQQTT GDDDGGFFNH HHHAAATTTT TGGGFRLLLS DDFGGGGGEE SSSDWWWMTL

             ---------- -------DDP PPA---GGGA DD-------- --------DA AAAAAAAALL
             ISGGGDVDDP PPCDTTWDDN NPDVVIGGGP DDPFFYRLLP PSSDLNVVDT TTTSSSSSPP
             ISGGGDVDDP PPCDTTWDDN NPDVVIGGGP DDPFFYRLLP PSSDLNVVDT TTTSSSSSPP

             EFACCAPAAA AAVL--AARE EEAGGR---- ----LLHLLS SCCAIIIGGH ASAAALLDSH
             PPTSSSPSII PPLTEESSKP EENDDEDFFF DLPPLLKAID DCCAIIISSP ASKKKLLQIR
             PPTSSSPSII PPLTEESSKP EENDDEDFFF DLPPLLKAID DCCAIIISSP ASKKKLLQIR

             AAAAAAASGI RVVVAAHTTS RRLLFP--VV PTTTTDDDDA EEAFFLLL-- -YYHHHHHHF
             EESSSPPT-- RVVVAAYTES RRLLSPNNAA SSSSSSSSSS TTDLLIIILS SYYKKTTTTL
             EESSSPPT-- RVVVAAYTES RRLLSPNNAA SSSSSSSSSS TTDLLIIILS SYYKKTTTTL

             FAAAYYLFAH TTNNAAILLL AFHCCDVVVI IDDFSSLLMM GGQWLIQAGG GPPPPIIITG
             LAAAYYSFAH TTNNAAILLL ATESSNIIIV VDDFGGIIVV GGQWLLQASS GPPPPVVVSG
             LAAAYYSFAH TTNNAAILLL ATESSNIIIV VDDFGGIIVV GGQWLLQASS GPPPPVVVSG

             GIPPPSSSSP PTRRD--LLR DDVLLLLLAS VRFFSSFRRV AANSLDDVRP PPWWMLLQQQ
             GIAPPSSSSL LGSSPPSLLI AATNNLLLRV LNFFDDFIII LT-PIHHLNG GGSSSFFRRR
             GIAPPSSSSL LGSSPPSLLI AATNNLLLRV LNFFDDFIII LT-PIHHLNG GGSSSFFRRR

             QIIPPPGGEA AANNVVVVLQ HHRGDQAA-I DAAAVLLVAR RPFTVEEEEE EAADHNKKKT
             RVVPPPDDEV AANNMMMMLQ YYK--ETTIV DTTTALLAKN NPVTLEEEEE EVVSLNRRRV
             RVVPPPDDEV AANNMMMMLQ YYK--ETTIV DTTTALLAKN NPVTLEEEEE EVVSLNRRRV

             TGFFFLRFFF FTTEAALLYS SAAVDDLLDD DAASAAGAAG NNMAEALLLE ICIGGEGAAA
             VGFFFARVVV VKKNAALLFS SAAVEELLEE ENNLGGSEEE RRRVEEFFFR ISLPPEKTTG
             VGFFFARVVV VKKNAALLFS SAAVEELLEE ENNLGGSEEE RRRVEEFFFR ISLPPEKTTG

             RHHEESSSRR RRALLAAAPL GSSNNLRRAV GE---HSVEE EAAADDGGGC CCLLTWWHPP
             RMMEEEEERR RNAFFSSSKL SNNYYVSSAL YSYYYSIVES SKKKPPGGGF FFIISWWNPP
             RMMEEEEERR RNAFFSSSKL SNNYYVSSAL YSYYYSIVES SKKKPPGGGF FFIISWWNPP

             PASWAADGGG DNNSSNNVSS SGSNNSGGSD GSSVCCCLL
             PLSW------ ---------- ---------- ---------
             PLSW------ ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ------MMTF PPFQ------ ---WMDDASS
t5g66770.1   AAYYMMTSNL MMMAIAQQVV IIKQQQQEQQ QQQQHDHHIF GGINLLLSSN NNPW-NNSGF
t5g66770.2   AAYYMMTSNL MMMAIAQQVV IIKQQQQEQQ QQQQHDHHIF GGINLLLSSN NNPW-NNSGF

             SLLLAPPPPA V--------- ----APGG-- ---------- ---------- ----------
             FLLLGPDPPF VGGGFPPPPP DHHHATGGRL SDDGGGEFES SDEMETSSSG GGGGDDDDSA
             FLLLGPPPPF VGGGFPPPPP DHHHATGGRL SDDGGGEFES SDEMETSSSG GGGGDDDDSA

             -------YYP A--------G GAADDD---- ---------- ------VAAA LLEFAAPADA
             DDGGDDTHHN DYVVIIIYYG GPPDDDFDTY SRLQQQPPSL NNRRVVITTS PPPPTLPSLS
             DDGGDDTHHN DYVVIIIYYG GPPDDDFDTY SRLQQQPPSL NNRRVVITTS PPPPTLPSLS

             AAV----AAA MRRRREEEAA R-------LV HHLLLLMAGE GGGLSSSSAA ALLAAHHAAA
             IPLHHEESSS PTTTKPPENN EDDFFDLPLL KKAAAIYA-S SSSESSSSKK KLLLLRRESS
             IPLHHEESSS PTTTKPPENN EDDFFDLPLL KKAAAIYA-S SSSESSSSKK KLLLLRRESS

             AAVAASGIII AAVHHFFTTT TAASSRRRFP --VAPDAHHH HFL-HHHFFF YEEACPYLLK
             SELDDT---- AAFYYFFTTT EAASSNRRSP NNATSSSEEE ELILKTTLLL NDDACPYSSK
             SELDDT---- AAFYYFFTTT EAASSNRRSP NNATSSSEEE ELILKTTLLL NDDACPYSSK

             KAHHHFANNQ AEAAAHGGDD DHHVHIIIDD DFFSSQGGQP PAALLIIIII QARRPGGGGP
             KAHHHLANNQ AEAAAEKKNN NKKIHVVVDD DFFGGQGGQP PAALLLLLLL QARRTSSGGK
             KAHHHLANNQ AEAAAEKKNN NKKIHVVVDD DFFGGQGGQP PAALLLLLLL QARRTSSGGK

             PLRIPPPPSS PPTGGRRRE- -VGLLRLSSR SGGGVNNSLE EEWMLQQQAG GGGVVANSSV
             KIRIAPPPSS LLGEESSSEP STGNNRLVVN DPPPI--PIL LLSSFRRRDD DDDLLANFFM
             KIRIAPPPSS LLGEESSSEP STGNNRLVVN DPPPI--PIL LLSSFRRRDD DDDLLANFFM

             QRLLLLLDAD QPPIIDDVLL DCVVAAASVR PKKIFAADNT FFFLLDDFEE AALYYSAFSL
             QKLLLLL--D EPPVVDDALL RLAAKKKSLN PRRVVVVSNV FFFAANNVNN AALFYSAFSL
             QKLLLLL--D EPPVVDDALL RLAAKKKSLN PRRVVVVSNV FFFAANNVNN AALFYSAFSL

             SSSSSGGGNA EE-YYYLQQR IIICCDIIVG GA-RRRRHHE PSRRRWWRRR LTRAAAVLLG
             LLLRRDSSRV EERLLLFGGR IIISSGLLIP KGIHHRRMME EEQQQWWRLL MENAASVLLS
             LLLRRDSSRV EERLLLFGGR IIISSGLLIP KGIHHRRMME EEQQQWWRLL MENAASVLLS

             GGGNQQALLL LLLSSGEGGG G--HSVEEAA DCCTTGGWGG GPLLSWWWWE EAAADDDGGG
             SSSYQQALLL LNNNNYSNNN NLYSIVESKK PFFSSAAWDD DPLLTWWWWR R---------
             SSSYQQALLL LNNNNYSNNN NLYSIVESKK PFFSSAAWDD DPLLTWWWWR R---------

             DDDNNNNNNS SNVSGGDDSS NGSSGGSSSS GRRDGGVVV
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -DDDTTFFPP PF------MP PPSGLDAFFL
t5g66770.1   AYMDSGNMAQ QQKKQQQQEQ QQQQQQQQHH QQQQIIFFGG GIPLSLLP-T TTGGLSGAAF
t5g66770.2   AYMDSGNMAQ QQKKQQQQEQ QQQQQQQQHH QQQQIIFFGG GIPLSLLP-T TTGGLSGAAF

             PPPAV----- --------PD GGVGY----- ---------- ---------- ----------
             DPPQVTGGGG GNGFNNLHTT GGGGFRLSFF FGGGEFFESS SDEWMEELLS GDSSVAAGCT
             PPPQVTGGGG GNGFNNLHTT GGGGFRLSFF FGGGEFFESS SDEWMEELLS GDSSVAAGCT

             ---YYYYDDP PPA------- GGA------- -----VVDDA ALLPPEEFAA AAFFPPPCAP
             TWWHHHHDDN NPDVVVVIIY GGPPFPSLSV VSDNRIIDDT SPPLLPPPPP TLWWPPPSSP
             TWWHHHHDDN NPDVVVVIIY GGPPFPSLSV VSDNRIIDDT SPPLLPPPPP TLWWPPPSSP

             PPDAAAAREE EEVGIIRRR- -------VHM SAAAAAGAEA AASAAQQAAD DSHHAAAAAV
             PPLIIPPKED DETDSSEEED DDFDLPPLKY DAAAAA-RSD NASKKTTLLQ QIRREESSEL
             PPLIIPPKED DETDSSEEED DDFDLPPLKY DAAAAA-RSD NASKKTTLLQ QIRREESSEL

             VVAASSIRRR VAVVVHHTAA SRRLPP-SAP TAAHHAFL-- HHHYYYPLLK KFAFTANEFH
             LLDPTT-RRR VAFFFYYTAA SNRLPPNSTS SSSEEDLILL KTTNNNPSSK KFALTANETE
             LLDPTT-RRR VAFFFYYTAA SNRLPPNSTS SSSEEDLILL KTTNNNPSSK KFALTANETE

             HHHHHHHIID DDSMQQQGLQ QQQWPAAAQA LRPPGGPPFF ITGGIIGPST RRDE---LDD
             EEKKKKKVVD DDGVQQQGIQ QQQWPAAAQA TRTTSGPPTT VSGGIIPASG SSPEPPPLAA
             EEKKKKKVVD DDGVQQQGIQ QQQWPAAAQA TRTTSGPPTT VSGGIIPASG SSPEPPPLAA

             VGGGRADDLL ARVRRRVVVV SSFRGVNNLL EVRRPPWWLL QIIIAGGEEA AAVFNSSLLQ
             TGGGRRDDFF AKLDDDLLLL DDFIPI--II LLNNGGSSFF RVVVDDDEEV VVLVNFFLLQ
             TGGGRRDDFF AKLDDDLLLL DDFIPI--II LLNNGGSSFF RVVVDDDEEV VVLVNFFLLQ

             QLLLRGGGPP AADAP----I IIIDVVVLVV SSSSSVPKII TTVIQQQQQE HKKKFLDRRF
             QLLLK----- --DTPTTTIV VVVDAAALAA SSSSSLPRVV TTLGYYYYYE LRRRFANRRQ
             QLLLK----- --DTPTTTIV VVVDAAALAA SSSSSLPRVV TTLGYYYYYE LRRRFANRRQ

             SAAFDDLLLL DDASGAAAGG GNNAAAE-YL QQEEIICEER RHLLSRWWWR RTRAALSAVP
             SAAFEELLLL EENRSEEEEE ERRVVVERLF GGRRIIGEER RMKKEQWWWR LENAAFESVK
             SAAFEELLLL EENRSEEEEE ERRVVVERLF GGRRIIGEER RMKKEQWWWR LENAAFESVK

             LGSSNLRRRR LFFFGE-SVV VEEEADCLLG WGGRRRPPLF AAAAWAAADD GGGGGGGGDN
             LSNNYVSKKK NYYYYSLIVV VEESKPFILA WDDLLLPPLL LLLSW----- ----------
             LSNNYVSKKK NYYYYSLIVV VEESKPFILA WDDLLLPPLL LLLSW----- ----------

             NNNNSSNVVV GSSGGGSSDG SSNGSSGGAA RRDSSSCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ----MMDTFF FFQ---PPDD DDPPAAAAGG
t5g66770.1   MDGNLAAQVV VVIIIKKQQQ QQQQQQQQQH QQQQHHQIFF FINPPPPPNN NNTTSLLLGG
t5g66770.2   MDGNLAAQVV VVIIIKKQQQ QQQQQQQQQH QQQQHHQIFF FINPPPPPNN NNTTSLLLGG

             LAGGP----- ---------- --------AP PDDGVGY--- ---------- ----------
             LGSSPGGGDS SDPGGPPFFF NNLDDHHHAT TTTGGGFRRS DGGGGGTGGG GEEDDDDEWM
             LGSSPGGGDS SDPGGPPFFF NNLDDHHHAT TTTGGGFRRS DGGGGGTGGG GEEDDDDEWM

             ---------- -DPP-ADD-- ---------- ------VVAA LLFFAAAAFP CAAADAAAAA
             LGGDGDDTTW WDNPVPDDPP FFSRRRRVQP SSDDRVIITS PPPPPPTTWP SSSSLSSSSI
             LGGDGDDTTW WDNPVPDDPP FFSRRRRVQP SSDDRVIITS PPPPPPTTWP SSSSLSSSSI

             VVLL-AMRRR EEAGGII--- -------VLL LCGGAAIEEA AAGDDAAAAA ASAAQADDSS
             LLTTESPTKK DPNDDSSDDD FDDDEEPLAA IC--RRISSD DDSDDNNNAA ASKKTLQQII
             LLTTESPTKK DPNDDSSDDD FDDDEEPLAA IC--RRISSD DDSDDNNNAA ASKKTLQQII

             HHAAAAAASS SGIIIGGGGR VVAAAFFTSR LF-SAPPPPP DDAAAAFFFL L--HHHHYEE
             RRSSEDDPTT T----EEEER VVAAAFFTSR LSNSTSSSSS SSSDDDLLLI ILSKTTTNDD
             RRSSEDDPTT T----EEEER VVAAAFFTSR LSNSTSSSSS SSSDDDLLLI ILSKTTTNDD

             AAPPPLLKAA HHFFNNNQAI ILAFFHGGDV HVIFSLMQQQ GGWPAALLLQ AALAAPGF-L
             AAPPPSSKAA HHLLNNNQAI ILATTEKKNI HIVFGIVQQQ GGWPAALLLQ AALAATSTQI
             AAPPPSSKAA HHLLNNNQAI ILATTEKKNI HIVFGIVQQQ GGWPAALLLQ AALAATSTQI

             RIITIGGPPS SPPRD---LL RDDLRLAADL LLAARSSSVV VRFSFRVVAA NSLEVPWMML
             RVVSIPPAPS SLLSPPSSLL IAANRLRRDF FFAAKVVVLL LNFDFIIILT -PILLGSSSF
             RVVSIPPAPS SLLSPPSSLL IAANRLRRDF FFAAKVVVLL LNFDFIIILT -PILLGSSSF

             QQAAPPPPGA AFSSVLHLLL GPDDAAPPVL LDCCVSSSVV RPPFEQEAAD DHHHNNKKTG
             RRDDPPPPDV AVFFMLYLLL --DDTTPPAL LRLLASSSLL NPPVEYEVVS SLLLNNRRVG
             RRDDPPPPDV AVFFMLYLLL --DDTTPPAL LRLLASSSLL NPPVEYEVVS SLLLNNRRVG

             FFLRRTAAFY FLAAASGGGA AAGNAMAYLQ RREEEIDIGG EGAA--EEER HHEELSSSSW
             FFARRKAAQF FLPGGRSSSE EEERVRVLFG RRRRRIGLPP EKTTIIEEER MMEEKEEEEW
             FFARRKAAQF FLPGGRSSSE EEERVRVLFG RRRRRIGLPP EKTTIIEEER MMEEKEEEEW

             WWRTRAALPP LLLAAARRAR MMVVVSEG-- HHHEEEEAGC CLLLTLLLHR RLLLLFFSSA
             WWLENAAFKK LLLAAASSAK IILLLNSNLL SSSEESSKGF FIIISLLLNL LLLLLLLTTL
             WWLENAAFKK LLLAAASSAK IILLLNSNLL SSSEESSKGF FIIISLLLNL LLLLLLLTTL

             SAAAAAGDNN NNNSSVSSSS NNSSSSNNGK KGRRGSSSL
             SSSSS----- ---------- ---------- ---------
             SSSSS----- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- --MD------ --WPPMDPPA
t5g66770.1   MAAAYMCCTD SGLMMMIIIA QQQQQVIIQQ QQQQHHQQQQ QDHQLSSNNN PPWPP-NTTS
t5g66770.2   MAAAYMCCTD SGLMMMIIIA QQQQQVIIQQ QQQQHHQQQQ QDHQLSSNNN PPWPP-NTTS

             AASSGDGFFF PPPAA----- ---------- --APDDDVVG ---------- ----------
             SLFFGSSAAA PDDQQTTGGG NDPGGFPLDD DHATTTTGGG RRFGGGTGGG GEDDEWMMMM
             SLFFGSSAAA PPPQQTTGGG NDPGGFPLDD DHATTTTGGG RRFGGGTGGG GEDDEWMMMM

             ---------- ---------- ------YDPP P---GAA--- ---------- -----VAAAA
             ETSGGGGGDS SSVADDPPPD DTTTWWHDNP PYIIGPPPPF DTTTTPSQQQ PSDNRITSSS
             ETSGGGGGDS SSVADDPPPD DTTTWWHDNP PYIIGPPPPF DTTTTPSQQQ PSDNRITSSS

             PEFFPDAAAA AAAAVL--AM MRREEEEEGG IIR------- ----LLVVHL SCGGGAAAII
             LPPPPLSIPP PPPPLTEESP PTTDDPPPDD SSEDDFDEEE PPPPLLLLKA DC---RRRII
             LPPPPLSIPP PPPPLTEESP PTTDDPPPDD SSEDDFDEEE PPPPLLLLKA DC---RRRII

             HHHHALASSA QLLASHHALA VASIVFFTTA LLSF--SSVA PTTAAEHAAL ----YYHHYY
             PPPPNEASSK TLLLIRRSVE LDT-VFFTEA LLSSNNSSAT SSSSSTEDDI LLLSYYKTNN
             PPPPNEASSK TLLLIRRSVE LDT-VFFTEA LLSSNNSSAT SSSSSTEDDI LLLSYYKTNN

             EAAPPLKKKK AATTANQLLE HHCDDDHHVV ISSLQWPAII QAALLRGPPF LTGGGIGGGG
             DAAPPSKKKK AATTANQLLE EESNNNKKII VGGIQWPALL QAALLRSKKT ISGGGIPPPP
             DAAPPSKKKK AATTANQLLE EESNNNKKII VGGIQWPALL QAALLRSKKT ISGGGIPPPP

             PSSPTTRDDE E-RDDLLRLA DRVRRRRFFF RGGVVAAADE ERPWMMLQQQ EEEVVAAFFS
             PSSLGGSPPE EPIAANNRLR DKLDDDNFFF IPPIILLTHL LNGSSSFRRR EEELLAAVVF
             PSSLGGSPPE EPIAANNRLR DKLDDDNFFF IPPIILLTHL LNGSSSFRRR EEELLAAVVF

             LHLLLGDDAA AADQQQQAAA P---IVDVAV VRPPKIIIFF TTTVIIIIIE EEADDHTTGF
             LYLLL----- --DEEEETTT PTIIVARAKL LNPPRVVVVV TTTLGGGGGE EEVSSLVVGF
             LYLLL----- --DEEEETTT PTIIVARAKL LNPPRVVVVV TTTLGGGGGE EEVSSLVVGF

             DDDRFEALLF YYADSLLAAS SAGAAGNNNE -AQQQRRECC CCVVVCCRRH EPSSRWDRRL
             NNNRVNALLQ FFAESLLNNL LGSEEERRRE REGGGRRRSS SSIIIGGRRM EEEEQWVLLM
             NNNRVNALLQ FFAESLLNNL LGSEEERRRE REGGGRRRSS SSIIIGGRRM EEEEQWVLLM

             LGLLLPPSSN QARVGGLLSE EGGHVEEEAA ADGGCLGGGH GGRRSSSAAA SAAWAADGGD
             MGFFFKKNNY QAKLWWNNNS SNNSVEESKK KPGGFIAAAN DDLLTTTLLL SSSW------
             MGFFFKKNNY QAKLWWNNNS SNNSVEESKK KPGGFIAAAN DDLLTTTLLL SSSW------

             NNNNNNNNNS SSSDDSSNNN SSGGSNGKGG ARRDSVCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------M TPP-----MD PAAASSSDAL
t5g66770.1   MMAMCCCTDN NLLLAIQKQQ QKKQEEQQQQ QQQQQQQQQH IGGPPLNN-N TSLLGGFSGF
t5g66770.2   MMAMCCCTDN NLLLAIQKKK KKKQEEQQQQ QQQQQQQQQH IGGPPLNN-N TSLLGGFSGF

             LLPPAAVV-- ---------P PGY------- ---------- ---------- ----------
             FFPPFFVVGG GDPFFFLLDT TGFRRLLLLS SSSDFGGGGG GEEFFFSSDD MLISGDDDDV
             FFPPFFVVGG GDPFFFLLDT TGFRRLLLLS SSSDFGGGGG GEEFFFSSDD MLISGDDDDV

             ---------- --YYDP---- AA-------- --VVDAAALP FAAAAAFFPC PPDDAAAAAA
             VVDDDGGPDD DDHHDPVVIY PPTYYRLPDD DLIIDSSSPL PTLLLLWWPS PPLLSSSSPP
             VVDDDGGPDD DDHHDPVVIY PPTYYRLPDD DLIIDSSSPL PTLLLLWWPS PPLLSSSSPP

             AAVL-AMMRR RREGIRRR-- ----LLHHLL LMSSCIEEEE EAAAASADAA AAAAVVVAAG
             PPLTESPPTT KKPDSEEEDF FLLELLKKAA IYDDCISSSS SDDNASLQEE ESSELLLDP-
             PPLTESPPTT KKPDSEEEDF FLLELLKKAA IYDDCISSSS SDDNASLQEE ESSELLLDP-

             GRVVHHTTAF PPSVVAPPPP PTTDDDDAEH HHFL--HHFY YYEEAAYLKF FAFTANQAAI
             -RFFYYTEAS PPSAATSSSS SSSSSSSSTE EELILSKKLN NNDDAAYSKF FALTANQAAI
             -RFFYYTEAS PPSAATSSSS SSSSSSSSTE EELILSKKLN NNDDAAYSKF FALTANQAAI

             AAAFHGGGDD DDDVHHVIDD DFFSLMQQGG LQWWALIQQA AALAAAARR- -LITGGGPPP
             AAATEKKKNN NNNIHHIVDD DFFGIVQQGG IQWWALLQQA AALAAAARRQ QIVSPPPAAA
             AAATEKKKNN NNNIHHIVDD DFFGIVQQGG IQWWALLQQA AALAAAARRQ QIVSPPPAAA

             PTGRRRRDE- --LRDVGRLA DDDARSSVFS SRRRAANNSL LLEEEVRRPW WQQQIAAPGA
             LGESSSSPEP SSLIATGRLR DDDAKVVLFD DIIILL--PI IILLLLNNGS SRRRVDDPDA
             LGESSSSPEP SSLIATGRLR DDDAKVVLFD DIIILL--PI IILLLLNNGS SRRRVDDPDA

             AFFNVVQLRR LLLGDPADDQ QAPPP---ID DDAACCVASV RRPVVVVIIE EAADHNNKTG
             AVVNMMQLKK LLL----DDE ETPPPTTIVD DDTTLLAKSL NNPLLLLGGE EVVSLNNRVG
             AVVNMMQLKK LLL----DDE ETPPPTTIVD DDTTLLAKSL NNPLLLLGGE EVVSLNNRVG

             DRRRRFFFLL LFYYYSAVFD LAAAGGAAGG GAAAE-AYYQ QQRREDDIVC GGG-RERRRH
             NRRRRVVVLL LQFYYSAVFE LNNGDDEEEE EVVVERELLG GGRRRGGLIG PPKIRERRRM
             NRRRRVVVLL LQFYYSAVFE LNNGDDEEEE EVVVERELLG GGRRRGGLIG PPKIRERRRM

             HEPWWRRRRL LLSNLLLLLG E-HVEAADDD GCLTTTTTLL GGWHGRPFFA ASWWWWEAAA
             MEEWWRRRLL LLNYVLLNNY SYSVSKKPPP GFISSSSSLL AAWNDLPLLL LSWWWWR---
             MEEWWRRRLL LLNYVLLNNY SYSVSKKPPP GFISSSSSLL AAWNDLPLLL LSWWWWR---

             GDDNNNNNNN SSSSSSGNSG GGSGSSGGRD DGSSSSCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --------MT FPFQ------ -------AAS
t5g66770.1   MMMYMCCCTT DSGLMMAAAA IQQVVVKQQK QQQQQQQHHI FGINPLLSSS LLLNNPPSSG
t5g66770.2   MMMYMCCCTT DSGLMMAAAA IQQVVVKKKK QQQQQQQHHI FGINPLLSSS LLLNNPPSSG

             SSGGLDAGGG PVV------- ---------- -------DGG VY-------- ----------
             FFGGLSGSSS PVVTTGGGGD SDDFFPFFFN NNDHHHHTGG GFLDFFGGTG GGGGGFEDDE
             FFGGLSGSSS PVVTTGGGGD SDDFFPFFFN NNDHHHHTGG GFLDFFGGTG GGGGGFEDDE

             ---------- ---------Y PAA---GGDD DD-------- ---------- AAAAEFAAAF
             WEELSGDDSS AADCDDTTTH PDDVIYGGDD DDPPFFTYYP PSSRLVQQSL SSSSPPPPLW
             WEELSGDDSS AADCDDTTTH PDDVIYGGDD DDPPFFTYYP PSSRLVQQSL SSSSPPPPLW

             FAAAAVL--- AARRRREEEE EEEEVAIIIR -------LVV HLLMCCAAAA AAAADHHHAS
             WSSSPLTEEE SSTTKKEEDP PEEETNSSSE DDFDDDELLL KIIYCCRDDD DDDDDPPPNS
             WSSSPLTEEE SSTTKKEEDP PEEETNSSSE DDFDDDELLL KIIYCCRDDD DDDDDPPPNS

             SAQLAADALA AVAASGGGRR RVAVVVHFTT LSRRRLFFPP PPPTDDEEFL L-YFYYEPYK
             SKTLLLQEVS ELDPT--ERR RVAFFFYFEE LSNNRLSSPP SSSSSSTTLI ISYLNNDPYK
             SKTLLLQEVS ELDPT--ERR RVAFFFYFEE LSNNRLSSPP SSSSSSTTLI ISYLNNDPYK

             FAFAANNQLL EEEAAFFGGD DHHHVVVDDD FLMMQGQQPP ALLIQALLLG GGPPF--LIT
             FALAANNQLL EEEAATTKKN NKKKIIIDDD FIVVQGQQPP ALLLQALLTG GGKPTQQIVS
             FALAANNQLL EEEAATTKKN NKKKIIIDDD FIVVQGQQPP ALLLQALLTG GGKPTQQIVS

             GIIGGPPPPG DDEE--LRRD VVLLLLRRRR RLDLRFFSFV SSSEEVLLQA AAPGGEAAAS
             GIIPPAPLLE PPEESSLIIA TTNNNNRRRR RLDFDFFDFI PPPLLLFFRD DDPDDEVAAF
             GIIPPAPLLE PPEESSLIIA TTNNNNRRRR RLDFDFFDFI PPPLLLFFRD DDPDDEVAAF

             VVLLLLLRLL LGPDAP---- -IAAAVLDCC VSVVRRPIIF TEEADTGLDD DRTTALYYSA
             MMLLLLLKLL L--DTPTTTI IVTTTALRLL ASLLNNPVVV TEEVSVGANN NRKKALYYSA
             MMLLLLLKLL L--DTPTTTI IVTTTALRLL ASLLNNPVVV TEEVSVGANN NRKKALYYSA

             AVFFSLLDAA ASAASSGGGA ANMAAAE-AY LLRREEEDIV GAAAA-ERRE PLRRDLLTTR
             AVFFSLLEPP NLGGRRDDSE ERRVVVEREL FFRRRRRGLI KTTTGIERRE EKQQVMMEEN
             AVFFSLLEPP NLGGRRDDSE ERRVVVEREL FFRRRRRGLI KTTTGIERRE EKQQVMMEEN

             LSSSPLSNAL LRALLGGLFF FSSSGGEEGG ---EADCTTL LGWGPPLLFS SSASAADDGG
             FEEEKLNYAV VSALLWWNYY YNNNYYSSNN YYYSKPFSSL LAWDPPLLLT TTLS------
             FEEEKLNYAV VSALLWWNYY YNNNYYSSNN YYYSKPFSSL LAWDPPLLLT TTLS------

             GGDNNSNSSS NSSSSSNNNN GSSGKKKGAA GGSSVCCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ------MDDD TTPFFF---- --WWMPASSS
t5g66770.1   MMAMMCTTTT DDGNNLMMAA QQQEQQQQQQ HHHHQQHQQQ IIGIIIPPLL LNWW-TSGFF
t5g66770.2   MMAMMCTTTT DDGNNLMMAA QKQEQQQQQQ HHHHQQHQQQ IIGIIIPPLL LNWW-TSGFF

             SGAGFLPPPP AAV------- ---------- --------PP GVV------- ----------
             FGGSAFPPPP QQVGGGGGGG DDSSSSDDDF FNNLLHHHTT GGGSFFGTTT GGGGGFFESD
             FGGSAFPPPP QQVGGGGGGG DDSSSSDDDF FNNLLHHHTT GGGSFFGTTT GGGGGFFESD

             ---------- -------YYD DPPA--GGAA DD-------- ---------- -DAAPAAAAF
             WTTLISGSVP PCDDTWWHHD DNNDYIGGPP DDPFFFTTYS RLSSSVSLLL VDSSLTTTLW
             WTTLISGSVP PCDDTWWHHD DNNDYIGGPP DDPFFFTTYS RLSSSVSLLL VDSSLTTTLW

             PCCCCAPPPD DAAAA-ARRE EEEEEEEVGI IIRR------ -VLLLLLMMM SCAAAIIEAH
             PSSSSSPPPL LPPPPHSKKE EEDDEEETDS SSEEDFDLLE PLAIIIIYYY DCARRIISDP
             PSSSSSPPPL LPPPPHSKKE EEDDEEETDS SSEEDFDLLE PLAIIIIYYY DCARRIISDP

             AALAAQSSHH AAAVSSASGR VAFALSRRRR SSSSPPPPPP PPTTTDDHAF --YHHHYCPP
             NNEAKTIIRR SSSLGGDTER VAFALSNNNR SSSSPPSSSS SSSSSSSEDL LSYKTTNCPP
             NNEAKTIIRR SSSLGGDTER VAFALSNNNR SSSSPPSSSS SSSSSSSEDL LSYKTTNCPP

             PPYYLLLAAF ANLEFDHVVH VVIDDFSSSL LMMGGLLQQW PLLLIRPGGG GPP--LLRRI
             PPYYSSSAAL ANLETNKIIH IIVDDFGGGI IVVGGIIQQW PLLLLRTSGG GKPQQIIRRV
             PPYYSSSAAL ANLETNKIIH IIVDDFGGGI IVVGGIIQQW PLLLLRTSGG GKPQQIIRRV

             TGGPPTGGRR EELDDVLLLR DDDASVVVRV VFFFSSRRGA ANRPWWMMML LIAAAPAVVA
             SGGALGEESS EELAATNNNR DDDAVLLLDL LFFFDDIIPL L-NGSSSSSF FVDDDPVLLA
             SGGALGEESS EELAATNNNR DDDAVLLLDL LFFFDDIIPL L-NGSSSSSF FVDDDPVLLA

             ANSSSSSVVL QLHRLLLDDQ AAAP--IIVL CVAAASSPKI EEEEADDNTT GLDRREAFYY
             ANFFFFFMML QLYKLLL-DE TTTPTIVVAL LAKKKSSPRG EEEEVSSNVV GANRRNAQFY
             ANFFFFFMML QLYKLLL-DE TTTPTIVVAL LAKKKSSPRG EEEEVSSNVV GANRRNAQFY

             SVVDDSSSDA ASSSGAAMMA AYLQQRDICE AARREERRRR HEPPPSRWWR DDDLTRGLSS
             SVVEESSSEP GRRRSVVRRV ELFGGRGLGE TTRREERRRR MEEEEEQWWR VVVMENGFEE
             SVVEESSSEP GRRRSVVRRV ELFGGRGLGE TTRREERRRR MEEEEEQWWR VVVMENGFEE

             AVLGNNRRRQ ARRRMLLLLG GLLFGGE--H HSECLLTLLG WWHGRPFFFA SWEEAAGDGG
             SVLSYYSSSQ AKKKILLLLW WNNYYYSLYS SIEFIISLLA WWNDLPLLLL SWRR------
             SVLSYYSSSQ AKKKILLLLW WNNYYYSLYS SIEFIISLLA WWNDLPLLLL SWRR------

             DDNNNSNNSS VSSSSGGGGS SDSSSGSSSN GGKKSRDDG
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ------MMMT PPFF-----W PPMDAASGGD
t5g66770.1   MAMCTTTGGG GNLAIAAAQQ QQKKQQEQQQ QHHHQDHHHI GGIIPLLPPW PP-NSLGGGS
t5g66770.2   MAMCTTTGGG GNLAIAAAQQ QQKKKQEQQQ QHHHQDHHHI GGIIPLLPPW PP-NSLGGGS

             AAGFFPPPPA V--------- ---------- ------DDDG GGGGY----- ----------
             GGSAAPPPPF VTTTGGGDDS PPPPPGFFFP PNDHHHTTTG GGGGFRLDGG GTTGGESDEW
             GGSAAPPPPF VTTTGGGDDS PPPPPGFFFP PNDHHHTTTG GGGGFRLDGG GTTGGESDEW

             ---------- -------DPP AA-----D-- ---------- ---------- --AAAPPEFF
             WLISSSGGDS SVDDPDWDNP DDYYVIIDDP PSRRRLLSVV VQQQQPPPSL NRTTTLLPPP
             WLISSSGGDS SVDDPDWDNP DDYYVIIDDP PSRRRLLSVV VQQQQPPPSL NRTTTLLPPP

             AAFPCAPAAV VL-AARRRRE EEEEVAGRRR ---------- -LHLLLLLMC CCGGAADAAL
             PPWPSSPSSL LTESSTKKKE PPPETNDEEE DDDDDFLEPP PLKAIIIIYC CC--RDDNNE
             PPWPSSPSSL LTESSTKKKE PPPETNDEEE DDDDDFLEPP PLKAIIIIYC CC--RDDNNE

             SLASSHHAAA LLAASSAASI VAAVVFLSSL LFPPVAPPTT TDDAHAFFL- -HYEAAAPPY
             SLLIIRREEE VVSEGGDPT- VAAFFFLSSL LSPPATSSSS SSSSEDLLIL LTNDAAAPPY
             SLLIIRREEE VVSEGGDPT- VAAFFFLSSL LSPPATSSSS SSSSEDLLIL LTNDAAAPPY

             KFFAHHFTQA AIILHGCCDH HVVSLMQGLW WPLLIQLLRR PGGGPPPLTT GIIGPPPSDD
             KFFAHHLTQA AIILEKSSNK KIIGIVQGIW WPLLLQTTRR TSGGKKKISS GIIPAAPSPP
             KFFAHHLTQA AIILEKSSNK KIIGIVQGIW WPLLLQTTRR TSGGKKKISS GIIPAAPSPP

             EEEE---LVG LRDLLAAARR RRVVVRRVFS FFRVAAANDE VPMLLGGAAN NNNSSVHHRL
             EEEEPPPLTG NRDFFAAAKK KKLLLDDLFD FFIITTT-HL LGSFFDDVAN NNNFFMYYKL
             EEEEPPPLTG NRDFFAAAKK KKLLLDDLFD FFIITTT-HL LGSFFDDVAN NNNFFMYYKL

             LLLLGGGDPA DAIDAAAAAA RPKIVVVVII EQEDDDHNKK KKTTTTTGLD RFFFEAAYYY
             LLLL------ DTVDTKKKKK NPRVLLLLGG EYESSSLNRR RRVVVVVGAN RVVVNAAFYY
             LLLL------ DTVDTKKKKK NPRVLLLLGG EYESSSLNRR RRVVVVVGAN RVVVNAAFYY

             AVDDSDDASS GGGAGNNAAM MMAE----QI ICCICGGGGA AA--RERRHE PLRRRLLLTR
             AVEESEEPLR DDDEERRVVR RRVERRRRGI ISSLGPPKKT GGIIRERRME EKQRRMMMEN
             AVEESEEPLR DDDEERRVVR RRVERRRRGI ISSLGPPKKT GGIIRERRME EKQRRMMMEN

             RGAVLLLLGN AALLLAARRM MMLGGGGFGH HSEGGCCLLW HGRPLFFSWW WWWAGGGDGG
             NGSVLLLLSY AAVVVAAKKI IILWWWWYNS SISGGFFILW NDLPLLLTWW WWW-------
             NGSVLLLLSY AAVVVAAKKI IILWWWWYNS SISGGFFILW NDLPLLLTWW WWW-------

             GDDNNNSSSN VSSSDDSSSS NSGGSSSGKS ARDDGSCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- -MTFFFPPQQ ---WPPASSL
t5g66770.1   MMYDSSGGNL LMMAAAQQQQ QQKQQQKQQQ QQQEQQQQQH QHIFFFGGNN LLPWTTLGFL
t5g66770.2   MMYDSSGGNL LMMAAAQQQQ QQKKKKKQQQ QQQEQQQQQH QHIFFFGGNN LLPWTTLGFL

             LAGGFLLPAA V--------- ---------A ADGGG----- ---------- ----------
             LGSSAFFPFQ VTTTTGGDSD GGFPFPDDHA ATGGGRLSFF GTTGFFFSEE LISGGDSAAG
             LGSSAFFPFQ VTTTTGGDSD GGFPFPDDHA ATGGGRLSFF GTTGFFFSEE LISGGDSAAG

             --DPPPP--- --GAA----- ---------- -----VDDDA AAALLPFFAA AAAAFFPPPC
             PTDNNPPVVI YYGPPPFDTY YSSSLLLSVQ LLLNRIDDDT SSSPPLPPPT TTLLWWPPPS
             PTDNNPPVVI YYGPPPFDTY YSSSLLLSVQ LLLNRIDDDT SSSPPLPPPT TTLLWWPPPS

             CAAPPDDAAV -REEEEAAAG R--------- --LLHHHLLL LCCAGAGGDH HALSAQLASS
             SSSPPLLSSL HKEDPENNND EDDDDDLLEE EPLLKKKAII ICCA-RSSDP PNESKTLLII
             SSSPPLLSSL HKEDPENNND EDDDDDLLEE EPLLKKKAII ICCA-RSSDP PNESKTLLII

             HAAAAALLAV VSSAIIIGGR RVAVVTTAAL RRRRRLFFP- PPPADAAL-Y YYHHACPPYF
             REEESSVVSL LGGP---EER RVAFFTEAAL NRRRRLSSPN PPPTSSDILY YYTTACPPYF
             REEESSVVSL LGGP---EER RVAFFTEAAL NRRRRLSSPN PPPTSSDILY YYTTACPPYF

             FHHTAANQAI EEFHHHHGCV HFLLLQQLLQ PLIIALALLP GGPPFFFRRI GGIGPPSSSG
             FHHTAANQAI EETEEEEKSI HFIIIQQIIQ PLLLALATTT SSKPTTTRRV GGIPAASSSE
             FHHTAANQAI EETEEEEKSI HFIIIQQIIQ PLLLALATTT SSKPTTTRRV GGIPAASSSE

             GGRDD----L RRDGLLRRRL ADDAARSVRR RFSSRGVAAN EVMLIAPGGG EEVVVLQQHL
             EESPPPPPPL IIAGNNRRRL RDDAAKVLDN NFDDIPILT- LLSFVDPDDD EELLMLQQYL
             EESPPPPPPL IIAGNNRRRL RDDAAKVLDN NFDDIPILT- LLSFVDPDDD EELLMLQQYL

             GGDPADQQQA AAP-AAVVCC VAVVPPPPIT VVEAAAAADH HNNKKTGGFD DDRFFALFYS
             -----DEEET TTPTTTAALL AKLLPPPPVT LLEVVVVVSL LNNRRVGGFN NNRVVALQYS
             -----DEEET TTPTTTAALL AKLLPPPPVT LLEVVVVVSL LNNRRVGGFN NNRVVALQYS

             AAASGGGNAA -AAYLRRRRC CCDVVVEEGG GAAA---RRR REEERHEEPL LRWWRLTTTA
             PPNRDEERVV REELFRRRRS SSGIIIEEKK KTTGIIIHHH REEERMEEEK KQWWLMEEEA
             PPNRDEERVV REELFRRRRS SSGIIIEEKK KTTGIIIHHH REEERMEEEK KQWWLMEEEA

             AGAAAPGGGA AALLLLRRQR VE---HHSSE EAAALTGGGW WHHHGRPPSS SASSSWAGGG
             AGSSSKSSSA AAVVVVSSQK LSLLYSSIIE SKKKISAAAW WNNNDLPPTT TLSSSW----
             AGSSSKSSSA AAVVVVSSQK LSLLYSSIIE SKKKISAAAW WNNNDLPPTT TLSSSW----

             DDDNNSNSSV GGSSSGGGGS SSSSNNNSGG SNGKGAGSV
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -------DDQ Q-------WW WPPMASSSSG
t5g66770.1   MAYMMCCCTD SNMAIAAQVI KKKKQQQQQQ QEQQQDDQQN NPPPSLNNWW WPP-SGGFFG
t5g66770.2   MAYMMCCCTD SNMAIAAQVI KKKKQQQQQQ QEQQQDDQQN NPPPSLNNWW WPP-SGGFFG

             LAALLLLPPP V--------- ---------- ---------- DVVGY----- ----------
             LGGFFFFPDP VGGGGGGDDS NNNDPPPPPN LLDHHHHHHH TGGGFRLSDG GGGGGEFEES
             LGGFFFFPPP VGGGGGGDDS NNNDPPPPPN LLDHHHHHHH TGGGFRLSDG GGGGGEFEES

             ---------- ---------- --------YD DPPPA--GD- ---------- --AFFAAAFP
             DWMEETIIGG GDDSSVVVAA ADDGGTWWHD DPPPDVVGDF DTPPLVPSSS DNSPPPPTWP
             DWMEETIIGG GDDSSVVVAA ADDGGTWWHD DPPPDVVGDF DTPPLVPSSS DNSPPPPTWP

             PPPCCCCCAD AAALL--RRR EEEEEEVVVA AAGR------ -------HLL MSSCAGIEEE
             PPPSSSSSSL SIPTTHETTT DPPEEETTTN NNDEDDDDDF FDDDDDLKAI YDDCA-ISSS
             PPPSSSSSSL SIPTTHETTT DPPEEETTTN NNDEDDDDDF FDDDDDLKAI YDDCA-ISSS

             GHAASAQQAD DSSSHAAVAA SSSIIRVVAF TTLLSRRRRL LFFPP-SVVV APPPPTEHAF
             SPAASKTTLQ QIIIRSELDP TTT--RVVAF TELLSNRRRL LSSPPNSAAA TSSSSSTEDL
             SPAASKTTLQ QIIIRSELDP TTT--RVVAF TELLSNRRRL LSSPPNSAAA TSSSSSTEDL

             L-YYACYLKK AAHFFTNNQA FHHGGGCCHH VIDSMQQQWP IQQLRRRRPP GGPPPLLRII
             ILYYACYSKK AAHLLTNNQA TEEKKKSSKK IVDGVQQQWP LQQTRRRRTT SGKKKIIRVV
             ILYYACYSKK AAHLLTNNQA TEEKKKSSKK IVDGVQQQWP LQQTRRRRTT SGKKKIIRVV

             TTGGPSSPGD DE--LLLVVG GLLLLLLAAR SVRVFFSSRR GGASDEEVRP WLQQQQIIAA
             SSGPASSLEP PESSLLLTTG GNNLLFFAAK VLDLFFDDII PPTPHLLLNG SFRRRRVVDV
             SSGPASSLEP PESSLLLTTG GNNLLFFAAK VLDLFFDDII PPTPHLLLNG SFRRRRVVDV

             VVAFNSLLQR RLGDPAQQPP -IIDVVVLCV APKFTEEEAN KTTTLLFFYY VVFDSSSSLD
             LLAVNFLLQK KL----EEPP TVVDAAALLA KPRVTEEEVN RVVVLLQQFY VVFESSSSLE
             LLAVNFLLQK KL----EEPP TVVDAAALLA KPRVTEEEVN RVVVLLQQFY VVFESSSSLE

             AAAGAAANNA AAAMEE-LLL QQREICDIVV CEGGAAA--- RRERELSSSR DLTRRAGLSS
             PNNDEEERRV VVVREERFFF GGRRISGLII GEKKTGGIII HREREKEEEQ VMENNAGFEE
             PNNDEEERRV VVVREERFFF GGRRISGLII GEKKTGGIII HREREKEEEQ VMENNAGFEE

             AAAVVPSNNR QMVGFSEEG- HHHGCCTTLL LWHHGGGRRR PFSSASSWEE EGDGGGNNNN
             SSSVVKNYYS QILWYNSSNL SSSGFFSSLL LWNNDDDLLL PLTTLSSWRR R---------
             SSSVVKNYYS QILWYNSSNL SSSGFFSSLL LWNNDDDLLL PLTTLSSWRR R---------

             NNNNNVGGGG GGGSDDDSSN NSGSNGSSSA RDGSSCCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- ---DPP---- ----WMDDPA
t5g66770.1   MMMAAAMMDS SGLMMAAAQQ QQIIIIQKQQ QEEEQQQQQH HHHQGGPPSS LLLPW-NNTL
t5g66770.2   MMMAAAMMDS SGLMMAAAQQ QQIIIIKKQQ QEEEQQQQQH HHHQGGPPSS LLLPW-NNTL

             AASGGLLDDA GFLLPPPPPA AVV------- ---------- ----AAPDVG Y---------
             LLGGGLLSSG SAFFPPPPPF QVVTGGGDSS SPPFFNNLDD DHHHAATTGG FDFGGGGGFS
             LLGGGLLSSG SAFFPPPPPF QVVTGGGDSS SPPFFNNLDD DHHHAATTGG FDFGGGGGFS

             ---------- ------YPA- -----GGAD- ---------- ---------- -VVDAAAAAL
             SEELIGGSVD GDTTWWHPDY YYVVIGGPDP PDTYYSSRRL LSVQQPDNRV VIIDTTSSSP
             SEELIGGSVD GDTTWWHPDY YYVVIGGPDP PDTYYSSRRL LSVQQPDNRV VIIDTTSSSP

             LLFAAADDAA AA-AMRRREE EEEEEEG--- ---LLVVHHL LSCCAAAEEG DASQLAADDH
             PPPPTLLLPP PPHSPTTKEE EDDEEEDDDD FDELLLLKKA IDCCRRRSSS DNSTLLLQQR
             PPPPTLLLPP PPHSPTTKEE EDDEEEDDDD FDELLLLKKA IDCCRRRSSS DNSTLLLQQR

             HHAAVSAAAI GVVAVHHTTS RLPP---PPP TTTDDDAEF- YYHHHFFFYY PPKKHTTNQI
             RRESLGDDP- EVVAFYYTTS NLPPNNNPPS SSSSSSSTLL YYTTTLLLNN PPKKHTTNQI
             RRESLGDDP- EVVAFYYTTS NLPPNNNPPS SSSSSSSTLL YYTTTLLLNN PPKKHTTNQI

             LLEECHHHHV VDFSSMQLLL QQPLIQAARP GGGGPPLRRR IIIIIGGGPP PPTGGRE-DR
             LLEESKKKHI IDFGGVQIII QQPLLQAART GGGGPPIRRR VVVIIPPPPL LLGEESESAR
             LLEESKKKHI IDFGGVQIII QQPLLQAART GGGGPPIRRR VVVIIPPPPL LLGEESESAR

             LADLAASRRR RRFSSRRRGG AAAASSLERR RRWQQAPGGE AAVVAAFNSS SVHHHRLLGG
             LRDFAAVDNN NNFDDIIIPP LLLLPPILNN NNSRRDPDDE VVLLAAVNFF FMYYYKLL--
             LRDFAAVDNN NNFDDIIIPP LLLLPPILNN NNSRRDPDDE VVLLAAVNFF FMYYYKLL--

             GDDQAPP-ID ALLLVVVSSR PPPPKITTQN NNNTGLDDDR ALFYAVVVFS LDASAAAAAE
             -DDETPPTVD TLLLAAASSN PPPPRVTTYN NNNVGANNNR ALQYAVVVFS LENLEVVVVE
             -DDETPPTVD TLLLAAASSN PPPPRVTTYN NNNVGANNNR ALQYAVVVFS LENLEVVVVE

             --AAAAYYLR EECDICCEAA EEERREPSSS RRDDDRRRLL TGGGLLVPLL LGSSSNLQQQ
             RREEEELLFR RRSGLGGEGG EEERREEEEE RRVVVLLLMM EGGGFFVKLL LSNNNYVQQQ
             RREEEELLFR RRSGLGGEGG EEERREEEEE RRVVVLLLMM EGGGFFVKLL LSNNNYVQQQ

             AARRRRMLVG GLLFGGGGGE G--SVVVEEE EAAAAGGGLT GGHGGGPPSS ASSAAAAAAG
             AAKKKKILLW WNNYYYYYYS NLYIVVVESS SKKKKGGGIS AANDDDPPTT LSSSS-----
             AAKKKKILLW WNNYYYYYYS NLYIVVVESS SKKKKGGGIS AANDDDPPTT LSSSS-----

             GGGGDDNNNS NSVGSGGSSS SNSSSSNGKK ARRDGSVCC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -MDDDFFQQQ ---WPMDDAS SGLLGFLPPP
t5g66770.1   MAYYYYMCTS LLLAAIQIKQ QQEQQQHQQQ DHQQQFINNN LNPWP-NNLF FGLLSAFPDD
t5g66770.2   MAYYYYMCTS LLLAAIQIKQ QQEQQQHQQQ DHQQQFINNN LNPWP-NNLF FGLLSAFPPP

             PPAV------ ---------- ---AADDDDV V--------- ---------- ----------
             PPQVTGGGGS SDDFPFFFHH HHHAATTTTG GRLLSDDDFG TTTGGGEFFE EESDEEEISS
             PPQVTGGGGS SDDFPFFFHH HHHAATTTTG GRLLSDDDFG TTTGGGEFFE EESDEEEISS

             ---------- ---------Y DDDDPPAA-- ---GGGD--- ---------- ----DDPPAA
             GGGSSVVAAD DGGGPPDWWH DDDDPPDDYV IIIGGGDPFF DTTPLSSVQQ RVVVDDLLPP
             GGGSSVVAAD DGGGPPDWWH DDDDPPDDYV IIIGGGDPFF DTTPLSSVQQ RVVVDDLLPP

             AAFPPPPCAD DVL--AMMMR EEEEEI---- ---LLHLLSS CCAAAGIEAG GDHHALAAQL
             TLWPPPPSSL LLTHHSPPPT EDPPESDFDL EEPLLKAIDD CCAAA-ISDS SDPPNEKKTL
             TLWPPPPSSL LLTHHSPPPT EDPPESDFDL EEPLLKAIDD CCAAA-ISDS SDPPNEKKTL

             DDSAALVVVS AAAAASSIGG RVAAFFTTTL LLRRRLLLLF PPPPVVAAPT DAEEF----Y
             QQIEEVLLLG DDPPPTT-EE RVAAFFEEEL LLNRRLLLLS PPPPAATTSS SSTTLLLSSY
             QQIEEVLLLG DDPPPTT-EE RVAAFFEEEL LLNRRLLLLS PPPPAATTSS SSTTLLLSSY

             HECCCPYYLF HFFFTTTAAN NILAAAFHCD VHHVDDFLLQ LLWWLIIIQQ QAALLALRPG
             TDCCCPYYSF HLLLTTTAAN NILAAATESN IHHIDDFIIQ IIWWLLLLQQ QAALLATRTS
             TDCCCPYYSF HLLLTTTAAN NILAAATESN IHHIDDFIIQ IIWWLLLLQQ QAALLATRTS

             PPPFFFFLRI GIGGGPPPPP TDRDDVLADD SSVRVVRFFG GGANNSSSLD DEVRRRLLQA
             KPPTTTTIRV GIPPPPPPPL GPIAATLRDD VVLDLLNFFP PPL--PPPIH HLLNNNFFRD
             KPPTTTTIRV GIPPPPPPPL GPIAATLRDD VVLDLLNFFP PPL--PPPIH HLLNNNFFRD

             AGGEAAAFNS VLHLLLDPAA DQAA---IIV LDDCVVAVVP KIIVEEADHH NKTFFDDFTA
             DDDEVAAVNF MLYLLL---- DETTTIIVVA LRRLAAKLLP RVVLEEVSLL NRVFFNNVKA
             DDDEVAAVNF MLYLLL---- DETTTIIVVA LRRLAAKLLP RVVLEEVSLL NRVFFNNVKA

             LLLLFYYDDA AAGAGNAM-- -AAAQQQQRR EICCCGEEAA RRHLLSSSRR RRRDRLLGGL
             LLLLQYYEEN NGSEERVRRR REEEGGGGRR RISGGPEETT RRMKKEEEQQ RRRVLMMGGF
             LLLLQYYEEN NGSEERVRRR REEEGGGGRR RISGGPEETT RRMKKEEEQQ RRRVLMMGGF

             LLSSAVVVVP LLLNLLAARM MLLVGLGG-H HEADGLTTGW GGGPPLLFSA SSSAEGDDGN
             FFEESVVVVK LLLYVVAAKI ILLLWNYNLS SEKPGISSAW DDDPPLLLTL SSSSR-----
             FFEESVVVVK LLLYVVAAKI ILLLWNYNLS SEKPGISSAW DDDPPLLLTL SSSSR-----

             NNSSSNNSNN VGSGSDDSNS SSSGGKSSSR RDDDDSVVC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -------MDF PFQQ------ ---WWPDDDP
t5g66770.1   MMMTSSGGNL LLLMMMAQQQ QVVKKQQQQQ QQHHHQDHQF GINNPPLSNN NPPWWPNNNT
t5g66770.2   MMMTSSGGNL LLLMMMAQQQ QVVKKKQQQQ QQHHHQDHQF GINNPPLSNN NPPWWPNNNT

             AAAASSGGLL PAAV------ ---------- -----AAPDG GGV------- ----------
             SLLLGFSSFF PQQVGDSSSN NGFPPFFFPD HHHHHAATTG GGGRLSSFFG GGGGGEESSE
             SLLLGFSSFF PQQVGDSSSN NGFPPFFFPD HHHHHAATTG GGGRLSSFFG GGGGGEESSE

             ---------- ---------- ------YYYP A----AAD-- ---------- ----------
             MMEEETTTTL LGGSSVADGG DCDDTTHHHP DVVVYPPDPP PFFFTTSSSR RSQPSSDDLR
             MMEEETTTTL LGGSSVADGG DCDDTTHHHP DVVVYPPDPP PFFFTTSSSR RSQPSSDDLR

             VDAALLEEEP PCAADAAAAV LLL---AREE EEEEVAGIIR R----LVHHL LSSAAAGIEE
             IDTSPPPPPP PSSSLIIPPL TTTHEESKED DPEETNDSSE EDDEPLLKKA IDDAAA-ISS
             IDTSPPPPPP PSSSLIIPPL TTTHEESKED DPEETNDSSE EDDEPLLKKA IDDAAA-ISS

             AAAGGHSSAL DDAAVAAASG IGGVAAAVVV HHHTTALSRR RLLF---SSS VAAPPTTTTD
             DDDSSPSSKL QQEELDPPT- -EEVAAAFFF YYYTEALSNR RLLSNNNSSS ATTSSSSSSS
             DDDSSPSSKL QQEELDPPT- -EEVAAAFFF YYYTEALSNR RLLSNNNSSS ATTSSSSSSS

             DAAAHHAL-Y HEEACCYLLL KFAFFAAQAA AIIEFHGDDH HVHVVIIDFM MQWPALLIIA
             SSSSEEDILY TDDACCYSSS KFALLAAQAA AIIETEKNNK KIHIIVVDFV VQWPALLLLA
             SSSSEEDILY TDDACCYSSS KFALLAAQAA AIIETEKNNK KIHIIVVDFV VQWPALLLLA

             AALPPPPPF- LRRIIITGII GGPPSSSTTE EEEDDVVVGR LAAAASSVRF SSSFFAAASL
             AATTTKPPTQ IRRVVVSGII PPAPSSSGGE EEEAATTTGR LAAAAVVLNF DDDFFLLTPI
             AATTTKPPTQ IRRVVVSGII PPAPSSSGGE EEEAATTTGR LAAAAVVLNF DDDFFLLTPI

             DEEEVVRRPW WGEAVANSSS VVQQHRLLLL LLDPQQQQQP -IVVVVVLLD DVSVRPPKKF
             HLLLLLNNGS SDEVLANFFF MMQQYKLLLL LL--EEEEEP TVAAAAALLR RASLNPPRRV
             HLLLLLNNGS SDEVLANFFF MMQQYKLLLL LL--EEEEEP TVAAAAALLR RASLNPPRRV

             TVVVEQADKT FLREALYYSS AAFSLLDDDA AANAAMAEEE AYYYYICIIC GGEGAA---R
             TLLLEYVSRV FARNALFFSS AAFSLLEEEP NERVVRVEEE ELLLLISLLG PPEKTGIIIH
             TLLLEYVSRV FARNALFFSS AAFSLLEEEP NERVVRVEEE ELLLLISLLG PPEKTGIIIH

             EERHLRWDTA GSSVVPGSAL RRRRAALVFS GGGEGGG--- --HHSEEEAD GGCCLWWHRL
             EERMKQWVEA GEEVVKSNAV SSSSAALLYN YYYSNNNLLL YYSSISSSKP GGFFLWWNLL
             EERMKQWVEA GEEVVKSNAV SSSSAALLYN YYYSNNNLLL YYSSISSSKP GGFFLWWNLL

             AAWEAAGGGG GDNNSVSSSS DDNGSNGKSS GDGSCCLLL
             LLWR------ ---------- ---------- ---------
             LLWR------ ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- ------MDTT Q----PDASG
t5g66770.1   YCTGGGNNNN LMAAAIIAAQ QQQQQVVVKQ QQQQQQEQQQ QQHHQDHQII NPLLNPNSFG
t5g66770.2   YCTGGGNNNN LMAAAIIAAQ QQQQQVVVKK QQQQQQEQQQ QQHHQDHQII NPLLNPNSFG

             DDGGFFLPPP PPPVV----- ---------- ----APDDDV Y--------- ----------
             SSSSAAFPPP DDPVVTTTGG GGGNFPPFFF PLDHATTTTG FRLLLFFFGG GGTTGEFELL
             SSSSAAFPPP PPPVVTTTGG GGGNFPPFFF PLDHATTTTG FRLLLFFFGG GGTTGEFELL

             ---------- ---------- -YDDD----G GDDD------ ---------- ----------
             IIIIISGGDD VAADGPDDDT WHDDDVVIYG GDDDPPPPSS RRRLLLVQQP PSSDDDLLNR
             IIIIISGGDD VAADGPDDDT WHDDDVVIYG GDDDPPPPSS RRRLLLVQQP PSSDDDLLNR

             DDDAALFAAA AAAFPPPPCC CAAPPAAAVV VLLL----AM REEEEEEVVG GGII------
             DDDTSPPPPP TLLWPPPPSS SSSPPSIILL LTTTHHEESP KEEDDPETTD DDSSDDFFDD
             DDDTSPPPPP TLLWPPPPSS SSSPPSIILL LTTTHHEESP KEEDDPETTD DDSSDDFFDD

             -----LVLLM CCCAAAEDAA AASAALADHA AALAVVSSAA AAGGGRVAAA LSSRLFPPP-
             LEEEPLLAIY CCCAAASDNN NNSKKLLQRE ESVELLGGDD DP--ERFAAA LSSRLSPPPN
             LEEEPLLAIY CCCAAASDNN NNSKKLLQRE ESVELLGGDD DP--ERFAAA LSSRLSPPPN

             -SSSAATEAA FFFL--YHHH HHHHEAPLKF FAFAQILEEH CCDVVHDFFS LLLGGGLLQQ
             NSSSTTSTDD LLLILLYTTT TTTTDAPSKF FALAQILEEE SSNIIHDFFG IIIGGGIIQQ
             NSSSTTSTDD LLLILLYTTT TTTTDAPSKF FALAQILEEE SSNIIHDFFG IIIGGGIIQQ

             QPPAAAQALA ALPGGPPF-L RRIGIGTRDE ERGLLAARSS SVRVRSSFVA ASLLLDDEVP
             QPPAAAQALA ATTSGKKTQI RRVGIPGSPE EIGLLAAKVV VLDLNDDFIL TPIIIHHLLG
             QPPAAAQALA ATTSGKKTQI RRVGIPGSPE EIGLLAAKVV VLDLNDDFIL TPIIIHHLLG

             WWMLAPGGEA AFFVLLLQLH HRLLDDDDDP AADDP--IAV LCAVVRPTVV IQQQEEEAAH
             SSSFDPDDEV AVVMLLLQLY YKLL------ --DDPTIVTA LLKLLNPTLL GYYYEEEVVL
             SSSFDPDDEV AVVMLLLQLY YKLL------ --DDPTIVTA LLKLLNPTLL GYYYEEEVVL

             HNTTGDDDDR RFTTEAFYYA FFDDSSDAAG GAGGGAEAAY QRIIIIIICG AARRREEHHE
             LNVVGNNNNR RVKKNAQFYA FFEESSEPPS SEEEEVEEEL GRIIIILLGK TGHRREEMME
             LNVVGNNNNR RVKKNAQFYA FFEESSEPPS SEEEEVEEEL GRIIIILLGK TGHRREEMME

             PLWRRRDRRL AGLAVVLGGA AQARRLGLFS SEEGG---SS VEEEAAADDL LLLWGLFSSE
             EKWRRRVLLM AGFSVVLSSA AQAKKLWNYN NSSNNLLYII VESSKKKPPI LLLWDLLTSR
             EKWRRRVLLM AGFSVVLSSA AQAKKLWNYN NSSNNLLYII VESSKKKPPI LLLWDLLTSR

             AADGGGDNNN NNSGDNNNSG KSSSSARRRD DDGGGSSVL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- ----MDPFFQ QQ-----DDA
t5g66770.1   MMMYMMMCCT TSSNNLMIIA QQQVVIIIKQ KKKQQEEQQQ HHHHHQGIIN NNPLLNPNNS
t5g66770.2   MMMYMMMCCT TSSNNLMIIA QQQVVIIIKK KKKQQEEQQQ HHHHHQGIIN NNPLLNPNNS

             AAGLAGGFFL PVVV------ ---------- ----APDDDV GYY------- ----------
             SLGLGSSAAF PVVVTGGGGG SSSDDDPFFF HHHHATTTTG GFFRLLSSGG TTTGGGEEMT
             SLGLGSSAAF PVVVTGGGGG SSSDDDPFFF HHHHATTTTG GFFRLLSSGG TTTGGGEEMT

             ---------- ---------Y PPAAA----- GGAD------ ---------- ------VAAA
             SGGGGSSSVV VVDDDTTWWH PPDDDVIYYY GGPDPPFFFD DTYYPRLLSV VSDLLVITTT
             SGGGGSSSVV VVDDDTTWWH PPDDDVIYYY GGPDPPFFFD DTYYPRLLSV VSDLLVITTT

             ALLFAAAAPP PPPPCCCADA AAAVVAREEE EEEEGGGR-- ---------- --LLLLHHSA
             SPPPPTLLPP PPPPSSSSLI PPPLLSTEEE DDPPDDDEDD DDDFFFDLLL PPLLLLKKDA
             SPPPPTLLPP PPPPSSSSLI PPPLLSTEEE DDPPDDDEDD DDDFFFDLLL PPLLLLKKDA

             AGAIAAGDLS QLSHAALAAV VGGRRAHFTL SPP--PAAPP PPPPTTAHFF FFF----YYH
             A-RIDDSDES TLIREEVSSL L-ERRAYFTL SPPNNPTTSS SSSSSSSELL LLLLLSSYYT
             A-RIDDSDES TLIREEVSSL L-ERRAYFTL SPPNNPTTSS SSSSSSSELL LLLLLSSYYT

             HFFFYYECCY YKFHNNNQIL LAHHGGGGGC DDHHHHHHVV IFSSMMLQQP LIQAALLALL
             TLLLNNDCCY YKFHNNNQIL LAEEKKKKKS NNKKKKKHII VFGGVVIQQP LLQAALLATT
             TLLLNNDCCY YKFHNNNQIL LAEEKKKKKS NNKKKKKHII VFGGVVIQQP LLQAALLATT

             LLPGPPFF-L RRIITGIIII PPPSTRDE-- ---RRDVVVG LLLLLRRRLA LAAARRRVVV
             TTTGKPTTQI RRVVSGIIII AAPSGSPEPP PSSIIATTTG NNNNNRRRLR FAAAKKKLLL
             TTTGKPTTQI RRVVSGIIII AAPSGSPEPP PSSIIATTTG NNNNNRRRLR FAAAKKKLLL

             VRFFFRRRAL LLPWLIAAPE EEEAVNNNVQ RRPPPDDQPA VLDAAVRIIF IEEQEEDDHN
             LNFFFIIILI IIGSFVDDPE EEEVLNNNMQ KK---DDEPT ALRKKLNVVV GEEYEESSLN
             LNFFFIIILI IIGSFVDDPE EEEVLNNNMQ KK---DDEPT ALRKKLNVVV GEEYEESSLN

             TGLLLLDDRF TEEALFYYYA AVVFLDAAAA ASGGGAGGAA MAAEE-ALLL QRRREVCEEG
             VGAAAANNRV KNNALQFFYA AVVFLEPNGG GRSSSEEEVV RVVEEREFFF GRRRRIGEEK
             VGAAAANNRV KNNALQFFYA AVVFLEPNGG GRSSSEEEVV RVVEEREFFF GRRRRIGEEK

             ARERHHHELW RRDRLRAAAL PPPLNAQAML VVVLFSG--- -VEEEEDDDG LLLLLLHGSA
             TRERMMMEKW RRVLMNAAAF KKKLYAQAIL LLLNYNYLLY YVEESSPPPG IIIILLNDTL
             TRERMMMEKW RRVLMNAAAF KKKLYAQAIL LLLNYNYLLY YVEESSPPPG IIIILLNDTL

             GDNNNNNNVV SSSSGSDSSN NSGGGNNKSS GADGSSVCC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- DTFF------ ------WWPP MDASSGLLLD
t5g66770.1   YMCCTTTNNL MMAAQQQKQQ QQQQQHHDDD QIFIPPPLLL LSSLNNWWPP -NLGGGLLLS
t5g66770.2   YMCCTTTNNL MMAAQQQKQQ QQQQQHHDDD QIFIPPPLLL LSSLNNWWPP -NLGGGLLLS

             AAAGLPPPAA VV-------- ---------A ADDGGGY--- ---------- ----------
             GGGSFPPDFQ VVGGDSSSPG PPPFFPLHHA ATTGGGFRSF FFGGGGGTGG EFSDEWWMME
             GGGSFPPPFQ VVGGDSSSPG PPPFFPLHHA ATTGGGFRSF FFGGGGGTGG EFSDEWWMME

             ---------- -----PPA-- ----D----- ---------- ---------D DAAALLLLPF
             ELLGGDDDGD CDDDTNNDYY YVYYDPFFDD DTTYYYPPSS SVQQPDLNRD DTTSPPPPLP
             ELLGGDDDGD CDDDTNNDYY YVYYDPFFDD DTTYYYPPSS SVQQPDLNRD DTTSPPPPLP

             AAFFAAPDDA AAAAAV-EEE EEEEVVGG-- -------VLL LLLMMAAGAA IAAAALLSAA
             PTWWSSPLLS SSPPPLEEEP PEEETTDDDD DDFDDLPLAA IIIYYAA-RR IDDNNEESKK
             PTWWSSPLLS SSPPPLEEEP PEEETTDDDD DDFDDLPLAA IIIYYAA-RR IDDNNEESKK

             AQQQAAAAAA ALAVVVSSSA SGGIIGAAFF TTTTTALSSS RRLPSVVPPT THHFHHHYEA
             KTTTLLLESS SVELLLGGGP T----EAAFF TTEEEALSSS NRLPSAASSS SEELKKTNDA
             KTTTLLLESS SVELLLGGGP T----EAAFF TTEEEALSSS NRLPSAASSS SEELKKTNDA

             ACYLAAAHFA AANNNNQIIL EECDHVIDFM QLLQWPPIIA LLAAARPPPP PGGGPPPFFL
             ACYSAAAHLA AANNNNQIIL EESNHIVDFV QIIQWPPLLA LLAAARTTTT TSSGKPPTTI
             ACYSAAAHLA AANNNNQIIL EESNHIVDFV QIIQWPPLLA LLAAARTTTT TSSGKPPTTI

             ITGSSPGDDE -LRVVVLLRD DLAAASSRFF FFRGGVVAAA ANSSSERPLQ IIPEANVLLQ
             VSGSSLEPPE PLITTTNNRD DFAAAVVDFF FFIPPIILLT T-PPPLNGFR VVPEANMLLQ
             VSGSSLEPPE PLITTTNNRD DFAAAVVDFF FFIPPIILLT T-PPPLNGFR VVPEANMLLQ

             QQLLHRRRLL LGDDPP--II IAALDDCVAA SSSPPKIFVI IIIEEQEDDD DHNNKFLDDD
             QQLLYKKKLL L-DDPPTIVV VTTLRRLAKK SSSPPRVVLG GGGEEYESSS SLNNRFANNN
             QQLLYKKKLL L-DDPPTIVV VTTLRRLAKK SSSPPRVVLG GGGEEYESSS SLNNRFANNN

             DTTYYSAVVF DSSLLDDDDA ASSAGGGGGG NMAAE--LRR EEEIICIIGE AA-REEEEHH
             NKKFYSAVVF ESSLLEEEEP NLLGDSSSEE RRVVERRFRR RRRIISLLPE TTIHEEEEMM
             NKKFYSAVVF ESSLLEEEEP NLLGDSSSEE RRVVERRFRR RRRIISLLPE TTIHEEEEMM

             ELSRRRDDDD LLTRASAGNL RRMMLVGLSG EG-SVEEDDD GGCLLTTWHH HRPLFFSAWW
             EKEQRRVVVV MMENAESSYV SKIILLWNNY SNLIVESPPP GGFIISSWNN NLPLLLTSWW
             EKEQRRVVVV MMENAESSYV SKIILLWNNY SNLIVESPPP GGFIISSWNN NLPLLLTSWW

             EAAAGDDDGG GDDDNNNNNS NSNVSGGSDN SNGSGAGLL
             R--------- ---------- ---------- ---------
             R--------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- -----MMFFF QQ--------
t5g66770.1   AAMMTDDDDS SGGGNNLLII AAQQVKQQEE QQQQQQQQQQ HHHHDHHIII NNLLSSLLNN
t5g66770.2   AAMMTDDDDS SGGGNNLLII AAQQVKQQEE QQQQQQQQQQ HHHHDHHIII NNLLSSLLNN

             -WMMDDAAAL DAAAFFFLPA V--------- ---------A DDDGGV---- ----------
             PW--NNSLLL SGGGAAAFPQ VGSNPPGGFF PPNNNLHHHA TTTGGGRRSS DGGGGETLII
             PW--NNSLLL SGGGAAAFPQ VGSNPPGGFF PPNNNLHHHA TTTGGGRRSS DGGGGETLII

             ---------- --DPPPPP-- ----A----- ---------- ---------V ALLPEFFFAA
             SSGAAGGGGP CTDNNNPPYY VVIYPPFYYP SSSLLLVVVV VQPPDLRRRI SPPLPPPPPP
             SSGAAGGGGP CTDNNNPPYY VVIYPPFYYP SSSLLLVVVV VQPPDLRRRI SPPLPPPPPP

             APPPPPPAPA AAAAAAAAL- REEEAAII-- ----LLSCAA IEAGGHHHHA ALLSSSLSHA
             PPPPPPPSPS SPPPPPPPTE KEEENNSSDD DPPPAIDCAR ISDSSPPPPN NEESSSLIRE
             PPPPPPPSPS SPPPPPPPTE KEEENNSSDD DPPPAIDCAR ISDSSPPPPN NEESSSLIRE

             AAAASAASSG IGGGRVVVVV HHFTTLLSPS SPVVVAAPPT TTAAHFFL-- -----YYHHH
             SSSEGDPTT- -EEERVFFFF YYFTTLLSPS SPAAATTSSS SSSSELLILL LSSSSYYKTT
             SSSEGDPTT- -EEERVFFFF YYFTTLLSPS SPAAATTSSS SSSSELLILL LSSSSYYKTT

             YEEPYLKKKA ATQEFFCCCD HVIDDFSSLL LMMMQGPPPA LQLLAAALPG PP--RTTIGG
             NDDPYSKKKA ATQETTSSSN KIVDDFGGII IVVVQGPPPA LQLLAAATTG KPQQRSSIPP
             NDDPYSKKKA ATQETTSSSN KIVDDFGGII IVVVQGPPPA LQLLAAATTG KPQQRSSIPP

             GSPTTGRR-- LLVVVVGGLR RDAVVVVRRF FRRGGGVAAA NLLVVRRPPP PWMMQPANSL
             PSLGGESSPS LLTTTTGGNR RDALLLLNNF FIIPPPILTT -IILLNNGGG GSSSRPVNFL
             PSLGGESSPS LLTTTTGGNR RDALLLLNNF FIIPPPILTT -IILLNNGGG GSSSRPVNFL

             LQQQLHRRRR RRLLGDPAQA IAAVLCCVVS SVRKKIITVI IIEEQADHHH HTFLLDDFTE
             LQQQLYKKKK KKLL----ET VTTALLLAAS SLNRRVVTLG GGEEYVSLLL LVFAANNVKN
             LQQQLYKKKK KKLL----ET VTTALLLAAS SLNRRVVTLG GGEEYVSLLL LVFAANNVKN

             FFYSAVDDDS AASSAAAGME -AQQRCCCIV GERERRRHPP RWDDDRTAGL LSSAVPLLLG
             QQFSAVEEES PNLLGEEERE REGGRSSSLI PEHERRRMEE QWVVVLEAGF FEESVKLLLS
             QQFSAVEEES PNLLGEEERE REGGRSSSLI PEHERRRMEE QWVVVLEAGF FEESVKLLLS

             SSSAALLLLR RRRRQARMVL LLGG----HS SEEECLTLGH HGRRFSAWEA ADDDDGNSNS
             NNNAAVVVVS SSSSQAKILN NNYYLLYYSI IEESFISLAN NDLLLTLWR- ----------
             NNNAAVVVVS SSSSQAKILN NNYYLLYYSI IEESFISLAN NDLLLTLWR- ----------

             NVSSSGGSSS SDSSSNNGSS SGGGSADDGG SSSSVCLLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- MMMDDFFFQQ ----------
t5g66770.1   MAAYYCTDDS GGNMAIQIKK QQQQQEEQQQ QQQQQHHHQD HHHQQFFINN LLSLLNNPPP
t5g66770.2   MAAYYCTDDS GGNMAIQIKK KKQQQEEQQQ QQQQQHHHQD HHHQQFFINN LLSLLNNPPP

             WPMMASSSGG FLLPPPAAAV ---------- ---------- APPPDDDVVV GY--------
             WP--SGGFGS AFFPDPFFQV TGGGGGNFPF FPPPNDHHHH ATTTTTTGGG GFRDDFGGGT
             WP--SGGFGS AFFPPPFFQV TGGGGGNFPF FPPPNDHHHH ATTTTTTGGG GFRDDFGGGT

             ---------- ---------- ---------D PPPPA---G- ---------- ----------
             TTTGFEEEWE ELIISSGGDA ADDGGPDDTD NNNPDYIIGP PFDTPSRRQP SSSSSDLRRV
             TTTGFEEEWE ELIISSGGDA ADDGGPDDTD NNNPDYIIGP PFDTPSRRQP SSSSSDLRRV

             DDAALLPPEF AFPCAPAAV- -REEEVVVAG RR-------- -----LHHLL LLMSCAAGGI
             DDSSPPLLPP TWPSSPSPLH HTDPETTTND EEDDDDDFDL LEPPPLKKAA IIYDCAA--I
             DDSSPPLLPP TWPSSPSPLH HTDPETTTND EEDDDDDFDL LEPPPLKKAA IIYDCAA--I

             DHHAAQAALA AVAAAASGRR VAAVVFFTLL RRRRRPPPP- PPVVVVAPPP PPPPTAAEFL
             DPPNKTEEVS ELDPPPT-RR VAAFFFFTLL NNNNNPPPPN PPAAAATSSS SSSSSSSTLI
             DPPNKTEEVS ELDPPPT-RR VAAFFFFTLL NNNNNPPPPN PPAAAATSSS SSSSSSSTLI

             ---YHFFEAC CYLAAFTTNQ AILEAFFFGD VHHVIDSSLL LMMQWPLIAA LGPPF-TGGI
             LLLYKLLDAC CYSAALTTNQ AILEATTTKN IHHIVDGGII IVVQWPLLAA TGKPTQSGGI
             LLLYKLLDAC CYSAALTTNQ AILEATTTKN IHHIVDGGII IVVQWPLLAA TGKPTQSGGI

             GGPPPPTEEE ELRDDDGRRD DRRSSSVVRR SVVASLLDRP WMLLQIVAFF FNNNSVVVVL
             PPAPLLGEEE ELIAAAGRRD DKKVVVLLDN DIILPIIHNG SSFFRVLAVV VNNNFMMMML
             PPAPLLGEEE ELIAAAGRRD DKKVVVLLDN DIILPIIHNG SSFFRVLAVV VNNNFMMMML

             HRLLLLGGDP AADDAAPP-- IIDAVLDVVS SRPPKIIIFF TTQQEAKTFF FLDRTTTTTE
             YKLLLL---- --DDTTPPTI VVDTALRAAS SNPPRVVVVV TTYYEVRVFF FANRKKKKKN
             YKLLLL---- --DDTTPPTI VVDTALRAAS SNPPRVVVVV TTYYEVRVFF FANRKKKKKN

             AAAYYSSSFF DSAASSSAAA SSGAMMMAYY QRICCIVVVC GAARRREHPP LLWRRDLRAA
             AAAFFSSSFF ESPNLLLGGG RREVRRRVLL GRISSLIIIG PTGHHREMEE KKWRRVMNAA
             AAAFFSSSFF ESPNLLLGGG RREVRRRVLL GRISSLIIIG PTGHHREMEE KKWRRVMNAA

             LLAVPLSSAA LRRQQAAMML LGLFSSSGGG G--VVEAACC CLTLWWRRPL FFAWWAAAGD
             FFSVKLNNAA VSSQQAAIIL LWNYNNNNNN NYYVVSKKFF FISLWWLLPL LLLWW-----
             FFSVKLNNAA VSSQQAAIIL LWNYNNNNNN NYYVVSKKFF FISLWWLLPL LLLWW-----

             DGGGNNNSSV VVGSGGDDDS SNGSNKKGAR RRDSSSVVC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- MTTFFFFQQ- -----MAASS
t5g66770.1   AAAAMCTTSL LLMAQQQQQQ QVIQQQQQQQ QEQQQHHQDD HIIFFFINNL LLLNP-SSGG
t5g66770.2   AAAAMCTTSL LLMAQQQQQQ QVIKQQQQQQ QEQQQHHQDD HIIFFFINNL LLLNP-SSGG

             GDAFFLLPPP AAAV------ ---------- --AAAPDVGY YY-------- ----------
             GSGAAFFDDD FQQVSNNDDP PPGGGPPFLL DHAAATTGGF FFLSDFFGGG GFEWEETTGD
             GSGAAFFPPP FQQVSNNDDP PPGGGPPFLL DHAAATTGGF FFLSDFFGGG GFEWEETTGD

             ---------- --------DP PPAA------ AAAD------ ---------- VVVDAAAAFP
             SVVAAAADDD DGPPCDTTDN NPDDYVVVIY PPPDPFTTYY SRRRSSSSDR IIIDTTSTWP
             SVVAAAADDD DGPPCDTTDN NPDDYVVVIY PPPDPFTTYY SRRRSSSSDR IIIDTTSTWP

             PPCCCCADAA V-----MRRR REEEEEEEVR ---------- ----VHMMSC CAAAIEEEAH
             PPSSSSSLSI LHHHEEPTKK KEDDPPEETE DDDDFFDDLL LPPPLKYYDC CAARISSSDP
             PPSSSSSLSI LHHHEEPTKK KEDDPPEETE DDDDFFDDLL LPPPLKYYDC CAARISSSDP

             HALASSSAQL SAALLAAASS AAAAGRVVHT TALLSSRLF- SSPAAPTDHL -----YHHHF
             PNEASSSKTL IESVVSSEGG DDDPERVVYT TALLSSNLSN SSPTTSSSEI SSSSSYKTTL
             PNEASSSKTL IESVVSSEGG DDDPERVVYT TALLSSNLSN SSPTTSSSEI SSSSSYKTTL

             YEEAACCPYL LKFANLEAFF HGGGGDHHVI DFLMMMMGLQ QWPAIQQAAL LAPPF----R
             NDDAACCPYS SKLANLEATT EKKKKNKHIV DFIVVVVGIQ QWPALQQAAL LAPPTQQQQR
             NDDAACCPYS SKLANLEATT EKKKKNKHIV DFIVVVVGIQ QWPALQQAAL LAPPTQQQQR

             RIITTGPSTG GGGR-LDVGL AALLARSRRV SRGGVAPWLL LLQQIIAAPG AAFSSLQLRR
             RVVSSPPSGE EEESPLATGN RRFFAKVDDL DIPPILGSFF FFRRVVDDPD AAVFFLQLKK
             RVVSSPPSGE EEESPLATGN RRFFAKVDDL DIPPILGSFF FFRRVVDDPD AAVFFLQLKK

             LLDPDDQA-I AADCCCVVVA SSVPKTIEEE EEAHHNNTTT GFLDDFEEAL LYYAFFSLDD
             LL--DDETIV TTRLLLAAAK SSLPRTGEEE EEVLLNNVVV GFANNVNNAL LFYAFFSLEE
             LL--DDETIV TTRLLLAAAK SSLPRTGEEE EEVLLNNVVV GFANNVNNAL LFYAFFSLEE

             ASSAAGGGGG NAAMAEYYYR RREIICIVEA AAA--RRRRR ERHHEELRRW DRRRLTRALA
             PLLGGDSSEE RVVRVELLLR RRRIISLIET TTGIIHHRRR ERMMEEKQQW VLLLMENAFS
             PLLGGDSSEE RVVRVELLLR RRRIISLIET TTGIIHHRRR ERMMEEKQQW VLLLMENAFS

             VLLLLGGALQ QAARRMMLVV GGGLSGG--S VEAACCLLLL LGHHGGGLFS SAEAAGDDDG
             VLLLLSSAVQ QAAKKIILLL WWWNNYYYYI VSKKFFLLLL LANNDDDLLT TLR-------
             VLLLLSSAVQ QAAKKIILLL WWWNNYYYYI VSKKFFLLLL LANNDDDLLT TLR-------

             GGNNNNSSSS NNNNSVGSSN NNNGSSSSSS NGSSGDSVV
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- ----MDTPQ- ----WWMMDD
t5g66770.1   MMMYMMMMTT TSMMMAIIAQ QQQIKKQQQQ QEEQQQQQQQ QHQQHQIGNP SLNPWW--NN
t5g66770.2   MMMYMMMMTT TSMMMAIIAQ QQQIKKKQQQ QEEQQQQQQQ QHQQHQIGNP SLNPWW--NN

             PASSSDAGPP PVV------- ---------- --AAAPDDV- ---------- ----------
             TSGFFSGSPD PVVTGGDDSS NNDDDPFFPN LDAAATTTGR RLSFGGGTGG GGESSSDDEE
             TSGFFSGSPP PVVTGGDDSS NNDDDPFFPN LDAAATTTGR RLSFGGGTGG GGESSSDDEE

             ---------- ---------- ---YDDPPA- ---A------ ---------- ----------
             WMMELLISSG DSVAGPPPDD DWWHDDNPDY VVYPPPPFFD DTYPRVVPPS SDDLLLNNRV
             WMMELLISSG DSVAGPPPDD DWWHDDNPDY VVYPPPPFFD DTYPRVVPPS SDDLLLNNRV

             VDDAAALLLP FAAAAAPPPC PDAA----MR EEEEEAGGIR -----LVLCC AGGGIEAAAA
             IDDTSSPPPL PPPPLLPPPS PLPPHHHEPT EDPPENDDSE DDDPPLLACC A---ISDDDD
             IDDTSSPPPL PPPPLLPPPS PLPPHHHEPT EDPPENDDSE DDDPPLLACC A---ISDDDD

             DHHALLASAQ ADDSAAAAAV AAGIIGRVVV ATTTASSLFF FF--VVVPTT DAAAHHHFFF
             DPPNEEASKT LQQIESSSSL DP---ERVVV ATTTASSLSS SSNNAAASSS SSSSEEELLL
             DPPNEEASKT LQQIESSSSL DP---ERVVV ATTTASSLSS SSNNAAASSS SSSSEEELLL

             LL-HHHAACY KFTANNAIEE EAAHGGCCHH HVIIDFMMQG WPAIQQALAA LPPFLRTGII
             IILKKTAACY KFTANNAIEE EAAEKKSSKH HIVVDFVVQG WPALQQALAA TPPTIRSGII
             IILKKTAACY KFTANNAIEE EAAEKKSSKH HIVVDFVVQG WPALQQALAA TPPTIRSGII

             GPPSSPGRRR E-LRDDDDVV VGLRLLAAAL AARRVVVRRF FFGSSSLDDP PWMMLQQAGG
             PAASSLESSS EPLIAAAATT TGNRLLRRRF AAKKLLLNNF FFPPPPIHHG GSSSFRRDDD
             PAASSLESSS EPLIAAAATT TGNRLLRRRF AAKKLLLNNF FFPPPPIHHG GSSSFRRDDD

             EANSQRLLLG DPAADDDAAP P--ADCVVVA SSKFTIIEEQ QDDDNNKTTL DFTTEAAALF
             EVNFQKLLL- ----DDDTTP PTITRLAAAK SSRVTGGEEY YSSSNNRVVA NVKKNAAALQ
             EVNFQKLLL- ----DDDTTP PTITRLAAAK SSRVTGGEEY YSSSNNRVVA NVKKNAAALQ

             YSAVLDAASA AAMMMAEEEA AYRRRIIDII VVVGEGA-RR EEHHHPRRWL AAAAGGLLLA
             FSAVLEPPRE EVRRRVEEEE ELRRRIIGLL IIIPEKGIHR EEMMMEQQWM AAAAGGFFFS
             FSAVLEPPRE EVRRRVEEEE ELRRRIIGLL IIIPEKGIHR EEMMMEQQWM AAAAGGFFFS

             APLLLSAARQ AAVGLLFFSG GEGG--HSSV EECCCTLWPF SSSAWWAAAA GDGGNNNNNN
             SKLLLNAASQ AALWNNYYNY YSNNLYSIIV SSFFFSLWPL TTSSWW---- ----------
             SKLLLNAASQ AALWNNYYNY YSNNLYSIIV SSFFFSLWPL TTSSWW---- ----------

             SSSSNSNVVV VSSGSDSSSN NNNGGSNNGK SSSGAGGSC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---FPF---- PPSDAAGGFP PAAV------
t5g66770.1   MMMAAYMCDD MMAAAAAQQQ IQQQEEEQQQ HQQFGILLSN PTFSGGSSAP DQQVTGGGDS
t5g66770.2   MMMAAYMCDD MMAAAAAQQQ IQQQEEEQQQ HQQFGILLSN PTFSGGSSAP PQQVTGGGDS

             ---------A DDDGVG---- ---------- ---------- ---------- ---YDDA--D
             NNGPPFHHHA TTTGGGRRDF GGGTTTGGGG GSSSDWWWSV VGGPPDDCDD WWWHDDDYYD
             NNGPPFHHHA TTTGGGRRDF GGGTTTGGGG GSSSDWWWSV VGGPPDDCDD WWWHDDDYYD

             ---------- -------DDA AEFFAAAAAF PPPCAPPDAA AAVL---MME EEVVVAGG--
             DPSSRRLSVP PPDLLNNDDS SPPPPPTTLW PPPSSPPLSI PPLTHHEPPE EPTTTNDDDD
             DPSSRRLSVP PPDLLNNDDS SPPPPPTTLW PPPSSPPLSI PPLTHHEPPE EPTTTNDDDD

             --------LL HHLLMCGAII EEGHALAQLA DDHAALAAVA SSSRVAVHTT TTTTTALSRR
             DFEEPPPPLL KKAAYC-RII SSSPNEATLL QQRSSVSELP TTTRVAFYTT EEEEEALSRR
             DFEEPPPPLL KKAAYC-RII SSSPNEATLL QQRSSVSELP TTTRVAFYTT EEEEEALSRR

             FPSSSSSPVA AAPPTTTTDE HHAAFFL--H HHHHYEEEAC PLLLKKKFFF AAHHFTAQAI
             SPSSSSSPAT TTSSSSSSST EEDDLLILLK KKTTNDDDAC PSSSKKKFFF AAHHLTAQAI
             SPSSSSSPAT TTSSSSSSST EEDDLLILLK KKTTNDDDAC PSSSKKKFFF AAHHLTAQAI

             LAFDHHVVVH HHIILLMMWL IIQALLRPGG GGPF-LLGGI IGPPSPTTTG RRDEE----R
             LATNKKIIIH HHVVIIVVWL LLQATTRTSG GGPTQIIGGI IPAPSLGGGE SSPEEPPSSI
             LATNKKIIIH HHVVIIVVWL LLQATTRTSG GGPTQIIGGI IPAPSLGGGE SSPEEPPSSI

             RRVVLLLALL ASSRRSFRGG VVAANNSSSL DDRRPWLQQQ AAPGGAAFNL LLQHRLLLLG
             IITTNNLRFF AVVNNDFIPP IILL--PPPI HHNNGSFRRR DDPDDVAVNL LLQYKLLLL-
             IITTNNLRFF AVVNNDFIPP IILL--PPPI HHNNGSFRRR DDPDDVAVNL LLQYKLLLL-

             DAAAAPP--- IDVVDDDRRT VQEEAHNTGG FFDRRFFTTT AAAYAAFFFD DLLDAAAAAA
             -----PPTTI VDAARRRNNT LYEEVLNVGG FFNRRVVKKK AAAYAAFFFE ELLEPNNNGG
             -----PPTTI VDAARRRNNT LYEEVLNVGG FFNRRVVKKK AAAYAAFFFE ELLEPNNNGG

             SSGGGNAAEE -AYYLQREIC CDDIVCCGGA A-RRRRREPR DRLLLLLRAG GLLSSAVVSN
             RRDSSRVVEE RELLFGRRIS SGGLIGGPKT GIHHRRREEQ VLMMMMMNAG GFFEESVVNY
             RRDSSRVVEE RELLFGRRIS SGGLIGGPKT GIHHRRREEQ VLMMMMMNAG GFFEESVVNY

             NARRMVGLFS GEG-HHSVVV VEEEACLLLT GGRRPPSASS AWWAAAAAGG GGGGNNNSSS
             YASKILWNYN YSNLSSIVVV VEESKFIIIS DDLLPPTLSS SWW------- ----------
             YASKILWNYN YSNLSSIVVV VEESKFIIIS DDLLPPTLSS SWW------- ----------

             NSSSSNNNVV SSSSDNNNNS SSSSSKSSSS GSSSVCCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -----DFFFF P-----WMDD AAADAGGGFF
t5g66770.1   AYYYCCCTSM AAAAAIAAQQ VKQQEEQQQQ QHHQQQFFFF GPSLNNW-NN SLLSGSSSAA
t5g66770.2   AYYYCCCTSM AAAAAIAAQQ VKQQEEQQQQ QHHQQQFFFF GPSLNNW-NN SLLSGSSSAA

             LLPPPPAAA- ---------- ---------A PDGGY----- ---------- ----------
             FFPPDPFFQG GGGGDDDSND DPPFFFPLHA TTGGFRLFFT GGGGGGGGEE ESDWMMLISG
             FFPPPPFFQG GGGGDDDSND DPPFFFPLHA TTGGFRLFFT GGGGGGGGEE ESDWMMLISG

             ---------- ----YAAA-- ----DDD--- ---------- ---------- -VDAAAPEFF
             VVVAADDGGD CCTWHDDDYY VVVIDDDFFT TYSRRRVQQP DDLNRRRRRV VIDSSSLPPP
             VVVAADDGGD CCTWHDDDYY VVVIDDDFFT TYSRRRVQQP DDLNRRRRRV VIDSSSLPPP

             FFFAAFFPAA PAAAVV-AAR RRRREEEVGI I--------- LHLLLMCAES AAAAALLLLA
             PPPLLWWPSS PIPPLLESST KKKKEDETDS SDDFDLLEPP LKAAAYCASS KKKKKLLLLL
             PPPLLWWPSS PIPPLLESST KKKKEDETDS SDDFDLLEPP LKAAAYCASS KKKKKLLLLL

             ASAASAAGGG GRVVVVVHTL LSRRRRRRLL FPSVAAPPPT TDAEHAFFF- -YHFFEACPP
             LIESGDP--E ERVVFFFYTL LSNNRRRRLL SPSATTSSSS SSSTEDLLLS SYKLLDACPP
             LIESGDP--E ERVVFFFYTL LSNNRRRRLL SPSATTSSSS SSSTEDLLLS SYKLLDACPP

             PYYLKFFTAQ AILLAAFGDD VVIIFLLMMG WPPALLLIIQ AAAAPGGPPP F-IIIITIGG
             PYYSKFLTAQ AILLAATKNN IIVVFIIVVG WPPALLLLLQ AAAATSGPPP TQVVVVSIPP
             PYYSKFLTAQ AILLAATKNN IIVVFIIVVG WPPALLLLLQ AAAATSGPPP TQVVVVSIPP

             PGGGRDE-DG GLLAAADLAR RVSFRGGVAA ANNSLVVVRR RPPWMMLQII PGGGEAAFSV
             AEEESPEPAG GNLRRRDFAD DLDFIPPILL T--PILLLNN NGGSSSFRVV PDDDEVVVFM
             AEEESPEPAG GNLRRRDFAD DLDFIPPILL T--PILLLNN NGGSSSFRVV PDDDEVVVFM

             LQLLHRLGDP PDQQQQQPP- AVVLCCVVAS VKIFTTVVII QEDDDHHNNN KKTTGGFFFL
             LQLLYKL--- -DEEEEEPPT TAALLLAAKS LRVVTTLLGG YESSSLLNNN RRVVGGFFFA
             LQLLYKL--- -DEEEEEPPT TAALLLAAKS LRVVTTLLGG YESSSLLNNN RRVVGGFFFA

             DDDRFFTLYA AAVFSLLASS SSGGAGAMAE E--ALLLQRE CIIVVVGGEG ARRRHEPPPP
             NNNRVVKLYA AAVFSLLPLL LRDDEEVRVE ERREFFFGRR SLLIIIPPEK THHRMEEEEE
             NNNRVVKLYA AAVFSLLPLL LRDDEEVRVE ERREFFFGRR SLLIIIPPEK THHRMEEEEE

             SWDRLTAALS SAAGSSNRQQ QRLVVFSEEG G-SSVEEEEA DDTTTLGGGS SAAAAAWWAA
             EWVLMEAAFE ESSSNNYSQQ QKLLLYNSSN NLIIVEESSK PPSSSLAADT TLLLLSWW--
             EWVLMEAAFE ESSSNNYSQQ QKLLLYNSSN NLIIVEESSK PPSSSLAADT TLLLLSWW--

             GDGGGNNNNS SNSSGDDDSN SSSSSSSSGA ARDGSSSVV
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---MMMDDTF PQ--WDDPAS SSSSGLLDDD
t5g66770.1   AAYYMMTNNL LIIAAQQQQQ QQQQQQQQQH QQDHHHQQIF GNNPWNNTSG GGFFGLLSSS
t5g66770.2   AAYYMMTNNL LIIAAQQQQQ QQQQQQQQQH QQDHHHQQIF GNNPWNNTSG GGFFGLLSSS

             DAAGGFPPPP AAAVV----- ---------- ------AADD DGVVGGY--- ----------
             SGGSSAPPDP FQQVVGDDDS SDPPGGFPPL DDHHHHAATT TGGGGGFRRR RRLLSFGGGE
             SGGSSAPPPP FQQVVGDDDS SDPPGGFPPL DDHHHHAATT TGGGGGFRRR RRLLSFGGGE

             ---------- ---------- ---DDDA--- -DDD------ ---------- ---VDAAAAA
             ESDWWWMMME TLLIISSGDD AGWDDDDVVV YDDDPPFDYR LLSVQQPPSL LLVIDSSSSS
             ESDWWWMMME TLLIISSGDD AGWDDDDVVV YDDDPPFDYR LLSVQQPPSL LLVIDSSSSS

             LLPPAAAFPA AAPPPDAAAV VL---MRREE EEEEVI---- ------LVLM MGEADDHHAL
             PPLLLLLWPS SSPPPLSIPL LTHEEPTKEE DDPETSDDDD DFDLEPLLIY Y-SDDDPPAL
             PPLLLLLWPS SSPPPLSIPL LTHEEPTKEE DDPETSDDDD DFDLEPLLIY Y-SDDDPPAL

             SHLLAAVAAA AASSGGGGVV VAAVVHTTTL SRRRRLLFSV AAPPTDDDDD AAALL--HHE
             IRVVSSLDDD PPTT--EEVV VAAFFYTEEL SNNRRLLSSA TTSSSSSSSS DDDIISSTTD
             IRVVSSLDDD PPTT--EEVV VAAFFYTEEL SNNRRLLSSA TTSSSSSSSS DDDIISSTTD

             EACPPKFHHH TTAQQQAILL AAFFFFFGDD DHHVVISSLM MMGLWWPPPI IQLLLARPGP
             DACPPKFHHH TTAQQQAILL AATTTTTKNN NKHIIVGGIV VVGIWWPPPL LQLLLARTGP
             DACPPKFHHH TTAQQQAILL AATTTTTKNN NKHIIVGGIV VVGIWWPPPL LQLLLARTGP

             FF-LTIGSTG GGRELRRRVG LLRAADLLLA RRRVRFSGVV VAAANNNSLD ERPWMQQQQI
             TTQISIPSGE EESELIIITG NNRRRDFFFA KDDLNFDPII ILLT---PIH LNGSSRRRRV
             TTQISIPSGE EESELIIITG NNRRRDFFFA KDDLNFDPII ILLT---PIH LNGSSRRRRV

             AAPPAAFSVL QLLRLGGGDP QA-IAAVLDA ASRFFTTVIE AHKTGGLLLF TTFFFYYYYY
             DDPPVAVFML QLLKL----- ETTVTTALRK KSNVVTTLGE VLRVGGAAAV KKQQQFFFYY
             DDPPVAVFML QLLKL----- ETTVTTALRK KSNVVTTLGE VLRVGGAAAV KKQQQFFFYY

             SSSAVFFDSS LLLDAAAGGG NAAAAA---A YLLLQQEECD GGGRRREHPP SSRDDDDRLT
             SSSAVFFESS LLLEPPGDSS RVVVVVRRRE LFFFGGRRSG PKKHRREMEE EEQVVVVLME
             SSSAVFFESS LLLEPPGDSS RVVVVVRRRE LFFFGGRRSG PKKHRREMEE EEQVVVVLME

             AGGSVPPPLL LSSSNNNALL RQAARRMLVG LLGGG---HV EAADGCCLLL WGPFFSAWWE
             AGGEVKKKLL LNNNYYYAVV SQAAKKILLW NNYNNLYYSV EKKPGFFILL WDPLLSSWWR
             AGGEVKKKLL LNNNYYYAVV SQAAKKILLW NNYNNLYYSV EKKPGFFILL WDPLLSSWWR

             GDGDNNNSSN NNNVSSSDDD SSSNNNSGSS NKSRDSVCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --FFP----- -WWPMDPAAA ASSLDDAAFL
t5g66770.1   MAYTGNLMMM AIIAQVIQQQ KEQQQQHHHQ DDFFGPLSSL NWWP-NTSSS LGFLSSGGAF
t5g66770.2   MAYTGNLMMM AIIAQVIKKK KEQQQQHHHQ DDFFGPLSSL NWWP-NTSSS LGFLSSGGAF

             LPAV------ ---------- -------AGG GVGG------ ---------- ----------
             FPFVTGGGGD SDPGFFFFFP PPDDDHHAGG GGGGRSDDFG GGGGGSEMME ELLSGGGGDV
             FPFVTGGGGD SDPGFFFFFP PPDDDHHAGG GGGGRSDDFG GGGGGSEMME ELLSGGGGDV

             ---DP----- GAAD------ ---------- ---VDDAAAP EEAAAAFPPC CAPPAAAAAV
             DDTDPYYVIY GPPDPPFFDD TYYRVQPLLN RRVIDDSSSL PPPPTLWPPS SSPPSSIIPL
             DDTDPYYVIY GPPDPPFFDD TYYRVQPLLN RRVIDDSSSL PPPPTLWPPS SSPPSSIIPL

             VV--AMMMRR RRRRVVVG-- -----LLVLS CCGEEGHLSS SSQQLADDSS HHHLASGIGG
             LLEESPPPTT KKKKTTTDDD DEPPPLLLID CC-SSSPESS SSTTLLQQII RRRVET--EE
             LLEESPPPTT KKKKTTTDDD DEPPPLLLID CC-SSSPESS SSTTLLQQII RRRVET--EE

             GRVVVATTTT TTAALRLFFP PP----PAAP PPDAAEA--Y YYEECYLHFF FTAANNQAII
             ERVVVATTTE EEAALRLSSP PPNNNNPTTS SSSSSTDLLN NNDDCYSHLL LTAANNQAII
             ERVVVATTTE EEAALRLSSP PPNNNNPTTS SSSSSTDLLN NNDDCYSHLL LTAANNQAII

             LLLEAAFFHH GGVHVDDQGG WAIIIQLLLL APGGGGPPPL RRRITTTGPT TGRRE-RRRV
             LLLEAATTEE KKIHIDDQGG WALLLQLLLL ATSSSSKKPI RRRVSSSGLG GESSEPIIIT
             LLLEAATTEE KKIHIDDQGG WALLLQLLLL ATSSSSKKPI RRRVSSSGLG GESSEPIIIT

             GADLAARRSV RRVFFFFRRV VASLEVVRWM LLLQIPPPPE AVAANNSVLQ RLLLLPPQAA
             GRDFAAKKVL DDLFFFFIII ITPILLLNSS FFFRVPPPPE VLAANNFMLQ KLLLL--ETT
             GRDFAAKKVL DDLFFFFIII ITPILLLNSS FFFRVPPPPE VLAANNFMLQ KLLLL--ETT

             PIDDAVDCVV SSVRPPIIFV VIIIEDHHNK KTTFFRRTEA LYYSSSASGG GAAAAAMMMA
             PVDDTARLAA SSLNPPVVVL LGGGESLLNR RVVFFRRKNA LFYSLLGRDD SEEEVVRRRV
             PVDDTARLAA SSLNPPVVVL LGGGESLLNR RVVFFRRKNA LFYSLLGRDD SEEEVVRRRV

             AEE-AAAQQI CVVCGGGGEA ARRRRHEEPL LSRRWWWRRR DDRRRRRRRA GGGLLAPLGG
             VEEREEEGGI SIIGPPPPET GHHRRMEEEK KEQQWWWRRR VVLLLNNNNA GGGFFSKLSS
             VEEREEEGGI SIIGPPPPET GHHRRMEEEK KEQQWWWRRR VVLLLNNNNA GGGFFSKLSS

             SNAALRRARL VVVLFFSSE- HSEAADDDCC TTGGWGRRPP PPLLSAASSS EAAAGGGDNN
             NYAAVSSAKL LLLNYYNNSL SISKKPPPFF SSAAWDLLPP PPLLTLLSSS R---------
             NYAAVSSAKL LLLNYYNNSL SISKKPPPFF SSAAWDLLPP PPLLTLLSSS R---------

             NNNNNSSSGS SSSSGGDSSN NNSSSKSSGG GARRDSSCC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -------MTT TPFQQ----P PPDDDPSGLA
t5g66770.1   MMYMMMMCSS SMAAIAAQVI KKEQQQQQQH HHQQQDDHII IGINNPPLPP PPNNNTGGLG
t5g66770.2   MMYMMMMCSS SMAAIAAQVI KKEQQQQQQH HHQQQDDHII IGINNPPLPP PPNNNTGGLG

             AGPAAAAA-- ---------- ---------- AADDV----- ---------- ----------
             GSPFFQQQGG GSNNNNPGGG PFFFPPPHHH AATTGRRLLF GGGGGGEESW EEIISGDDSD
             GSPFFQQQGG GSNNNNPGGG PFFFPPPHHH AATTGRRLLF GGGGGGEESW EEIISGDDSD

             ---------- YDDPPP---- -GD------- ---------- -----DAAAL LPEEFFFAAA
             GPDDCDDDDD HDDNPPVIIY YGDPPTTYPP SSLSSQPSSS DLLRVDSSSP PLPPPPPTTL
             GPDDCDDDDD HDDNPPVIIY YGDPPTTYPP SSLSSQPSSS DLLRVDSSSP PLPPPPPTTL

             FPPCAAAAAA VVL--AAAMM MMRREEEEAI R--------- -LLVHHLLLL MAAEAAGALA
             WPPSSSSIIP LLTEESSSPP PPKKEDPENS EDDDDDDDFD LLLLKKAAII YRRSDDSNEA
             WPPSSSSIIP LLTEESSSPP PPKKEDPENS EDDDDDDDFD LLLLKKAAII YRRSDDSNEA

             AAQLDDSSAA LLLLAAAAVV VSASGRVAVH LLLSRLFPPS SAAPPTTTDD AHHHHF-YHH
             AKTLQQIIES VVVVSSSELL LGDT-RVAFY LLLSNLSPPS STTSSSSSSS SEEEELLYKK
             AKTLQQIIES VVVVSSSELL LGDT-RVAFY LLLSNLSPPS STTSSSSSSS SEEEELLYKK

             YEEEAPLKFA FAANQEAAGG DHHHVVIIDF SLLMQLQLLL LLARRPGGGG F-LIIIPPSP
             NDDDAPSKFA LAANQEAAKK NKKHIIVVDF GIIVQIQLLL LLARRTSSGG TQIVIIAPSL
             NDDDAPSKFA LAANQEAAKK NKKHIIVVDF GIIVQIQLLL LLARRTSSGG TQIVIIAPSL

             TTGGRE-LRD DVVVGGGRLL ADDDAARRRR RSSSSRRVFF FRGGGVSSSL EVRPPPPWMQ
             GGEESEPLIA ATTTGGGRLL RDDDAAKKKK KVVVVDDLFF FIPPPIPPPI LLNGGGGSSR
             GGEESEPLIA ATTTGGGRLL RDDDAAKKKK KVVVVDDLFF FIPPPIPPPI LLNGGGGSSR

             QQQIAPAAVA VLLQLLHHRR LLGGGGDDDP PQAAP--DDA AAAVLLVASV PITIEDNKTG
             RRRVDPVVLA MLLQLLYYKK LL-------- -ETTPTTDDT TTTALLAKSL PVTGESNRVG
             RRRVDPVVLA MLLQLLYYKK LL-------- -ETTPTTDDT TTTALLAKSL PVTGESNRVG

             FFDDFTEEAA YYSSAVDSLL DDASSAGGGA GNNAMA--AY REIICDDDII VCCGGAAA-E
             FFNNVKNNAA FFSSAVESLL EEPLLGDSSE ERRVRVRREL RRIISGGGLL IGGPPGGGIE
             FFNNVKNNAA FFSSAVESLL EEPLLGDSSE ERRVRVRREL RRIISGGGLL IGGPPGGGIE

             RHEEEPPLLL SSRRWGLSAV GLQAMVLSSG EG-HHSSVVV EEEAAADGGT TLAEAAAGGG
             RMEEEEEKKK EEQQWGFESV SVQAILNNNY SNYSSIIVVV ESSKKKPGGS SLLR------
             RMEEEEEKKK EEQQWGFESV SVQAILNNNY SNYSSIIVVV ESSKKKPGGS SLLR------

             GGDDDDDNNN NNSNSNSSSS SGGGDNNSSN NKSDDDGGS
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- -MTPFQQ--- --PDDAAAAS
t5g66770.1   MMAYYMMMCC TSSGNNNLLM MIAQVKQQQE QQQQQQQHHQ DHIGINNPPL PPPNNSLLLG
t5g66770.2   MMAYYMMMCC TSSGNNNLLM MIAQVKQQQE QQQQQQQHHQ DHIGINNPPL PPPNNSLLLG

             SSGLLDGFLP AAV------- --------AP PDDGGG---- ---------- ----------
             FFGLLSSAFP FQVTGDDSDG FPPPLHHHAT TTTGGGRRLL LLSSDDGTGE EFFFFSELLI
             FFGLLSSAFP FQVTGDDSDG FPPPLHHHAT TTTGGGRRLL LLSSDDGTGE EFFFFSELLI

             ---------- -----YDDP- ------D--- ---------- ---------D AAAAAAPEFA
             ISGDVDDDDG GGTTWHDDPY VVIIIYDPPT TYPSRRLLSV VPSLLLNRVD TTSSSSLPPT
             ISGDVDDDDG GGTTWHDDPY VVIIIYDPPT TYPSRRLLSV VPSLLLNRVD TTSSSSLPPT

             AAFPPPPDAA AVVLL-RRRR EEEEEEVVAG GIR------- --LLLHLSCA AAGAIIIIIE
             LLWPPPPLSP PLLTTETTKK EDPEEETTND DSEDDDDDDE PPLLLKADCA AA-RIIIIIS
             LLWPPPPLSP PLLTTETTKK EDPEEETTND DSEDDDDDDE PPLLLKADCA AA-RIIIIIS

             EADDDHHALL LSAAAQQQLL LDSSHAALAA AVVSAIGGRR VVHFTTARLL FSSVVAAPDA
             SDDDDPPNEE ESKKKTTTLL LQIIREEVSS ELLGP-EERR VFYFEEARLL SSSAATTSSS
             SDDDDPPNEE ESKKKTTTLL LQIIREEVSS ELLGP-EERR VFYFEEARLL SSSAATTSSS

             EEEHAAL-YY YYHHYYCPLF AAAHFFQIII AHGGCHVVID SMQWWPLIQQ QQAALLAAGG
             TTTEDDILYY YYTTNNCPSF AAAHLLQIII AEKKSKIIVD GVQWWPLLQQ QQAALLAASS
             TTTEDDILYY YYTTNNCPSF AAAHLLQIII AEKKSKIIVD GVQWWPLLQQ QQAALLAASS

             GGFFLLRIGG GGPSTTGDDE -LRRRRDDGR RDLRRSSRVF SSFFANNEEV RRWWWMMMQI
             SSTTIIRVGG PPPSGGEPPE PLIIIIAAGR RDFKKVVDLF DDFFT--LLL NNSSSSSSRV
             SSTTIIRVGG PPPSGGEPPE PLIIIIAAGR RDFKKVVDLF DDFFT--LLL NNSSSSSSRV

             IPGEAASLLG DDAAAQPP-D AVLLLCCVAS VRPPIVEQAA ADHNTTTFRR RRRFTTEYYY
             VPDEVAFLL- -----EPPID TALLLLLAKS LNPPVLEYVV VSLNVVVFRR RRRVKKNFYY
             VPDEVAFLL- -----EPPID TALLLLLAKS LNPPVLEYVV VSLNVVVFRR RRRVKKNFYY

             SSVFFSSLLD AAAGGGNAAA AE----AAYQ REEIICIIIV EGAAAAA--R EREPSRDDRL
             SSVFFSSLLE PNNSSERVVV VERRRREELG RRRIISLLLI EKTTGGGIIH EREEEQVVLM
             SSVFFSSLLE PNNSSERVVV VERRRREELG RRRIISLLLI EKTTGGGIIH EREEEQVVLM

             LLLTTTAAGS LGALLRRAAA RMVGLSGGG- HHSVGGLTTL HRRFFFAWWW WAAAGDDGGN
             MMMEEEAAGE LSAVVSSAAA KILWNNYNNY SSIVGGISSL NLLLLLLWWW W---------
             MMMEEEAAGE LSAVVSSAAA KILWNNYNNY SSIVGGISSL NLLLLLLWWW W---------

             NNNNSNGSGG SSDNNNSSSS NNKKKSSSGA RRDDSSSLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ------MMMM TTTFPPPFQ- ---------- PMMDAAAASS
t5g66770.1   MCTTGGLLAI IIQQQQIQQE QQQQHDHHHH IIIFGGGINP LSLLLNNNNP P--NSSLLGG
t5g66770.2   MCTTGGLLAI IIQQQQIQQE QQQQHDHHHH IIIFGGGINP LSLLLNNNNP P--NSSLLGG

             SGLDAFPPPP ---------- ------PDG- ---------- ---------- ----------
             GGLSGAPDDP GGGSNNPPGF FFPDHHTTGL SDDDFGGTGG EEEDMLLISG GGVVVAGPCC
             GGLSGAPPPP GGGSNNPPGF FFPDHHTTGL SDDDFGGTGG EEEDMLLISG GGVVVAGPCC

             ----DDPA-- -----GD--- ---------- ------DDAA AAAAPAAAPC CDAAAAVL--
             CDTWDDNDYY IYYYYGDPFF FTRRLLSQSS DDNNRVDDTT SSSSLTLLPS SLSSIPLTEE
             CDTWDDNDYY IYYYYGDPFF FTRRLLSQSS DDNNRVDDTT SSSSLTLLPS SLSSIPLTEE

             MRRREEEVI- ---------- VVHLLMMCAG GAAAAEAAGG GDALASSQHA AAAAAAAASS
             PTKKEPETSD LEEPPPPPPP LLKAAYYCA- -RRRRSDDSS SDNEASSTRS SSSEEDPPTT
             PTKKEPETSD LEEPPPPPPP LLKAAYYCA- -RRRRSDDSS SDNEASSTRS SSSEEDPPTT

             SGGIIGRATA RRLLFPPSPP VAPTTTTAAF L-YHHHFYYE EEACPPYFFF FAHTTAANNQ
             T----ERATA NRLLSPPSPP ATSSSSSSDL ILYKKTLNND DDACPPYFFF FAHTTAANNQ
             T----ERATA NRLLSPPSPP ATSSSSSSDL ILYKKTLNND DDACPPYFFF FAHTTAANNQ

             QAIFFFHHVH HHHVIDFFFL MQGLQPPAAA LLLRRRGGGP PPFF-LRTTG IGPSSSSDEE
             QAITTTEKIH HHHIVDFFFI VQGIQPPAAA LLTRRRSSGK KPTTQIRSSG IPASSSSPEE
             QAITTTEKIH HHHIVDFFFI VQGIQPPAAA LLTRRRSSGK KPTTQIRSSG IPASSSSPEE

             EEE--GRRAA ADSRRRRRSF RGAANSDEEE VPMMMLLIIA AGAAAAAAAF FNVVVQLHRL
             EEEPSGRRRR RDVDDDNNDF IPLT-PHLLL LGSSSFFVVD DDVVVVVAAV VNMMMQLYKL
             EEEPSGRRRR RDVDDDNNDF IPLT-PHLLL LGSSSFFVVD DDVVVVVAAV VNMMMQLYKL

             LGPAADPPIA AVLCSRRPPI ITTVEEEQQA DHHTTGLDRR RRFFTEAAFY YYSAAAFDSL
             L----DPPVT TALLSNNPPV VTTLEEEYYV SLLVVGANRR RRVVKNAAQF FFSAAAFESL
             L----DPPVT TALLSNNPPV VTTLEEEYYV SLLVVGANRR RRVVKNAAQF FFSAAAFESL

             DASSASSGGG AGGNNNAAME LQQRREEICC DDIVCGGGEE GAAAAARRER REPSRRWRLT
             EPLLGRRDDS EEERRRVVRE FGGRRRRISS GGLIGPPPEE KTTTGGHRER REEEQQWRME
             EPLLGRRDDS EEERRRVVRE FGGRRRRISS GGLIGPPPEE KTTTGGHRER REEEQQWRME

             RRAAAGGAVP GGGSNARQAR MVVLLFFEG- -SSEEDDTTT LLGWHGRRPP PLLFSAAWEE
             NNAAAGGSVK SSSNYASQAK ILLNNYYSNL YIIEEPPSSS LLAWNDLLPP PLLLTLSWRR
             NNAAAGGSVK SSSNYASQAK ILLNNYYSNL YIIEEPPSSS LLAWNDLLPP PLLLTLSWRR

             GGDGGDDDNN VVSSSGSDSS SGSSSKSSSS GGARRRSCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- -----DFQ-- --DSSLDGGG
t5g66770.1   MYYYMMCCCD DDDSGNLLMM AQQQQQVIIK KQKKQQQQQQ QHHQDQINLS SNNGFLSSSS
t5g66770.2   MYYYMMCCCD DDDSGNLLMM AQQQQQVIIK KKKKQQQQQQ QHHQDQINLS SNNGFLSSSS

             FFLLA----- ---------- -----AGVVY Y--------- ---------- ----------
             AAFFQTGGGD SSSNNNPPPF FPHHHAGGGF FLSSFGGGTT GGGFFEESDE EWEEEETIIS
             AAFFQTGGGD SSSNNNPPPF FPHHHAGGGF FLSSFGGGTT GGGFFEESDE EWEEEETIIS

             ---------- -------YP- ---GADDDDD ---------- ---------- ---VDDDAAA
             SGGSVDGGPP PDDCCDTHNY YYYGPDDDDD FDDDYPPSSS VVQQSDDNNR RVVIDDDTTS
             SGGSVDGGPP PDDCCDTHNY YYYGPDDDDD FDDDYPPSSS VVQQSDDNNR RVVIDDDTTS

             APEFFAAAAF FFPAAPAAAV VVVV--MRRE EVGIR----- -----HHLSS AGAAEEEAAA
             SLPPPPPTLW WWPSSPSSIL LLLLHHPTKD ETDSEDDFFD LPPPPKKIDD A-RRSSSDDD
             SLPPPPPTLW WWPSSPSSIL LLLLHHPTKD ETDSEDDFFD LPPPPKKIDD A-RRSSSDDD

             DHALSSLAHH AALLLLAVSA IRHFFFFFTT ARFFFP-SAP PPTDAAHHHA FLLYYHFYYE
             DPNESSLLRR EEVVVVELGP -RYFFFFFTE ARSSSPNSTS SSSSSSEEED LIIYYKLNND
             DPNESSLLRR EEVVVVELGP -RYFFFFFTE ARSSSPNSTS SSSSSSEEED LIIYYKLNND

             ECPPPYYLLL FFAFFTNNNQ ILLEACCCDH HVVIIISSSS MQGLWWWAAA LAAALALLLR
             DCPPPYYSSS FFALLTNNNQ ILLEASSSNK HIIVVVGGGG VQGIWWWAAA LAAALATTTR
             DCPPPYYSSS FFALLTNNNQ ILLEASSSNK HIIVVVGGGG VQGIWWWAAA LAAALATTTR

             RPPGGGGGP- --LRRIIGGG GGPPPGGRLD DVGLRLLADD DDARRRRRVR RFFFRRGGGN
             RTTSSSSGKQ QQIRRVVGPP PPALLEESLA ATGNRLLRDD DDAKKKKKLN NFFFIIPPP-
             RTTSSSSGKQ QQIRRVVGPP PPALLEESLA ATGNRLLRDD DDAKKKKKLN NFFFIIPPP-

             SDVVWMMQIA AAGGEAVAAF LLHRLGPPAA AP----VLCV AKIFTTVQQQ EADDDDHNNK
             PHLLSSSRVD DDDDEVLAAV LLYKL----T TPTTTIALLA KRVVTTLYYY EVSSSSLNNR
             PHLLSSSRVD DDDDEVLAAV LLYKL----T TPTTTIALLA KRVVTTLYYY EVSSSSLNNR

             TGFFFLDDRF FTEEAFFYYY SVFDDDSLDA SAGGAGNMAA ---AYYLIIV VVCGEEEEER
             VGFFFANNRV VKNNAQQFFF SVFEEESLEN LGDDEERRVV RRRELLFLLI IIGPEEEEER
             VGFFFANNRV VKNNAQQFFF SVFEEESLEN LGDDEERRVV RRRELLFLLI IIGPEEEEER

             RERHEEEPPS SSWRDRLRGL SSVSNNALRA ARVFSGG-SS TLGWHHPAAA AAASSSSAAA
             RERMEEEEEE EEWRVLMNGF EEVNYYAVSA AKLYNYNLII SLAWNNPLLL LLLSSSSSS-
             RERMEEEEEE EEWRVLMNGF EEVNYYAVSA AKLYNYNLII SLAWNNPLLL LLLSSSSSS-

             GGDDGGGGNN NSSSSSDSNN NSNGGKGGRD GGSSSCCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --------MM D------WWM SSLLLLDDAG
t5g66770.1   YCDDSGNLLI IQQVVVIIKK KQQQQQQEQQ QQQQQHHHHH QPLLLNPWW- GGLLLLSSGS
t5g66770.2   YCDDSGNLLI IQQVVVIIKK KQQQQQQEQQ QQQQQHHHHH QPLLLNPWW- GGLLLLSSGS

             GGGFFLPAA- ---------- ---------- AADDVGYY-- ---------- ----------
             SSSAAFDFQG GGGGDNNNDG GFPNLLHHHH AATTGGFFGG GGGGGEFEED WMMTLLSGGD
             SSSAAFPFQG GGGGDNNNDG GFPNLLHHHH AATTGGFFGG GGGGGEFEED WMMTLLSGGD

             --------AA AA-----GGD D--------- ---------- --DAALLPEF FAAFPPCDAV
             DSSAPCDTDD DDVIIIIGGD DFFYYPPSRR RRLSVSDDDL LVDTSPPLPP PTLWPPSLIL
             DSSAPCDTDD DDVIIIIGGD DFFYYPPSRR RRLSVSDDDL LVDTSPPLPP PTLWPPSLIL

             L----MEEEE EEEEIR---- ---LLLLLLL MSCAAAGGAA IAHSAAAASH HHALLLAVVV
             THEEEPEDDD PPPESEDDDD LEPLLAAIII YDCAAA--RR IDPSKLLLIR RRSVVVSLLL
             THEEEPEDDD PPPESEDDDD LEPLLAAIII YDCAAA--RR IDPSKLLLIR RRSVVVSLLL

             GIGGAAVVVH TTTALSSRRL P-PAAAPTDA AFF-YHFEAC CCPYLLLKFF AAAFTNQQLE
             --EEAAFFFY EEEALSSNRL PNPTTTSSSS DLLSYKLDAC CCPYSSSKFF AAALTNQQLE
             --EEAAFFFY EEEALSSNRL PNPTTTSSSS DLLSYKLDAC CCPYSSSKFF AAALTNQQLE

             EAFFHCCCDV VVVDSSSLLL MMMQGQWWPP AALLQLLLLL LRPPPPGGPP PF-LLRRRGI
             EATTESSSNI IIIDGGGIII VVVQGQWWPP AALLQLLLTT TRTTTTSGKP PTQIIRRRGI
             EATTESSSNI IIIDGGGIII VVVQGQWWPP AALLQLLLTT TRTTTTSGKP PTQIIRRRGI

             GGPPPPPPSS SPPPTRDD-- RVVLRLLADD DLLSRRRVVV FSFGVNNSSL DDEVRRPWLL
             PPAAPPPPSS SLLLGSPPPS ITTNRLLRDD DFFVDDDLLL FDFPI--PPI HHLLNNGSFF
             PPAAPPPPSS SLLLGSPPPS ITTNRLLRDD DFFVDDDLLL FDFPI--PPI HHLLNNGSFF

             LQPGSLLLRR RGQQQPPP-- DDAAVLCVVA VVRPPIFFTV IIQDDTGGFL TELFFFYYAV
             FRPDFLLLKK K-EEEPPPII DDTTALLAAK LLNPPVVVTL GGYSSVGGFA KNLQQQFYAV
             FRPDFLLLKK K-EEEPPPII DDTTALLAAK LLNPPVVVTL GGYSSVGGFA KNLQQQFYAV

             VFFDDSSLLD AAAAAAAASG GNNNNMAAEE LQEECDCGGG AEERHHLSSS WRDRRRRTRA
             VFFEESSLLE PPNNNGGGRS ERRRRRVVEE FGRRSGGPPK GEERMMKEEE WRVLLLLENA
             VFFEESSLLE PPNNNGGGRS ERRRRRVVEE FGRRSGGPPK GEERMMKEEE WRVLLLLENA

             SAAVVGGALL AMLLLVVGFS SGGE--HSEE EGGCTLLGGW WHPPLFFFSA AWWEAAAAGD
             ESSVVSSAVV AILLLLLWYN NYYSYYSIEE EGGFSLLAAW WNPPLLLLTS SWWR------
             ESSVVSSAVV AILLLLLWYN NYYSYYSIEE EGGFSLLAAW WNPPLLLLTS SWWR------

             GGGGGGDNNN SSNSVVSSSS GGSSSNNSGG GKSRRSVVL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- -------MMD DFFPPFQ--- ----WWWWMA SSGDDPPAAV
t5g66770.1   MAYMMMCSGL QQQVKQQQQQ QQQQHHDHHQ QFFGGINPPS LLNPWWWW-S GGGSSPDFQV
t5g66770.2   MAYMMMCSGL QQQVKKQQQQ QQQQHHDHHQ QFFGGINPPS LLNPWWWW-S GGGSSPPFQV

             ---------- -------APP DDGGGYY--- ---------- ---------- ----------
             TTDSSNNDDG PPPPLHHATT TTGGGFFDDG GGGGESSDEE EMTTLISGGV VDGGPPPDCC
             TTDSSNNDDG PPPPLHHATT TTGGGFFDDG GGGGESSDEE EMTTLISGGV VDGGPPPDCC

             --YYDPPAA- ------GGGA DDD------- ---------- --DAALLFAP PPPDDDAAAA
             CDHHDNPDDY YYIIYYGGGP DDDPPFDDYS RLSVQQSSSL LRDTSPPPTP PPPLLLIIPP
             CDHHDNPDDY YYIIYYGGGP DDDPPFDDYS RLSVQQSSSL LRDTSPPPTP PPPLLLIIPP

             AL-ARREEEV GGGGII---- ----VVVHHL LSCGGAIEAG GGGGHAAQLA AASHAAAVSS
             PTESTKDDPT DDDDSSDDDD DFEPLLLKKI IDC--RISDS SSSSPKKTLL LLIREEELGG
             PTESTKDDPT DDDDSSDDDD DFEPLLLKKI IDC--RISDS SSSSPKKTLL LLIREEELGG

             AAAAIIGGGG RVVHTAALLL LRLFP-SVVA PPDAAHHLL- ---HHYCPLK KFFFAAFFTA
             DPPP--EEEE RVFYEAALLL LRLSPNSAAT SSSSSEEIIL LSSKTNCPSK KFFFAALLTA
             DPPP--EEEE RVFYEAALLL LRLSPNSAAT SSSSSEEIIL LSSKTNCPSK KFFFAALLTA

             NNILEFHHGD VIDDFSQALQ AALAARRPGG GPPPF-LLRI GGIGPPPTGR RD---LRRGL
             NNILETEEKN IVDDFGQALQ AALAARRTGG GKPPTQIIRV GGIPAPLGES SPPPPLIIGN
             NNILETEEKN IVDDFGQALQ AALAARRTGG GKPPTQIIRV GGIPAPLGES SPPPPLIIGN

             LLDLAARSVR VFFFSFFGVN SSSSLVVRLI AAPAAVVAAF FFSVLLHLDP DDA--IDDDA
             LLDFAAKVLD LFFFDFFPI- PPPPILLNFV DDPVVLLAAV VVFMLLYL-- DDTTIVDDDT
             LLDFAAKVLD LFFFDFFPI- PPPPILLNFV DDPVVLLAAV VVFMLLYL-- DDTTIVDDDT

             AVVLLDDDCA VPKFTVQQAD DDKKKTGGFF FLLFFFTTTE ELLLFYYYYY SSAVVVFDDS
             TAALLRRRLK LPRVTLYYVS SSRRRVGGFF FAAVVVKKKN NLLLQFFFYY SSAVVVFEES
             TAALLRRRLK LPRVTLYYVS SSRRRVGGFF FAAVVVKKKN NLLLQFFFYY SSAVVVFEES

             SDSSGGAANN NAAEEAYLEI CICGEEGGGA AAAA--REER HHEEPSRRRR WWWDRRRRRT
             SELLSSEERR RVVEEELFRI SLGPEEKKKT TGGGIIHEER MMEEEEQQQQ WWWVLLLLLE
             SELLSSEERR RVVEEELFRI SLGPEEKKKT TGGGIIHEER MMEEEEQQQQ WWWVLLLLLE

             TRALLLLLLS VPLNLQARMM LVVVGGLLFS GEG--SSVVV VEAGGTLLLL GGGRPLLASA
             ENAFFFFFFE VKLYVQAKII LLLLWWNNYN YSNLYIIVVV VEKGGSLLLL AAALPLLLSS
             ENAFFFFFFE VKLYVQAKII LLLLWWNNYN YSNLYIIVVV VEKGGSLLLL AAALPLLLSS

             EGGGGDGDNN NNNNSVVVVS SSGSNGSKKS SGGARRRSC
             R--------- ---------- ---------- ---------
             R--------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- -----MDFFF ---WWPPMDD
t5g66770.1   MMMAAYYYYM MTDGNLAIIA QQQKQQQKQQ QQQQEQQQQQ QHHHHHQFII LLPWWPP-NN
t5g66770.2   MMMAAYYYYM MTDGNLAIIA QQQKKKKKQQ QQQQEQQQQQ QHHHHHQFII LLPWWPP-NN

             DAAASSGLLD AFLPPAAAV- ---------- ------PPDD G--------- ----------
             NSSSGGGLLS GAFDPFQQVG GGDSPFPNNH HHHHHHTTTT GRRDTGGEFE DEEWMETGGD
             NSSSGGGLLS GAFPPFQQVG GGDSPFPNNH HHHHHHTTTT GRRDTGGEFE DEEWMETGGD

             ---------- ---YYDPPAA ----GADD-- -------DAL LLEFFFFPPP CAAPAA----
             VADDGGGGCC CDTHHDPPDD YIIIGPDDPP TSSVQDLDSP PPPPPPWPPP SSSPIPEEEE
             VADDGGGGCC CDTHHDPPDD YIIIGPDDPP TSSVQDLDSP PPPPPPWPPP SSSPIPEEEE

             AAARREEEEE VVAGIRR--- ----LVHLLM MSCCAGGGIG DLASSQQLAA ADAAASSSAA
             SSSTTEEEDE TTNDSEEDDF LEEPLLKIIY YDCCA---IS DEASSTTLLL LQEEEGGGDP
             SSSTTEEEDE TTNDSEEDDF LEEPLLKIIY YDCCA---IS DEASSTTLLL LQEEEGGGDP

             ASGGRRRVVF FTTASSRLLL F-SSPPPADD AAEEEHAFL- -YHHFYYACP PPLLLHHTAE
             PT--RRRVFF FTEASSNLLL SNSSPPPTSS SSTTTEDLIL LYKTLNNACP PPSSSHHTAE
             PT--RRRVFF FTEASSNLLL SNSSPPPTSS SSTTTEDLIL LYKTLNNACP PPSSSHHTAE

             AGGDDHHHVV FSSGQQQQWW WPPALQLAAL RPPP-RIIII PPSPPPTRDD DDEEE-RDDV
             AKKNNKKKII FGGGQQQQWW WPPALQLAAT RKPPQRVVII AASLLLGSPP PPEEESIAAT
             AKKNNKKKII FGGGQQQQWW WPPALQLAAT RKPPQRVVII AASLLLGSPP PPEEESIAAT

             GRLLLALLLL ASSRFSSVVA AANSSDVVVR RPWWMLQQAP AVAFFNNSVL RLLLGGDDPA
             GRLLLRFFFF AVVDFDDIIL LT-PPHLLLN NGSSSFRRDP VLAVVNNFML KLLL------
             GRLLLRFFFF AVVDFDDIIL LT-PPHLLLN NGSSSFRRDP VLAVVNNFML KLLL------

             ADQQAPP--- DDAAAVCVAA SSSVVVRRFF FTTIEEDDDD DNNKTTGDDR RRFTEEAAFF
             -DEETPPTTI DDTTTALAKK SSSLLLNNVV VTTGEESSSS SNNRVVGNNR RRVKNNAAQQ
             -DEETPPTTI DDTTTALAKK SSSLLLNNVV VTTGEESSSS SNNRVVGNNR RRVKNNAAQQ

             AAVVFLDDDA AAASAAGGAA A-AAYLLQRE IIIIVGAAA- PPLLSRWWRD DDRGLLAAVP
             AAVVFLEEEP PPNLGGEEVV VREELFFGRR ILLLIKTGGI EEKKEQWWRV VVLGFFSSVK
             AAVVFLEEEP PPNLGGEEVV VREELFFGRR ILLLIKTGGI EEKKEQWWRV VVLGFFSSVK

             SNNNARRQLL LFFSGGG--H HSVVEAADGG CLLGGWHHGG RLLAASWEEA AGDDGGNNNN
             NYYYASSQLL NYYNYYNLYS SIVVEKKPGG FILAAWNNDD LLLLLSWRR- ----------
             NYYYASSQLL NYYNYYNLYS SIVVEKKPGG FILAAWNNDD LLLLLSWRR- ----------

             NNSSNNSNVV SGSSGSDSNN NSNGGKSGAA ARSSSVVCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- -DFFFPP--- ---------P PAAAAASSSG
t5g66770.1   AAYDNLLLAA IAQQQQQVVV IQQQQQQHHQ DQFFFGGPLS LLLLNNNNPP TSSSLLGFFG
t5g66770.2   AAYDNLLLAA IAQQQQQVVV IKKQQQQHHQ DQFFFGGPLS LLLLNNNNPP TSSSLLGFFG

             LAGGPPPAV- ---------- ---------- ---APDDVYY ---------- ----------
             LGSSPPPFVG DDSSSNNNND DDPPPGGGGF PFHATTTGFF RRLFGGGGGG GEEFESWWME
             LGSSPPPFVG DDSSSNNNND DDPPPGGGGF PFHATTTGFF RRLFGGGGGG GEEFESWWME

             ---------- ----PPPAA- --GA------ ---------V VDDALEEEFA AAAAAFFPAA
             EETTDVAGPD CDDTNNPDDV YYGPPFSSLL SVPSSLLVVI IDDSPPPPPP PPTLLWWPSS
             EETTDVAGPD CDDTNNPDDV YYGPPFSSLL SVPSSLLVVI IDDSPPPPPP PPTLLWWPSS

             PDAAV--MMM RRREEEVVAA GR-------- -----LLMMS CCGGGAEEAG GGDHLLSAAL
             PLPPLHEPPP TTKEPPTTNN DEDDFLLLEE EPPPPLAYYD CC---RSSDS SSDPEESKKL
             PLPPLHEPPP TTKEPPTTNN DEDDFLLLEE EPPPPLAYYD CC---RSSDS SSDPEESKKL

             AADSSHAAAA AASSAIGGRR RRVVVHFTTT ASSRRRFFPS PVVVAPPTTT TDAAAAAL-Y
             LLQIIRESSS EEGGD-EERR RRVVVYFTTT ASSNRRSSPS PAAATSSSSS SSSSDDDILY
             LLQIIRESSS EEGGD-EERR RRVVVYFTTT ASSNRRSSPS PAAATSSSSS SSSSDDDILY

             HHHYACCPYF NNNNQQAFFH HGCCVVVVHH IIIIDDFFLL MQQGGLAIIQ AAALLRGPFL
             KTTNACCPYF NNNNQQATTE EKSSIIIIHH VVVVDDFFII VQQGGIALLQ AAATTRSKTI
             KTTNACCPYF NNNNQQATTE EKSSIIIIHH VVVVDDFFII VQQGGIALLQ AAATTRSKTI

             RIGGPPGGRR DDEE--LLRR DVLRRRLADL LLRRRVVRRR FFRGVVVANS SLDDEVPMLQ
             RVPPPLEESS PPEESSLLII ATNRRRLRDF FFKKKLLDNN FFIPIIIL-P PIHHLLGSFR
             RVPPPLEESS PPEESSLLII ATNRRRLRDF FFKKKLLDNN FFIPIIIL-P PIHHLLGSFR

             QIIIAPPPEE ANNSVQQLLH LLLGDP---D DAVVLLDDCV VAVVPKFFTT IIQQEAADHH
             RVVVDPPPEE VNNFMQQLLY LLL-DPIIID DTAALLRRLA AKLLPRVVTT GGYYEVVSLL
             RVVVDPPPEE VNNFMQQLLY LLL-DPIIID DTAALLRRLA AKLLPRVVTT GGYYEVVSLL

             HHKTFDREEE ALLFFYYYSS VAASAAANAA AAE-AAYYQR REIDVCGEGA A-ELLSSTAA
             LLRVFNRNNN ALLQQFYYSS VPPLEEERVV VVEREELLGR RRIGIGPEKT GIEKKEEEAA
             LLRVFNRNNN ALLQQFYYSS VPPLEEERVV VVEREELLGR RRIGIGPEKT GIEKKEEEAA

             AGGGLSSSVV PLLGGSSNAR QARMMFGEG- -HHHSSVVEE EDDCLLLLLG GWHGRRRPFA
             AGGGFEEEVV KLLSSNNYAS QAKIIYYSNL YSSSIIVVES SPPFILLLLA AWNDLLLPLL
             AGGGFEEEVV KLLSSNNYAS QAKIIYYSNL YSSSIIVVES SPPFILLLLA AWNDLLLPLL

             AAAAGGDDGG GGGDNNVGSG SSNGGSSGGA RRGSSVVVC
             LSS------- ---------- ---------- ---------
             LSS------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ----MTFF-- --WWPDDDDP AASGGGDAAA
t5g66770.1   MAYMMCTTTD DDSSGNLLMI IKKQQEQQQH HQQDHIIIPL NNWWPNNNNT SSGGGGSGGG
t5g66770.2   MAYMMCTTTD DDSSGNLLMI IKKQQEQQQH HQQDHIIIPL NNWWPNNNNT SSGGGGSGGG

             GFPPAAV--- ---------- ---------- --AAPDDGVV GYYY------ ----------
             SAPPQQVTGG GDSSNPPGFF PPPNNNLLHH HHAATTTGGG GFFFLDGGGG TGGGGGEEEF
             SAPPQQVTGG GDSSNPPGFF PPPNNNLLHH HHAATTTGGG GFFFLDGGGG TGGGGGEEEF

             ---------- ---------- --YYPPPPPA ----GA---- ---------- -----AAAAL
             SDEEETGGGG GDDSVVDDGP DDHHNPPPPD IIYYGPFDDY PPRRSSVVVQ DDLNVTSSSP
             SDEEETGGGG GDDSVVDDGP DDHHNPPPPD IIYYGPFDDY PPRRSSVVVQ DDLNVTSSSP

             PFAFPCCPPA AAAAAAAAAV LLL----ARE EEEAAGI--- ----VHLLLM SCAGIAAAAL
             LPPWPSSPPS IIIPPPPPPL TTTHEEESTP PEENNDSDDF LLPPLKAAAY DCA-IDNNNE
             LPPWPSSPPS IIIPPPPPPL TTTHEEESTP PEENNDSDDF LLPPLKAAAY DCA-IDNNNE

             ASSDHHAAAL AAAAVAAAAA SSGGGGGIIG VVVHHHTTTL SSSRFPPPPP PPPDDDDAEH
             ASSQRREEEV SEEELDDPPP TT-------E FFFYYYTTEL SSSRSPPPPS SSSSSSSSTE
             ASSQRREEEV SEEELDDPPP TT-------E FFFYYYTTEL SSSRSPPPPS SSSSSSSSTE

             F-----YHFY YAACCCCFAA AAANQQLGHH VVVHVVIFLQ QQQQPPAAAL LQQALALRGG
             LLLLLSYTLN NAACCCCFAA AAANQQLKKK IIIHIIVFIQ QQQQPPAAAL LQQALATRSS
             LLLLLSYTLN NAACCCCFAA AAANQQLKKK IIIHIIVFIQ QQQQPPAAAL LQQALATRSS

             PPFLRITGIG PPSSPPRREE E--LLDDLLR RADLLLAVVR RRRFSFFGVN NLLVVRPWMQ
             PPTIRVSGIP AASSLLSSEE EPSLLAANNR RRDFFFALLD DNNFDFFPI- -IILLNGSSR
             PPTIRVSGIP AASSLLSSEE EPSLLAANNR RRDFFFALLD DNNFDFFPI- -IILLNGSSR

             QIAPEEAAVV VANNNSSSVV LLQQLHRLLL LGDDDQ---- IDDAVLDDDD VAVVVVRKIF
             RVDPEEVVLL LANNNFFFMM LLQQLYKLLL L-DDDETTTI VDDTALRRRR AKLLLLNRVV
             RVDPEEVVLL LANNNFFFMM LLQQLYKLLL L-DDDETTTI VDDTALRRRR AKLLLLNRVV

             FTIEQQAANK FLLLDDRRFY YYYVVDSDAS GGGGAANAAL RRIIIDGEGG ARRRERPSRR
             VTGEYYVVNR FAAANNRRQF YYYVVESEPR DSSSEERVEF RRIIIGPEKK THRREREERR
             VTGEYYVVNR FAAANNRRQF YYYVVESEPR DSSSEERVEF RRIIIGPEKK THRREREERR

             RDDRLLLLTG GSAAAAVPLS ALRARRLLVL GGG----VEA DGLLLGFSSA AAEDDDDGGG
             RVVLMMMMEG GESSSSVKLN AVSAKKLLLN YNNLLLYVSK PGIILDLTTL LLR-------
             RVVLMMMMEG GESSSSVKLN AVSAKKLLLN YNNLLLYVSK PGIILDLTTL LLR-------

             GGGDNNNNSS GSSSGGDNSG SNGGKKSSSA RDGGSSSSL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------T FFQQQQ---W WMMMDSSSSS
t5g66770.1   MAAYYMDDSN NLMIQQVIIK KQKKQQQQQQ QQQQQQHQDI FINNNNSNNW W---NGGGFF
t5g66770.2   MAAYYMDDSN NLMIQQVIIK KKKKQQQQQQ QQQQQQHQDI FINNNNSNNW W---NGGGFF

             GGFLPPPAAA V--------- ---------- --PPDGVGGG ---------- ----------
             GSAFPPDFFQ VTDDPPGFFP PNLLDHHHHH HHTTTGGGGG RSDFFGGGGT TGGEEFFFFF
             GSAFPPPFFQ VTDDPPGFFP PNLLDHHHHH HHTTTGGGGG RSDFFGGGGT TGGEEFFFFF

             ---------- ---------- ---YYYDPPA --GA------ ---------- AAAAAAAAAF
             EEEEMETTLI SGGGSSADPD DTWHHHDNPD YVGPPPFFFD YPSRRLSPSV SSSPTTTTTW
             EEEEMETTLI SGGGSSADPD DTWHHHDNPD YVGPPPFFFD YPSRRLSPSV SSSPTTTTTW

             FPPAAAAVL- --REEEEEEE AII------- LLVLLMSSGG GAIEAGHLAA AAQLDDSAAA
             WPPSIIPLTH EETEEEEDEE NSSDDDDDLL LLLAIYDD-- -RISDSPEAA KKTLQQISSS
             WPPSIIPLTH EETEEEEDEE NSSDDDDDLL LLLAIYDD-- -RISDSPEAA KKTLQQISSS

             LLLLGGIGGG VHHHTTTTTA LFFPSPPVAA EAFLYYYHHF EAAACYLLTA ANAAIIIHHG
             VVVV---EEE VYYYTTTEEA LSSPSPPATS TDLIYYYKKL DAAACYSSTA ANAAIIIEEK
             VVVV---EEE VYYYTTTEEA LSSPSPPATS TDLIYYYKKL DAAACYSSTA ANAAIIIEEK

             GCVHVIQQQL QWAAIIQQAL AARPGGTTGG GIGPPPPPPT GRD--LRDDV GLRADLARRS
             KSIHIVQQQI QWAALLQQAL AARTSSSSGG GIPPPLLLLG ESPPSLIAAT GNRRDFAKKV
             KSIHIVQQQI QWAALLQQAL AARTSSSSGG GIPPPLLLLG ESPPSLIAAT GNRRDFAKKV

             SSVRVSRAAN NLEVRPWWWM MLQQPGGEEA AVFSQQLHHR LGDPDDQAAP P---DVVLLV
             VVLDLDILT- -ILLNGSSSS SFRRPDDEEV VLVFQQLYYK L---DDETTP PTTTDAALLA
             VVLDLDILT- -ILLNGSSSS SFRRPDDEEV VLVFQQLYYK L---DDETTP PTTTDAALLA

             VAVRPKIIII FFIQQQEAAD DDNKGGFFFL LDDRFEAFYY YSAAVDSLAA SAGGGGGNNN
             AKLNPRVVVV VVGYYYEVVS SSNRGGFFFA ANNRVNAQYY YSAAVESLNN LGDSSSERRR
             AKLNPRVVVV VVGYYYEVVS SSNRGGFFFA ANNRVNAQYY YSAAVESLNN LGDSSSERRR

             AAAMMMAAAE EEAAAALRRE EEIIVVCCCG EGGRREHEEE EPSSRRRRDD LLTRAAGLSA
             VVVRRRVVVE EEEEEEFRRR RRILIIGGGP EKKRREMEEE EEEEQQQQVV MMENAAGFES
             VVVRRRVVVE EEEEEEFRRR RRILIIGGGP EKKRREMEEE EEEEQQQQVV MMENAAGFES

             VGGNNALQAA AARLGGLLSG EEE----HSE CLLTTLLGWW HRRRPPFSSA WEAAAGDDDN
             VSSYYAVQAA AAKLWWNNNY SSSLLLLSIS FIISSLLAWW NLLLPPLSSS WR--------
             VSSYYAVQAA AAKLWWNNNY SSSLLLLSIS FIISSLLAWW NLLLPPLSSS WR--------

             NNNNSSNVVS SGSSDSNGSS SNSARGGGGG SSVCCCCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --TFFFFQ-- --PPMDDASS GLLDAGFLPP
t5g66770.1   MCDSSAIIIQ QIKKQQQQKQ QQQQQHHHHQ QDIFIIINLS SLPP-NNLFF GLLSGSAFDD
t5g66770.2   MCDSSAIIIQ QIKKKKKKKQ QQQQQHHHHQ QDIFIIINLS SLPP-NNLFF GLLSGSAFPP

             PPAVV----- ---------- -----APDDG GG-------- ---------- ----------
             PPQVVTTGGG GSNGFFPPPD HHHHHATTTG GGRRSFGGGG GGEESSDDDW MMETLIGDDD
             PPQVVTTGGG GSNGFFPPPD HHHHHATTTG GGRRSFGGGG GGEESSDDDW MMETLIGDDD

             ---------- -----YDPPP PAA--GAAD- ---------- ---------- VDAAAALPFA
             SSVVAADDDD DDTWWHDNNP PDDYYGPPDF DTTPSLLSSS VQQDDLNNRR IDTTSSPLPP
             SSVVAADDDD DDTWWHDNNP PDDYYGPPDF DTTPSLLSSS VQQDDLNNRR IDTTSSPLPP

             AFPCAAAAAA AAVLLAAAAA AREEVAIR-- -----HLAGG GAAIEEAAGD DDHAAAASSA
             TWPSSSSSSP PPLTTSSSSS STEETNSEDF DLEEPKAA-- -RRISSDDSD DDPNAAASSK
             TWPSSSSSSP PPLTTSSSSS STEETNSEDF DLEEPKAA-- -RRISSDDSD DDPNAAASSK

             AQDSHALLLA ASAAAAGIII GGRAVVTTTT TLLLRRLPSP AAPPPPTDAE EEEAAFLHHH
             KTQIREVVVS EGDPPP---- EERAFFTTTT ELLLNRLPSP TTSSSSSSST TTTDDLIKKT
             KTQIREVVVS EGDPPP---- EERAFFTTTT ELLLNRLPSP TTSSSSSSST TTTDDLIKKT

             HECYYKKFAH AQAILLLEAA FHGVVVVDMG GLQWPPALLI QAAALRGPPF -RRGPPPPGE
             TDCYYKKFAH AQAILLLEAA TEKIIIIDVG GIQWPPALLL QAAATRSKKT QRRPAAPPEE
             TDCYYKKFAH AQAILLLEAA TEKIIIIDVG GIQWPPALLL QAAATRSKKT QRRPAAPPEE

             LLDLLDDLLA RRRRRSSVVR VRRRFSRGVV SSEVVPWMLA PGQHHLLGDA DDDDP---DA
             LLANNDDFFA KKKKKVVLLD LNNNFDIPII PPLLLGSSFD PDQYYLL--- DDDDPTTTDT
             LLANNDDFFA KKKKKVVLLD LNNNFDIPII PPLLLGSSFD PDQYYLL--- DDDDPTTTDT

             VCCVAVRPPI IFFFFFTTIQ QAHHNNFFLD RTTEEELLFY YYYSAAFSSL DAAAAASGGA
             ALLAKLNPPV VVVVVVTTGY YVLLNNFFAN RKKNNNLLQF YYYSAAFSSL EPNNGGRDSE
             ALLAKLNPPV VVVVVVTTGY YVLLNNFFAN RKKNNNLLQF YYYSAAFSSL EPNNGGRDSE

             GGGGNNMMAE EYYYYYLLQQ RIIIVAAARR HHHESSRWWR RRRRTTRRGL SAAGSNNNAL
             EEEERRRRVE ELLLLLFFGG RIILITTGHR MMMEEEQWWR RRRLEENNGF ESSSNYYYAV
             EEEERRRRVE ELLLLLFFGG RIILITTGHR MMMEEEQWWR RRRLEENNGF ESSSNYYYAV

             QQRRRLVVGF FFG---EEEA AGGCCCTLGG HGPPPLLLLF SSAEAAAAGG GGGGDDNNNN
             QQKKKLLLWY YYNLLYEESK KGGFFFSLAA NDPPPLLLLL TSSR------ ----------
             QQKKKLLLWY YYNLLYEESK KGGFFFSLAA NDPPPLLLLL TSSR------ ----------

             SSNNNSSVSS GSSSSSDSNN NNSSSSGGKS SSADDSVCC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- -MMPPF---- ---WPPAAAS
t5g66770.1   MYMCCCTSSS GLMMMIQQVV IKKQQQQQQQ QEEQQQQQHH QHHGGIPPLL LPPWTTLLLG
t5g66770.2   MYMCCCTSSS GLMMMIQQVV IKKQQQQQQQ QEEQQQQQHH QHHGGIPPLL LPPWTTLLLG

             SSGLLLAGGF PPPPAAAAVV ---------- --APDGGGGY YYY------- ----------
             FFGLLLGSSA DDPPFFFQVV TGGGDSSPPP DHATTGGGGF FFFRLLGTTT GGGESEEWLL
             FFGLLLGSSA PPPPFFFQVV TGGGDSSPPP DHATTGGGGF FFFRLLGTTT GGGESEEWLL

             ---------- ---------- YDDPP----G AD-------- -------VDA LLLPFAAAAF
             LISGGGDSSV VVDPDDCCDW HDDPPYIIYG PDPDTPPSSS QQSLLNRIDS PPPLPPTLLW
             LISGGGDSSV VVDPDDCCDW HDDPPYIIYG PDPDTPPSSS QQSLLNRIDS PPPLPPTLLW

             CAPDDDDAAA AVV-AAMRRE EEEVAGIIRR ---------L LVVVVSSCCC GIIEEAAGDD
             SSPLLLLSIP PLLESSPTTE EEETNDSSEE DDDFDLEPPL LLLLLDDCCC -IISSDDSDD
             SSPLLLLSIP PLLESSPTTE EEETNDSSEE DDDFDLEPPL LLLLLDDCCC -IISSDDSDD

             DDHHLAQQQQ LHHAAALLAA VVSAAARRRA VVVHTLLSLL -SSVVVPPTE AFL---YYHF
             DDPPEKTTTT LRRSSSVVSE LLGDDDRRRA FFFYELLSLL NSSAAASSST DLILSSYYKL
             DDPPEKTTTT LRRSSSVVSE LLGDDDRRRA FFFYELLSLL NSSAAASSST DLILSSYYKL

             YYYCCCLLKF HFQQAIEEAF FFHHCDDHHI DFFSQLQWWW WPALIQAARP GGPRRRRGIG
             NNNCCCSSKF HLQQAIEEAT TTEESNNKKV DFFGQIQWWW WPALLQAART SGPRRRRGIP
             NNNCCCSSKF HLQQAIEEAT TTEESNNKKV DFFGQIQWWW WPALLQAART SGPRRRRGIP

             PPSSPTE-LL RDDVLLLRRR LAAARSSSVV RFFFSFRGAA NNNNSSSLLL DDEVMMMIIA
             PPSSLGEPLL IAATNNNRRR LRRAKVVVLL NFFFDFIPLT ----PPPIII HHLLSSSVVD
             PPSSLGEPLL IAATNNNRRR LRRAKVVVLL NFFFDFIPLT ----PPPIII HHLLSSSVVD

             APPGGEEEEA VVVVVNNVVV LLLGPPPAAD AA--DDVVLD CCCSVPKKII IFTEEQDHKT
             DPPDDEEEEV LLLLLNNMMM LLL------D TTTIDDAALR LLLSLPRRVV VVTEEYSLRV
             DPPDDEEEEV LLLLLNNMMM LLL------D TTTIDDAALR LLLSLPRRVV VVTEEYSLRV

             DRFLLYAADS LLDSGGGAMA -QEEIIDDDV CCGEAAAA-- RRERHPPLSS WRRDRLTRAA
             NRVLLYAAES LLELDSEVRV RGRRIIGGGI GGPETTTGII HRERMEEKEE WRRVLMENAA
             NRVLLYAAES LLELDSEVRV RGRRIIGGGI GGPETTTGII HRERMEEKEE WRRVLMENAA

             GLLSPARAAR VGFSGG--HV VVEAAAGGGC LLLWWGGGSW EEAAAADDGG GDNNSSNNNN
             GFFEKASAAK LWYNYNLYSV VVEKKKGGGF IILWWDDDSW RR-------- ----------
             GFFEKASAAK LWYNYNLYSV VVEKKKGGGF IILWWDDDSW RR-------- ----------

             NVGSSDSNNS SGKSSSGGGA RRDDGGSSSS VCCCLLLLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- ------MDTP Q----MMDPA
t5g66770.1   AAACCTDSLL AAAAQQQQQV VVIIIKKQKK QQQQEQQQQQ QQQQQQHQIG NPLLN--NTS
t5g66770.2   AAACCTDSLL AAAAQQQQQV VVIIIKKKKK QQQQEQQQQQ QQQQQQHQIG NPLLN--NTS

             ASSLLDAFPP PPVV------ -------AAA APDDGG---- ---------- ----------
             LFFLLSGADD DPVVTGGSPP GPLLHHHAAA ATTTGGRLLS DFGGGEESSS TTLGDSDDCT
             LFFLLSGAPP PPVVTGGSPP GPLLHHHAAA ATTTGGRLLS DFGGGEESSS TTLGDSDDCT

             --DP---DD- ---------- ---------- VDDAAAALLL EFFAAPPPCC AAAAAV-REE
             WWDPYVYDDP PPPTTTYPPS RRSQSDDNVV IDDTSSSPPP PPPPLPPPSS SSIPPLHKEE
             WWDPYVYDDP PPPTTTYPPS RRSQSDDNVV IDDTSSSPPP PPPPLPPPSS SSIPPLHKEE

             EEEVGGIRR- --------LH HLLCCGAIIE EGGGGHAALL LASAQQLADS SHHAAAAAAV
             EPPTDDSEED FFDDEPPPLK KAICC-RIIS SSSSSPNNEE EASKTTLLQI IRRESSSSSL
             EPPTDDSEED FFDDEPPPLK KAICC-RIIS SSSSSPNNEE EASKTTLLQI IRRESSSSSL

             AAIRRRVVAV VHHFFTTLLL PSVVPPPTAA AEEHAAL--- HHHHFFYACP PLKKHHHFTA
             DP-RRRVVAF FYYFFTELLL PSAASSSSSS STTEDDILLS KKKTLLNACP PSKKHHHLTA
             DP-RRRVVAF FYYFFTELLL PSAASSSSSS STTEDDILLS KKKTLLNACP PSKKHHHLTA

             NNQILFFFFH GGCDDHIIIS GLLLWPAAAA IQQAAALLRR RPPGGGGPPP -LLLITGPPP
             NNQILTTTTE KKSNNKVVVG GIIIWPAAAA LQQAAATTRR RTTSSSGKKP QIIIVSGAPP
             NNQILTTTTE KKSNNKVVVG GIIIWPAAAA LQQAAATTRR RTTSSSGKKP QIIIVSGAPP

             PPTGRDE-LG RRRAAAARSV VVVVFSFFRR GVVNSSLEVR RRWWIPPGEE EVAFNSSVVL
             LLGESPESLG RRRRRAAKVL LLLLFDFFII PII-PPILLN NNSSVPPDEE ELAVNFFMML
             LLGESPESLG RRRRRAAKVL LLLLFDFFII PII-PPILLN NNSSVPPDEE ELAVNFFMML

             QQQLLHHRLP PADDQ----- IIDCVVVVVV KFFTTVVVVE QQEDNKKGLR RTTLLYYAVV
             QQQLLYYKL- --DDETTTII VVDLAAAALL RVVTTLLLLE YYESNRRGAR RKKLLYYAVV
             QQQLLYYKL- --DDETTTII VVDLAAAALL RVVTTLLLLE YYESNRRGAR RKKLLYYAVV

             FFLAGGGANN NNAAMMMMA- EEECCDVVVV CGGEAAREER HEEEPLSRRW RRRRLTGLAV
             FFLNDDDERR RRVVRRRRVR RRRSSGIIII GPPETTHEER MEEEEKEQQW RRLLMEGFSV
             FFLNDDDERR RRVVRRRRVR RRRSSGIIII GPPETTHEER MEEEEKEQQW RRLLMEGFSV

             VPLLSSLLLQ AAARMMLLGG GFFSSGEGG- SVVEEAADDD DCLTLGHHHG RRSSAWWAAG
             VKLLNNVVVQ AAAKIILLWW WYYNNYSNNY IVVSSKKPPP PFISLANNND LLTSSWW---
             VKLLNNVVVQ AAAKIILLWW WYYNNYSNNY IVVSSKKPPP PFISLANNND LLTSSWW---

             DDGGGGNNNS SSSNNGSSDS SNSSSNGSGD GGGVCLLLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ------MDTT TFFPQ----- ---WPPMDPP
t5g66770.1   CTTGGGGNNL MMIAAQQQQE QQQQQQQQHH HHQDDDHQII IFFGNPPLSN NPPWPP-NTT
t5g66770.2   CTTGGGGNNL MMIAAQQQQE QQQQQQQQHH HHQDDDHQII IFFGNPPLSN NPPWPP-NTT

             ASLLPPPAAV V--------- -----ADDDV VY-------- ---------- ----------
             SGLFPPPFFV VTTGGGGPGF FFPLHATTTG GFRLLSGGGG GGGGGGFEDD EWWWMMTLII
             SGLFPPPFFV VTTGGGGPGF FFPLHATTTG GFRLLSGGGG GGGGGGFEDD EWWWMMTLII

             ---------- -------YDP ----GD---- ---------- -------AAA ALLLPAAAAA
             GGDSVVAADD DCCCTWWHDN YVIYGDPTTY PSRRLLVVPS DLLNRVVTSS SPPPLTTLLL
             GGDSVVAADD DCCCTWWHDN YVIYGDPTTY PSRRLLVVPS DLLNRVVTSS SPPPLTTLLL

             AFPPCCCDDA AAAAAAVVAR RREEEEVGR- ---------- ---LLLLMSS GIAGGGLAAA
             LWPPSSSLLS IIIPPPLLST KKDPPPTDED DDDDDFDLLP PPPAAAAYDD -IDSSSEAAK
             LWPPSSSLLS IIIPPPLLST KKDPPPTDED DDDDDFDLLP PPPAAAAYDD -IDSSSEAAK

             ALLASSSHAA AAALLLAAAA AVSSSSAASS GGVVHHHTAL SRL---SSPP VVAPDEFFLL
             KLLLIIIREE ESSVVVSSSS ELGGGGPPTT -EVFYYYTAL SNLNNNSSPP AATSSTLLII
             KLLLIIIREE ESSVVVSSSS ELGGGGPPTT -EVFYYYTAL SNLNNNSSPP AATSSTLLII

             -HHFFYYYEE EPPPAHFTTA NNAAILLEAA FHHGCCHVVV IDFFSSLQGL QQQWWPPALL
             SKTLLNNNDD DPPPAHLTTA NNAAILLEAA TEEKSSHIII VDFFGGIQGI QQQWWPPALL
             SKTLLNNNDD DPPPAHLTTA NNAAILLEAA TEEKSSHIII VDFFGGIQGI QQQWWPPALL

             LLALRRGGPP PFFF-RRIII IGPSD--LLL GLRDDLRSVR VRRSRVNNVR PWLQQIIAAP
             LLATRRGGPP PTTTQRRVVI IPASPPSLLL GNRDDFKVLD LNNDII--LN GSFRRVVDDP
             LLATRRGGPP PTTTQRRVVI IPASPPSLLL GNRDDFKVLD LNNDII--LN GSFRRVVDDP

             EAAVAAAFFS VLHRLLGDPP DDAAIIDDVV VDAAASSSRK FTIIEEQQDD DDNKGFDRRR
             EVVLAAAVVF MLYKLL---- DDTTVVDDAA ARKKKSSSNR VTGGEEYYSS SSNRGFNRRR
             EVVLAAAVVF MLYKLL---- DDTTVVDDAA ARKKKSSSNR VTGGEEYYSS SSNRGFNRRR

             FFEYYYFDDL DDDAGGNAAA EE--LRREEC DICGARRRRR HHEPPLLSSR RRDDRLRALV
             VVNFFFFEEL EEEGSERVVV EERRFRRRRS GLGKGHHHHR MMEEEKKEEQ QRVVLMNAFV
             VVNFFFFEEL EEEGSERVVV EERRFRRRRS GLGKGHHHHR MMEEEKKEEQ QRVVLMNAFV

             LLGGSALLLR QQARMMLLLV VGFFFGE--- HHVEEADCLT TLLLRRLFFA AWWWWAGGGG
             LLSSNAVVVS QQAKIILLLL LWYYYYSLYY SSVSSKPFIS SLLLLLLLLL SWWWW-----
             LLSSNAVVVS QQAKIILLLL LWYYYYSLYY SSVSSKPFIS SLLLLLLLLL SWWWW-----

             NNNNSSSNNS GSSSGGSDSN GGSGKKSARR DSSSVVCCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ------MTFF PFFQQ----- WWPMAAASGP PAA-------
t5g66770.1   MCCDDGGNII AQQVVVKKQQ QQQHHDHIFF GIINNPLLLP WWP-SSLFSP DFQTGGDPPF
t5g66770.2   MCCDDGGNII AQQVVVKKKQ QQQHHDHIFF GIINNPLLLP WWP-SSLFSP PFQTGGDPPF

             -------AAP VVVGY----- ---------- ---------- ---------- -------PPA
             FPNLLHHAAT GGGGFRRDFF GGTTTGGGGF FFSDWTTLSG GGGDSSVDGP PDDCCCTNND
             FPNLLHHAAT GGGGFRRDFF GGTTTGGGGF FFSDWTTLSG GGGDSSVDGP PDDCCCTNND

             ----AD---- ---------- ---------- VDDDDDALLE FFAAPDAAAA VLL-AREEEE
             YVYYPDPFFD DTYYYLSSSV VQDLLLNRVV IDDDDDSPPP PPTLPLPPPP LTTHSKEEPE
             YVYYPDPFFD DTYYYLSSSV VQDLLLNRVV IDDDDDSPPP PPTLPLPPPP LTTHSKEEPE

             AAAIR----- ----LVVVHL LLLLMMSSSC CCIEEAGGHH HALAAASAAA LLADHHHAAV
             NNNSEDDDEE PPPPLLLLKA IIIIYYDDDC CCISSDSSPP PNEAAASKKK LLLQRRRSEL
             NNNSEDDDEE PPPPLLLLKA IIIIYYDDDC CCISSDSSPP PNEAAASKKK LLLQRRRSEL

             ASSGIGGGRV VVTTTTTTAL LSRRLLF--P VPPPPTTDDA EH--YFEEEA PYKKKFFFAH
             DTT--EEERV FFTEEEEEAL LSNRLLSNNP ASSSSSSSSS TELSYLDDDA PYKKKFFFAH
             DTT--EEERV FFTEEEEEAL LSNRLLSNNP ASSSSSSSSS TELSYLDDDA PYKKKFFFAH

             TTNNAIAHHC DHHDFSSSMM QQGGQQQWWA LIILLLARPP GGPRRRIITG GGIGPPSSST
             TTNNAIAEES NKHDFGGGVV QQGGQQQWWA LLLLLLARTT GGKRRRVVSG GGIPAPSSSG
             TTNNAIAEES NKHDFGGGVV QQGGQQQWWA LLLLLLARTT GGKRRRVVSG GGIPAPSSSG

             GRR-LLDDVG GLLRLADAAA RSSVVRVVRF FSGGVAANSS SSLDDVPLLI APGAAAVVFN
             ESSSLLAATG GNNRLRDAAA KVVLLDLLNF FDPPITT-PP PPIHHLGFFV DPDVVVLLVN
             ESSSLLAATG GNNRLRDAAA KVVLLDLLNF FDPPITT-PP PPIHHLGFFV DPDVVVLLVN

             NVLQQHRLLP DDQAPPP-ID DVLDDCSVRR PPPPKTEEEA AHKTTGGFLL LLDDDRRTAA
             NMLQQYKLL- DDETPPPIVD DALRRLSLNN PPPPRTEEEV VLRVVGGFAA AANNNRRKAA
             NMLQQYKLL- DDETPPPIVD DALRRLSLNN PPPPRTEEEV VLRVVGGFAA AANNNRRKAA

             YYYYYSAADL DASGGAGNNM ME-AYYLLRE CIVVGGAA-E EEEREEPLWD RRLTTRRAAA
             FFFYYSAAEL EGRDDEERRR RERELLFFRR SLIIPKGGIE EEEREEEKWV LLMEENNAAA
             FFFYYSAAEL EGRDDEERRR RERELLFFRR SLIIPKGGIE EEEREEEKWV LLMEENNAAA

             ALLLSSAVLL LGSSNNNALR RRRLVGGLLF GGEEG--HHS SVEEEEEAAD GGLLWWRRPL
             AFFFEESVLL LSNNYYYAVS SKKLLWWNNY YYSSNLYSSI IVEESSSKKP GGILWWLLPL
             AFFFEESVLL LSNNYYYAVS SKKLLWWNNY YYSSNLYSSI IVEESSSKKP GGILWWLLPL

             FFFASSEAAA ADGGDNNNSV SGGSDNNSNK SSRSSVVCC
             LLLLSSR--- ---------- ---------- ---------
             LLLLSSR--- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------M TTPPQ----- ----WMMMPA
t5g66770.1   MMYCTTDDDN LLMMMAIIIK QQQQEEQQQQ QHHHQQDDDH IIGGNPLSLL LPPPW---TS
t5g66770.2   MMYCTTDDDN LLMMMAIIIK QQQQEEQQQQ QHHHQQDDDH IIGGNPLSLL LPPPW---TS

             AAAGGGLLDA AFLPPPPPAV ---------- ---------- ADGVGY---- ----------
             SLLGGGLLSG GAFPPPPPQV TGGGDSNPGG FPFNNHHHHH ATGGGFRRLL LSDGGGGGGE
             SLLGGGLLSG GAFPPPPPQV TGGGDSNPGG FPFNNHHHHH ATGGGFRRLL LSDGGGGGGE

             ---------- ---------- ---YDPA--- -GGGGADDD- ---------- ----AAAAAA
             EDEWWMEETT LSSGSDDGGP DDTHDNDYVI YGGGGPDDDP PFFDDDTPPS SSNVTTTTSS
             EDEWWMEETT LSSGSDDGGP DDTHDNDYVI YGGGGPDDDP PFFDDDTPPS SSNVTTTTSS

             ALLPFAFCAA PPAAAAL--- AMRRREEEVA AGGGG----- ----VVMSAA IEAGGDHHAA
             SPPLPLWSSS PPSIPPTHEE SPTKKEDETN NDDDDFFFFD DEEPLLYDAR ISDSSDPPNA
             SPPLPLWSSS PPSIPPTHEE SPTKKEDETN NDDDDFFFFD DEEPLLYDAR ISDSSDPPNA

             ASSLLSSHLA AVVSSAAASI IRRVATALRL LFPPSSPPVP PPTTTAEL-- --YYFYYEEP
             ASSLLIIRVS SLLGGPPPT- -RRVATALRL LSPPSSPPAS SSSSSSTILL LSYYLNNDDP
             ASSLLIIRVS SLLGGPPPT- -RRVATALRL LSPPSSPPAS SSSSSSTILL LSYYLNNDDP

             PYKFFAAHHF FFFNQAEFHH GGGCHHIDDF LMMMLQWPLL LIIAAAAALR PPPGGPPFFI
             PYKFFAAHHL LLLNQAETEE KKKSKHVDDF IVVVIQWPLL LLLAAAAATR TTTSGKKTTV
             PYKFFAAHHL LLLNQAETEE KKKSKHVDDF IVVVIQWPLL LLLAAAAATR TTTSGKKTTV

             ITGGGPP--- RRGGRLLLAD DAAAARVVRV RFFSFFGGGV ANSPPWWMML IPGEAAAVVA
             VSGPPPLPSS IIGGRLLLRD DAAAAKLLDL NFFDFFPPPI T-PGGSSSSF VPDEVVVLLA
             VSGPPPLPSS IIGGRLLLRD DAAAAKLLDL NFFDFFPPPI T-PGGSSSSF VPDEVVVLLA

             AAFNSSLLLR DDDPDDDQAP P---ILCVVR KITTVVEQEE ANKTTGGFLD DRFFFTTTEE
             AAVNFFLLLK ----DDDETP PTTTVLLALN RVTTLLEYEE VNRVVGGFAN NRVVVKKKNN
             AAVNFFLLLK ----DDDETP PTTTVLLALN RVTTLLEYEE VNRVVGGFAN NRVVVKKKNN

             LLYAFFDDDS LLDAASSAGA AGGGNMMA-- AAYLQQREEE ICCCVCAAAR RHEPSRRWWW
             LLYAFFEEES LLEPPLLGDE EEEERRRVRR EELFGGRRRR ISSSIGGGGR RMEEEQQWWW
             LLYAFFEEES LLEPPLLGDE EEEERRRVRR EELFGGRRRR ISSSIGGGGR RMEEEQQWWW

             RRRLLTTTPP GGGAAALRMM LLFGGEGGGG G-HSEEEAGC LGWRRSAAAS SAWWEGDGGD
             RLLMMEEEKK SSSAAAVKII NNYYYSNNNN NYSIEESKGF IAWLLTLLLS SSWWR-----
             RLLMMEEEKK SSSAAAVKII NNYYYSNNNN NYSIEESKGF IAWLLTLLLS SSWWR-----

             NNNNSNNNVV SGSSGDSNNS SGGGSSNSSS AARDGCLLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ----MDFFPF ---------- WWPPMPAAGG
t5g66770.1   MMAYCTDSGG LLMIQQQVVV QQKQQQQQQQ HQDDHQFFGI PPLLSSLNPP WWPP-TSLGG
t5g66770.2   MMAYCTDSGG LLMIQQQVVV KKKQQQQQQQ HQDDHQFFGI PPLLSSLNPP WWPP-TSLGG

             LDLPPPPAAV ---------- ----APDDDV G--------- ---------- ----------
             LSFPDPPFQV GGDDPFNLDD HHHHATTTTG GSSDDFFGGG GGGGGTGGFF ESDEETLISS
             LSFPPPPFQV GGDDPFNLDD HHHHATTTTG GSSDDFFGGG GGGGGTGGFF ESDEETLISS

             ---------- -------YYY YPAA--GGDD DD-------- -------VVD DDALLFAAPC
             GGDSSSSVDD CCDDTWWHHH HPDDVYGGDD DDPFYPPPSS RLVPPLNIID DDSPPPTLPS
             GGDSSSSVDD CCDDTWWHHH HPDDVYGGDD DDPFYPPPSS RLVPPLNIID DDSPPPTLPS

             PAALL-MMMM RRREEEVAAI I--------- -HLLSAAAAA AEEAAGGGHH HASSSAALAA
             PSPTTHPPPP KKKEDPTNNS SDDDDDLLEP PKAIDAARRR RSSDDSSSPP PASSSKKLLL
             PSPTTHPPPP KKKEDPTNNS SDDDDDLLEP PKAIDAARRR RSSDDSSSPP PASSSKKLLL

             DSHHAAAALL AAAASAGIIV AAHHFTTAAL LSRRLF---A APTAEAFLL- YYHHHFYYEE
             QIRRESSSVV SSSEGD---V AAYYFTTAAL LSRRLSNNNT TSSSTDLIIS YYKTTLNNDD
             QIRRESSSVV SSSEGD---V AAYYFTTAAL LSRRLSNNNT TSSSTDLIIS YYKTTLNNDD

             AACCCCYYYY LKKKFFAHTN NNNAAIEDHH QQGGLLQQWW WAAAALLLLQ QQQQAALLLL
             AACCCCYYYY SKKKFFAHTN NNNAAIENKK QQGGIIQQWW WAAAALLLLQ QQQQAALLLL
             AACCCCYYYY SKKKFFAHTN NNNAAIENKK QQGGIIQQWW WAAAALLLLQ QQQQAALLLL

             AAGGPF-LRR RRITTGIGPS SSSPDDERDD DGGRLLAAAR RVRRFFSSFG NSLDEERWWQ
             AASGPTQIRR RRVSSGIPAS SSSLPPEIAA AGGRFFAAAD DLNNFFDDFP -PIHLLNSSR
             AASGPTQIRR RRVSSGIPAS SSSLPPEIAA AGGRFFAAAD DLNNFFDDFP -PIHLLNSSR

             PEAAVAFNSV HHRRGDDDPA AAADQP-IID VDCCCPPPKK IIIIFFTIIE QDDHHHHNNK
             PEVVLAVNFM YYKK------ ---DEPIVVD ARLLLPPPRR VVVVVVTGGE YSSLLLLNNR
             PEVVLAVNFM YYKK------ ---DEPIVVD ARLLLPPPRR VVVVVVTGGE YSSLLLLNNR

             TGFLDDFTTT EAFYYSFSAA ASGGAANNAA AEAYQREIII IVGEEGGA-- RRHPLLSRDR
             VGFANNVKKK NAQYYSFSPN GRSSEERRVV VEELGRRIIL LIPEEKKGII RRMEKKEQVL
             VGFANNVKKK NAQYYSFSPN GRSSEERRVV VEELGRRIIL LIPEEKKGII RRMEKKEQVL

             RRTRGLSSVL LLSSSNNAAL AMVVGSGEGG --HSSVDDGG GHHGRPPLSA SWAGGGGGDD
             LLENGFEEVL LLNNNYYAAV AILLWNYSNN YYSIIVPPGG GNNDLPPLTL SW--------
             LLENGFEEVL LLNNNYYAAV AILLWNYSNN YYSIIVPPGG GNNDLPPLTL SW--------

             NNNNSSSNNV GGGGSNNSGG SSNGGKKKSG GAARRRDSV
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- DTFFPF--WW MDPSSSSSLD ALPPPPAA--
t5g66770.1   MAAYCTDDGG NLLLQVVVVK QQKQEQQQQQ QIFFGILNWW -NTGGFFFLS GFPDPPFQTG
t5g66770.2   MAAYCTDDGG NLLLQVVVVK KKKQEQQQQQ QIFFGILNWW -NTGGFFFLS GFPPPPFQTG

             ---------- ---------A ADDVGGGYY- ---------- ---------- ----------
             GGGGDDPPPP PPNDDHHHHA ATTGGGGFFD FFGGGEFSSE WMMMTTLLII SSDSVVVAAA
             GGGGDDPPPP PPNDDHHHHA ATTGGGGFFD FFGGGEFSSE WMMMTTLLII SSDSVVVAAA

             -----DDPPP -----GGA-- --------VD AAAAALLPEE FAAFPAPPPA AAAAA-MMEV
             ADDGDDDNPP VIIYYGGPTT SSSVVVVQID TTSSSPPLPP PTTWPSPPPS SIIIPHPPET
             ADDGDDDNPP VIIYYGGPTT SSSVVVVQID TTSSSPPLPP PTTWPSPPPS SIIIPHPPET

             AAGIIRR--- --------LH LMMMSCGGAI IEGHAALLSS SAQLLLLDSH HAAAAASSGG
             NNDSSEEDDD DFLLEEEPLK AYYYDC--RI ISSPNNEESS SKTLLLLQIR RSEDDPTT--
             NNDSSEEDDD DFLLEEEPLK AYYYDC--RI ISSPNNEESS SKTLLLLQIR RSEDDPTT--

             IIIIGVAHFT TTALSSSRRR RLLFPSPPAA PTTDEEAAL- ----YHHHFF YEECCPPLLK
             ----EVAYFT EEALSSSNNR RLLSPSPPTT SSSSTTDDIL LLSSYKKTLL NDDCCPPSSK
             ----EVAYFT EEALSSSNNR RLLSPSPPTT SSSSTTDDIL LLSSYKKTLL NDDCCPPSSK

             KKFHHHFTAA AANNQALEFH HHGCDHVHVI DFFFFSGQWW PQAALLLLAL LLPPPPPPF-
             KKFHHHLTAA AANNQALETE EEKSNKIHIV DFFFFGGQWW PQAALLLLAT TTTKKKPPTQ
             KKFHHHLTAA AANNQALETE EEKSNKIHIV DFFFFGGQWW PQAALLLLAT TTTKKKPPTQ

             LRRRITGGGS SPEELDDVLA DAARRSVVVF FFFFGVAAAA ANNDDEEEEE VWWWWMMLLI
             IRRRVSGGGS SLEELAATNR DAAKKVLLLF FFFFPILLTT T--HHLLLLL LSSSSSSFFV
             IRRRVSGGGS SLEELAATNR DAAKKVLLLF FFFFPILLTT T--HHLLLLL LSSSSSSFFV

             APAVVFNVHR LGDPAAADQA AAP----IID VDASSRPIIF TTVVEEQEAA DGFLDRREAA
             DPVLLVNMYK L------DET TTPTTTTVVD ARKSSNPVVV TTLLEEYEVV SGFANRRNAA
             DPVLLVNMYK L------DET TTPTTTTVVD ARKSSNPVVV TTLLEEYEVV SGFANRRNAA

             AFYYYYSSDD DDSSLAASAG AAGMEE-ALL LEDVVCGGGG EEAAAAAA-E RRLLRWDDLR
             AQFFYYSSEE EESSLPPLGS EEEREEREFF FRGIIGPPPP EETTTGGGIE RRKKQWVVMN
             AQFFYYSSEE EESSLPPLGS EEEREEREFF FRGIIGPPPP EETTTGGGIE RRKKQWVVMN

             AAGGLLAAVL LNNNALLRQQ QARRLLVVVL FFGG----HV VVEDDGCTLL HRFFAASAEE
             AAGGFFSSVL LYYYAVVSQQ QAKKLLLLLN YYYNLLLLSV VVSPPGFSLL NLLLLLSSRR
             AAGGFFSSVL LYYYAVVSQQ QAKKLLLLLN YYYNLLLLSV VVSPPGFSLL NLLLLLSSRR

             AAAGGGGGGG DNNSNNSNSS SGGNSSGKSS SRDDGSCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---MTTTPQ- --WWPMDDPP AAAASSDALL
t5g66770.1   ATDSNLMAIA AQQIIIKQKK QQQQQQQQQQ HHQHIIIGNL LLWWP-NNTT SSLLGGSGFF
t5g66770.2   ATDSNLMAIA AQQIIIKKKK QQQQQQQQQQ HHQHIIIGNL LLWWP-NNTT SSLLGGSGFF

             LLAAAVV--- ---------- ---------- -DGVGY---- ---------- ----------
             FFFQQVVGGG GGGDDDDSNN PGPPFNNLHH HTGGGFRRSS SDFFGGGGGE EEEESEEISG
             FFFQQVVGGG GGGDDDDSNN PGPPFNNLHH HTGGGFRRSS SDFFGGGGGE EEEESEEISG

             ---------- --PPAA---- ---------- --------VV DDAAAALLPP EEAAFPPPPP
             GGVAAAADGG DWNPDDYFTT YYPPRLVSSL LNNNRRVVII DDTSSSPPLL PPPTWPPPPP
             GGVAAAADGG DWNPDDYFTT YYPPRLVSSL LNNNRRVVII DDTSSSPPLL PPPTWPPPPP

             CAPPDAAAAL LL--AARREE EEEEEAR--- -------VHH LMSSCAIGDD HHALLAAASS
             SSPPLSIPPT TTHESSTKEE EEPPENEDDD DDLEEPPLKK AYDDCRISDD PPNEEAAASS
             SSPPLSIPPT TTHESSTKEE EEPPENEDDD DDLEEPPLKK AYDDCRISDD PPNEEAAASS

             SSQQLLADAA AAAVVVSAAS GGIRVAVHHH FTTALSPP-S SPVVVAAAAT AHAFLL-YYY
             SSTTLLLQSS EEELLLGDDT ---RVAFYYY FTEALSPPNS SPAAATTTTS SEDLIILYYY
             SSTTLLLQSS EEELLLGDDT ---RVAFYYY FTEALSPPNS SPAAATTTTS SEDLIILYYY

             HHFYEEEAAC PYYKFFAANN NQQILLLFHC CDHVVVIIIL MGGLLWWPLL IIAAAALLGG
             TTLNDDDAAC PYYKFFAANN NQQILLLTES SNHIIIVVVI VGGIIWWPLL LLAAAALLSS
             TTLNDDDAAC PYYKFFAANN NQQILLLTES SNHIIIVVVI VGGIIWWPLL LLAAAALLSS

             GGPP--IITG PTRDE----- -LDVRLDLVV RRVAAAAANN SLDDEVRPWW LQIAPAFFVL
             SSKPQQVVSG AGSPEPPPSS SLATRLDFLL IIILLTTT-- PIHHLLNGSS FRVDPAVVML
             SSKPQQVVSG AGSPEPPPSS SLATRLDFLL IIILLTTT-- PIHHLLNGSS FRVDPAVVML

             HHLLLLLGDP PPAADAAP-- IDDAVDDAAR RRPKKIFTII EAHKFFFFFL LLEEAFYYYY
             YYLLLLL--- ----DTTPTT VDDTARRKKN NNPRRVVTGG EVLRFFFFFA AANNAQFFFY
             YYLLLLL--- ----DTTPTT VDDTARRKKN NNPRRVVTGG EVLRFFFFFA AANNAQFFFY

             SSAVDDDDSL ASSGGGGAAG NAAEAYYLRE IIIIDIIVCG EGGGA-RERP LSRLTTAAGL
             SSAVEEEESL GRRDDDSEEE RVVEELLFRR IIIIGLLIGP EKKKGIHERE KELMEEAAGF
             SSAVEEEESL GRRDDDSEEE RVVEELLFRR IIIIGLLIGP EKKKGIHERE KELMEEAAGF

             SAVLLSSNNN ALQMMMMMMM GSGEEG---- EEEEEECLTL GGPPLLFSAA AAAGGDGGGN
             ESVLLNNYYY AVQIIIIIII WNYSSNLLYY EEEESSFISL AAPPLLLSS- ----------
             ESVLLNNYYY AVQIIIIIII WNYSSNLLYY EEEESSFISL AAPPLLLSS- ----------

             NNNSSNNSSS GGSSSDSNNN NNNSSKKKSS SAADGSCLL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- --------MM TPFFQ----M MMDDASSSLD
t5g66770.1   AYYTDNAIII AQQQVVVIII KQQQEQQQQQ QQHHHHQDHH IGIINPSLP- --NNSGGGLS
t5g66770.2   AYYTDNAIII AQQQVVVIII KQQQEQQQQQ QQHHHHQDHH IGIINPSLP- --NNSGGGLS

             DAAAGGFLPA V--------- -------PDD GGVGY----- ---------- ----------
             SGGGSSAFDQ VGGGNGGPPN NLLDDHHTTT GGGGFRLSDG TGGGGGFEEM TTLISGGGDD
             SGGGSSAFPQ VGGGNGGPPN NLLDDHHTTT GGGGFRLSDG TGGGGGFEEM TTLISGGGDD

             ---------Y DPAA--GGGA D--------- ------VDDA AEEFAPCCCP DAAL--ARRE
             DAADDDDCDH DPDDIYGGGP DPPPFFDYPP VVPLNVIDDS SPPPLPSSSP LSPTHESTTE
             DAADDDDCDH DPDDIYGGGP DPPPFFDYPP VVPLNVIDDS SPPPLPSSSP LSPTHESTTE

             EEEVAAAGII R--------- ---LVVHLSS SSSSCCGAEA AGDDAAALAS SASHLLLAAA
             DPETNNNDSS EDDDDDFDLL EEPLLLKIDD DDDDCC-RSD DSDDNNNEAS SLIRVVVSSE
             DPETNNNDSS EDDDDDFDLL EEPLLLKIDD DDDDCC-RSD DSDDNNNEAS SLIRVVVSSE

             ASAAASSSGG VAAAVFTTLL LSRPP-SPPA PPDDAAAEH- YHHFEEEAKF TAIEAFHCDV
             EGDPPTTTEE VAAAFFTTLL LSRPPNSPPT SSSSSSSTEL YKTLDDDAKL TAIEATESNI
             EGDPPTTTEE VAAAFFTTLL LSRPPNSPPT SSSSSSSTEL YKTLDDDAKL TAIEATESNI

             HVDSLMMQQG GLQQWIIIQQ LLRPPPFFF- --ITGPSSGR D-LRDDVGGL RLAADDLRSV
             HIDGIVVQQG GIQQWLLLQQ LTRKPPTTTQ QQVSGASSES PSLIAATGGN RLRRDDFKVL
             HIDGIVVQQG GIQQWLLLQQ LTRKPPTTTQ QQVSGASSES PSLIAATGGN RLRRDDFKVL

             RFFFFRSVPW WMLQQIIAAP GAAVVFFNNV VLLHHLLGDP DDDQQAAP-- IIIVLLCCCV
             NFFFFIPLGS SSFRRVVDDP DVVLLVVNNM MLLYYLL--- DDDEETTPTI VVVALLLLLA
             NFFFFIPLGS SSFRRVVDDP DVVLLVVNNM MLLYYLL--- DDDEETTPTI VVVALLLLLA

             ASVRRRKIFV VIEEEEAADH HHHTTTGFLL FFFELFFFFD DSLLDAAAAA ASGGGGAGNN
             KSLNNNRVVL LGEEEEVVSL LLLVVVGFAA VVVNLQQFFE ESLLEPNGGG GRSSSSEERR
             KSLNNNRVVL LGEEEEVVSL LLLVVVGFAA VVVNLQQFFE ESLLEPNGGG GRSSSSEERR

             AE-YLLLRRE IICCDVVCEE GGGGGA--RE EEERHHPPSS DDRRLLLTTA AGGLSAVGSN
             VERLFFFRRR IISSGIIGEE KKKKKTIIRE EEERMMEEEE VVLLMMMEEA AGGFESVSNY
             VERLFFFRRR IISSGIIGEE KKKKKTIIRE EEERMMEEEE VVLLMMMEEA AGGFESVSNY

             NNNALLRRRR RMMLVGLFSS EGGG-HHHVE EDGGGCTLLL GGHRLSSSSW WWEEDGGGGD
             YYYAVVSSSK KIILLWNYNN SNNNYSSSVE EPGGGFSLLL AANLLTTSSW WWRR------
             YYYAVVSSSK KIILLWNYNN SNNNYSSSVE EPGGGFSLLL AANLLTTSSW WWRR------

             NNNNSSNNNV SSSGSSSNNG GGSSSGGGAA RRRDSSSCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ----DTFPPQ Q--------W WWWWPMMMMA
t5g66770.1   MMMCCTSGGN LAAQQQQQVV KKQQEQQQQH QQDDQIFGGN NPPPLLSLNW WWWWP----S
t5g66770.2   MMMCCTSGGN LAAQQQQQVV KKQQEQQQQH QQDDQIFGGN NPPPLLSLNW WWWWP----S

             ASSSLAAGGG FLLLPPPPAA AV-------- ---------- --------AA PPGGGGGGYY
             LGGFLGGSSS AFFFDPPPQQ QVTGGSSSNN DPPPGFFFFF PNNLDHHHAA TTGGGGGGFF
             LGGFLGGSSS AFFFPPPPQQ QVTGGSSSNN DPPPGFFFFF PNNLDHHHAA TTGGGGGGFF

             ---------- ---------- ------YYDP PAAA----AA A--------- ------VDAA
             FFGTGGGGGE EETTLGSVDG GCCTWWHHDP PDDDYYVYPP PPFTTYYPPL QQPSDRIDTS
             FFGTGGGGGE EETTLGSVDG GCCTWWHHDP PDDDYYVYPP PPFTTYYPPL QQPSDRIDTS

             ALPEEAAPPP CPDDDAAAAA VVLL--AAMR RREEEVGI-- -----LLLLM SCAGGGIAAA
             SPLPPPTPPP SPLLLIIIIP LLTTHESSPT TKEEETDSDF LLPPPLLAIY DCA---IDDD
             SPLPPPTPPP SPLLLIIIIP LLTTHESSPT TKEEETDSDF LLPPPLLAIY DCA---IDDD

             GDDAASSAAA AQQAADDDAV VVSSSASGII GGRAAVVHHH FFTTALSRRR RRLL-SPPPT
             SDDNNSSKKK KTTLLQQQEL LLGGGDT--- EERAAFFYYY FFTEALSNNN NRLLNSSSSS
             SDDNNSSKKK KTTLLQQQEL LLGGGDT--- EERAAFFYYY FFTEALSNNN NRLLNSSSSS

             TDDAEEHH-- HHYYYEECPY LKHHFNQQAI LLLEHHGDHH HVVHVDFFSL MQQGGLLWPI
             SSSSTTEESS TTNNNDDCPY SKHHLNQQAI LLLEEEKNKK KIIHIDFFGI VQQGGIIWPL
             SSSSTTEESS TTNNNDDCPY SKHHLNQQAI LLLEEEKNKK KIIHIDFFGI VQQGGIIWPL

             IQQAAAALRP F---LTPPSP PGDEEE-LRD VGGRAADDDL ARRRVRVSSS FRGGASLLLD
             LQQAAAATRK TQQQISAPSL LEPEEEPLIA TGGRRRDDDF AKKKLDLDDD FIPPLPIIIH
             LQQAAAATRK TQQQISAPSL LEPEEEPLIA TGGRRRDDDF AKKKLDLDDD FIPPLPIIIH

             EEVVVRPWWM MLLPEVANSV VLQQQLRRLG GDAQPP-DDA VDDCCCAKEE EEADHNKKKT
             LLLLLNGSSS SFFPELANFM MLQQQLKKL- ---EPPIDDT ARRLLLKREE EEVSLNRRRV
             LLLLLNGSSS SFFPELANFM MLQQQLKKL- ---EPPIDDT ARRLLLKREE EEVSLNRRRV

             FFFLRRFFTT EALFYYSAVS DAASSSGGGG GGGAAAAGAM -AAYLLECDI CCGEG-RRRR
             FFFARRVVKK NALQFFSAVS EPNLLRDDSS SSSEEEEEVR REELFFRSGL GGPEKIHHHH
             FFFARRVVKK NALQFFSAVS EPNLLRDDSS SSSEEEEEVR REELFFRSGL GGPEKIHHHH

             PPRRLRAAAG LSSVVPGGGS SSNALRQQRR VVLGGEG-SV VEALLTLWWG PPLFASSSWW
             EEQQMNAAAG FEEVVKSSSN NNYAVSQQKK LLNYYSNYIV VSKIISLWWD PPLLLSSSWW
             EEQQMNAAAG FEEVVKSSSN NNYAVSQQKK LLNYYSNYIV VSKIISLWWD PPLLLSSSWW

             DGGGGNNNNN NSNNSNNVSS SGSNSGSNSS GARRDGSSC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---------- -------FPP PFQQ------
t5g66770.1   MAAYMMCTTD GGNNLLIAAQ QQQVIIQKQQ QQEEQQQQHH HHHHHQQFGG GINNPPSLLN
t5g66770.2   MAAYMMCTTD GGNNLLIAAQ QQQVIIKKQQ QQEEQQQQHH HHHHHQQFGG GINNPPSLLN

             -MDDDDAAAA ASLFFPPPAV VV-------- ---------- -----PDDVV YY--------
             N-NNNNSSLL LFLAAPDPQV VVTTGGGDDN NPFFPNNDDH HHHHHTTTGG FFFFGGGGTT
             N-NNNNSSLL LFLAAPPPQV VVTTGGGDDN NPFFPNNDDH HHHHHTTTGG FFFFGGGGTT

             ---------- ---------- ---------- ---YPPPPA- ---D------ -------AAA
             TGGEESEEME EETTSSSGDD DSAADDGDDD DCWHNPPPDY YVIDPFFSSV VSSDDLLTTT
             TGGEESEEME EETTSSSGDD DSAADDGDDD DCWHNPPPDY YVIDPFFSSV VSSDDLLTTT

             AAALPPEAAA AAFFCCCAPD AAVVL----- RRRREEEVVA AGGR------ -----LLLLL
             SSSPLLPPTT LLWWSSSSPL IPLLTEEEEE TTKKEDDTTN NDDEDDDDDD FFFEPLLLII
             SSSPLLPPTT LLWWSSSSPL IPLLTEEEEE TTKKEDDTTN NDDEDDDDDD FFFEPLLLII

             LMCCGGIGHA AASSAAAQDS HHHAAAALLA GIGVTTLLSR RRPPP-PAPP PTTTTDDAAA
             IYCC--ISPN AASSKKKTQI RRREEESVVP --EFTTLLSR RRPPPNPTSS SSSSSSSSDD
             IYCC--ISPN AASSKKKTQI RRREEESVVP --EFTTLLSR RRPPPNPTSS SSSSSSSSDD

             FFLL--HHHH HFFYCLLKAH HHFNQALLEA FHVVVDSLMQ QQGGLQQWPP PLLIIQRPGG
             LLIILSKKKK TLLNCSSKAH HHLNQALLEA THIIIDGIVQ QQGGIQQWPP PLLLLQRTGG
             LLIILSKKKK TLLNCSSKAH HHLNQALLEA THIIIDGIVQ QQGGIQQWPP PLLLLQRTGG

             GGPF-LRITP PPPSSPTGGG RDEE---LLD VVGLLDDAAR RSSGVVVAAN NSSSLLLRRP
             GGKTQIRVSA AAPSSLGEEE SPEEPSSLLA TTGNLDDAAK KVDPIIITT- -PPPIIINNG
             GGKTQIRVSA AAPSSLGEEE SPEEPSSLLA TTGNLDDAAK KVDPIIITT- -PPPIIINNG

             WWQAAGGGEE AAVAAFSLQH LDDAAP-AAL LVVASVRPIF TTVIIEQQEE HHHNKKGLLD
             SSRDDDDDEE VVLAAVFLQY LDDTTPITTL LAAKSLNPVV TTLGGEYYEE LLLNRRGAAN
             SSRDDDDDEE VVLAAVFLQY LDDTTPITTL LAAKSLNPVV TTLGGEYYEE LLLNRRGAAN

             FFEEEAAYYS AAVVFAASGG NNME--AALQ QREEIICCDI VVCGGGEAAR PSSSRDRLLR
             VVNNNAAFYS AAVVFPNRSE RRRERREEFG GRRRIISSGL IIGPPPETGH EEEEQVLMMN
             VVNNNAAFYS AAVVFPNRSE RRRERREEFG GRRRIISSGL IIGPPPETGH EEEEQVLMMN

             GLPLLGGGSN ALRQQQQQAA RRRMLLVLSG G-VEACLLTL LLGGHHGPSA ASAAWWEEAA
             GFKLLSSSNY AVSQQQQQAA KKKILLLNNY YYVSKFIISL LLAANNDPTL LSSSWWRR--
             GFKLLSSSNY AVSQQQQQAA KKKILLLNNY YYVSKFIISL LLAANNDPTL LSSSWWRR--

             AGGGGGNNNN NNNNSNGSDD DSSSNNNSSK KGGRGSSCC
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
    3   639
01g45860.1   ---------- ---------- ---------- ---FPFF--- ----PMDASS SSGGDAGGFL
t5g66770.1   MAAYTDGGNN AAQKQQKKQE EEQQQQQQQH QDDFGIIPLS LNPPP-NLGF FFGGSGSSAF
t5g66770.2   MAAYTDGGNN AAQKKKKKQE EEQQQQQQQH QDDFGIIPLS LNPPP-NLGF FFGGSGSSAF

             LPPAVVVV-- ---------- ---PPDDGY- ---------- ---------- ----------
             FDPQVVVVGG GGSNNPPFFF HHHTTTTGFD DFGGGTGGGG EFFFEESDWW MMEEETTTSS
             FPPQVVVVGG GGSNNPPFFF HHHTTTTGFD DFGGGTGGGG EFFFEESDWW MMEEETTTSS

             ---------- -----PPAA- ----GA---- ---------- ----VDAAAL LLLPFFAAAA
             GGGGDDSDCC CCDWWNNDDI IYYYGPPPFT YSSLSQSSDL LVVVIDTSSP PPPLPPPPTT
             GGGGDDSDCC CCDWWNNDDI IYYYGPPPFT YSSLSQSSDL LVVVIDTSSP PPPLPPPPTT

             APPPPCAAAA AAAAAA-AME EEEEVVVAAA GI-------L LVHHMMMMCA AIIIEAGGGA
             LPPPPSSSSS PPPPPPHSPE DEEETTTNNN DSDDFFDLPL LLKKYYYYCA RIIISDSSSN
             LPPPPSSSSS PPPPPPHSPE DEEETTTNNN DSDDFFDLPL LLKKYYYYCA RIIISDSSSN

             AASAAQLASH HAAAAAAVVV AAASGIIRVV VAHFTTLRRR LFPPPPP-SV VAAPTTTAAA
             NASKKTLLIR REESSEELLL DDPT---RVV VAYFTTLRRR LSPPPPPNSA ATTSSSSSDD
             NASKKTLLIR REESSEELLL DDPT---RVV VAYFTTLRRR LSPPPPPNSA ATTSSSSSDD

             -HHYYEEEEC CLLFFAAHHF TAAADDHVVV IFFSSLMQQG GLWAIQAAAL ALRRPPGGPP
             SKTNNDDDDC CSSFFAAHHL TAAANNKIII VFFGGIVQQG GIWALQAAAL ATRRTTSGPP
             SKTNNDDDDC CSSFFAAHHL TAAANNKIII VFFGGIVQQG GIWALQAAAL ATRRTTSGPP

             FFFF-RRIPP PSSSGEEE-- -LRRDVLAAA AAARSVRVVR FFSFRRGGVV ANNSDDEVVW
             TTTTQRRIAP PSSSEEEEPP SLIIATLRAA AAAKVLDLLN FFDFIIPPII L--PHHLLLS
             TTTTQRRIAP PSSSEEEEPP SLIIATLRAA AAAKVLDLLN FFDFIIPPII L--PHHLLLS

             LIEAVAANVV LQQLHHHLLP PDDQQQA--- VVLDDDCVSV VRRKIIFIEE EDDDHNKKTT
             FVEVLAANMM LQQLYYYLL- -DDEEETTII AALRRRLASL LNNRVVVGEE ESSSLNRRVV
             FVEVLAANMM LQQLYYYLL- -DDEEETTII AALRRRLASL LNNRVVVGEE ESSSLNRRVV

             LAALLFSSVF SSSLASAGGA GNNAEAYQII DIVCGGARRE SRRRRWDDDR RLTAGGGLSA
             AAALLQSSVF SSSLPLGDSE ERRVEELGII GLIGPKGHRE EQQQQWVVVL LMEAGGGFES
             AAALLQSSVF SSSLPLGDSE ERRVEELGII GLIGPKGHRE EQQQQWVVVL LMEAGGGFES

             APPLAARAAA RLVVGFFSSS SSGVVEEEAD GCTTLWGRPF FSAASSAWEA GGGNNNNNSS
             SKKLAASAAA KLLLWYYNNN NNNVVEESKP GFSSLWDLPL LTLLSSSWR- ----------
             SKKLAASAAA KLLLWYYNNN NNNVVEESKP GFSSLWDLPL LTLLSSSWR- ----------

             SSSNNVVVSG SSSSNSSGSS NKSSSAARRR RGSSVVVCL
             ---------- ---------- ---------- ---------
             ---------- ---------- ---------- ---------
From jason.stajich at duke.edu  Thu Aug  4 16:16:12 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Aug  4 16:07:55 2005
Subject: [Bioperl-l] dividing seqboot outfiles
In-Reply-To: <42F23450.70209@cirad.fr>
References: <42F23450.70209@cirad.fr>
Message-ID: <CEE73756-4755-4DC3-B76B-1635F655E3C4@duke.edu>

How about Bio::AlignIO?

my $aln = Bio::AlignIO->new(-format => 'phylip');

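
If Bio::AlignIO is not an option, the header-line idea from the question
can also be sketched in plain Perl. This is only a rough sketch, not the
Bio::AlignIO route: it assumes every replicate starts with a line holding
nothing but two integers (such as "    3   639"), and the names
split_phylip_blocks and chunk_blocks are made up for illustration.

```perl
use strict;
use warnings;

# Split seqboot output text into one string per replicate alignment.
# Assumption: a replicate starts at a PHYLIP header line, i.e. a line
# containing only two integers (sequence count and alignment length).
sub split_phylip_blocks {
    my ($text) = @_;
    my @blocks;
    for my $line (split /^/m, $text) {      # keeps the trailing newlines
        push @blocks, '' if $line =~ /^\s*\d+\s+\d+\s*$/;
        next unless @blocks;                # ignore text before the first header
        $blocks[-1] .= $line;
    }
    return @blocks;
}

# Group the replicate blocks into chunks of $per_file alignments each.
sub chunk_blocks {
    my ($per_file, @blocks) = @_;
    my @chunks;
    while (my @group = splice(@blocks, 0, $per_file)) {
        push @chunks, join('', @group);
    }
    return @chunks;
}
```

With 100 replicates and chunk_blocks(10, ...), this gives 10 strings that
can each be printed to a separate file.
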
On Aug 4, 2005, at 11:29 AM, matthieu wrote:

> Hello,
> I'm trying to divide seqboot outfiles containing 100
> multialignments into, for example, 10 files of 10 multialignments. I
> didn't find any parser for this.
> I'm thinking about identifying the first characters of the seqboot
> outfiles (e.g. " 3   639 " in my example) to recognize each
> multialignment "block", but I didn't manage to do this...
> I attach my first code and an example of a seqboot outfile.
> Thanks
>
>
> Matthieu
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From james.wasmuth at ed.ac.uk  Fri Aug  5 06:10:48 2005
From: james.wasmuth at ed.ac.uk (James Wasmuth)
Date: Fri Aug  5 06:19:32 2005
Subject: [Bioperl-l] bl2seq
In-Reply-To: <655276655c36.655c36655276@emich.edu>
References: <655276655c36.655c36655276@emich.edu>
Message-ID: <42F33B28.4020101@ed.ac.uk>

Hi Usha,

what happens if you type 'bl2seq' on the command line?


Usha Rani Reddi wrote:

>Hi,
>I tried to run local bl2seq by installing Bioperl on Linux machine. 
>When I tried to align 2 sequences using bl2seq I got an error message 
>that says "could not find path to bl2seq". After getting the error 
>message  I did set the environmental variables(path) and tried again I 
>got the same error message. Please help me with this.
>Thanks
>Usha
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>  
>

-- 
"You have made your way from worm to man,
    and much in you is still worm."
	Friedrich Nietzsche, Thus Spoke Zarathustra


Blaxter Nematode Genomics Group   |
Institute of Evolutionary Biology |
Ashworth Laboratories, KB         | tel: +44 131 650 7403
University of Edinburgh           | web: www.nematodes.org/~james
Edinburgh                         |
EH9 3JT                           |
UK                                |	
 

From hlapp at gnf.org  Fri Aug  5 11:18:27 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Aug  5 11:10:07 2005
Subject: [Bioperl-l] Re: all tests pass [was Re: Fixing bioperl] [was Re:
	Analysis features]
In-Reply-To: <2bf4b9070ab5bb61b34e15d3ae611044@duke.edu>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
	<1122650232.10455.31.camel@localhost.localdomain>
	<51a02b5bd508f35301ee3c847b104895@gnf.org>
	<1122925500.3857.40.camel@localhost.localdomain>
	<2aae0a4129cb2c7407df5834b94f41aa@gnf.org>
	<2bf4b9070ab5bb61b34e15d3ae611044@duke.edu>
Message-ID: <08043cb9048bd811f56f265e20fed521@gnf.org>


On Aug 1, 2005, at 7:31 PM, Jason Stajich wrote:

> I'm getting all tests passing for me on OSX and a few different linux 
> machines with different complements of aux modules installed.  I fixed 
> some minor things that were breaking.
>

I can actually confirm this. All tests passed last night. Cool, 
finally, thanks Jason for taking the time, this is helpful.
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gnf.org  Sat Aug  6 01:02:30 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sat Aug  6 00:51:01 2005
Subject: [Bioperl-l] search2gff.PLS
Message-ID: <4e456734cc6e4972ea642f815f641a2d@gnf.org>

I was looking for an existing tool to convert a SearchIO report 
(multi-report BLASTN in my case) to GFF3 and found 
scripts/utilities/search2gff.PLS.

Is this the tool I'm looking for, or did I miss a better suited one 
elsewhere in either bioperl or gbrowse/gmod?

If this is the tool, why is it suitable only for protein matches? (The 
doc says so. I used it for a BLASTN report and didn't find any 
nucleotide-related flaws.) Or is this old documentation that should be 
fixed?

Also, the container match feature comes out with start and end being 
equal. I'm not sure whether I was doing something wrong, but the 
following fixes this.

146,148c149,151
<           $max{$type} = $proxyfor->start unless defined $max{$type} && $max{$type} > $proxyfor->end;
<           $min{$other} = $otherf->start unless defined $min{$type} && $min{$type} < $otherf->start;
<           $max{$other} = $otherf->start unless defined $max{$type} && $max{$type} > $otherf->end;
---
 >           $max{$type} = $proxyfor->end unless defined $max{$type} && $max{$type} > $proxyfor->end;
 >           $min{$other} = $otherf->start unless defined $min{$other} && $min{$other} < $otherf->start;
 >           $max{$other} = $otherf->end unless defined $max{$other} && $max{$other} > $otherf->end;
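
The bookkeeping the patch aims for can be shown in isolation. This is a
rough sketch only: it uses plain [type, start, end] triples instead of the
feature objects in search2gff.PLS, and container_bounds is a made-up name.
The point is that each type keeps its own smallest start and largest end.

```perl
use strict;
use warnings;

# Track the smallest start and the largest end seen for each type.
# Each feature is a hypothetical [type, start, end] triple, standing in
# for the start/end of the feature objects in the real script.
sub container_bounds {
    my (@features) = @_;
    my (%min, %max);
    for my $f (@features) {
        my ($type, $start, $end) = @$f;
        # min is updated from start, max from end, both keyed by the same type
        $min{$type} = $start unless defined $min{$type} && $min{$type} < $start;
        $max{$type} = $end   unless defined $max{$type} && $max{$type} > $end;
    }
    return (\%min, \%max);
}
```

If max were taken from start, as in the old lines, a container built from a
single HSP would come out with start equal to end.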

Since I don't understand 100% what should happen for a correct GFF3 
match container, I just wanted to make sure that this is indeed a fix 
and not introducing a bug before I commit it. Was anybody using this 
tool with the -m option?

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From s0460205 at sms.ed.ac.uk  Mon Aug  8 04:17:51 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Mon Aug  8 04:10:01 2005
Subject: [Bioperl-l] Not picking up Dbxrefs EMBL records
Message-ID: <1123489071.42f7152f3690e@sms.ed.ac.uk>

Hi folks,

I have a BioSQL database (PostgreSQL 7.4.3, BioPerl 1.4, bioperl-db 1.2) set up
containing protein and gene data. However, when I load gene sequence records
(EMBL or Genbank) using:

perl load_seqdatabase.pl -driver Pg -safe -lookup -dbname milk -dbuser s0460205
-dbpass password -format embl /home/s0460205/file_name.txt

from bioperl-db it does not pick up any dbxrefs i.e. there is no dbxref_id for
MEDLINE etc.

Has anyone else come across this problem and is there a fix?

Cheers,

Stephen

From hotafin at gmail.com  Mon Aug  8 09:28:16 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Mon Aug  8 09:19:30 2005
Subject: [Bioperl-l] local electric charge/hydrophobicity/flexibility of
	proteins
Message-ID: <c343d708050808062820d50742@mail.gmail.com>

Sorry for the OT, but does anyone know a (command-line) program that
can calculate local electric charges (electric charge
distribution/density), hydrophobicity, and protein flexibility, based on
PDB structures?
Or does anyone know where I may find corresponding algorithms?

From lstein at cshl.edu  Mon Aug  8 15:02:09 2005
From: lstein at cshl.edu (lstein@cshl.edu)
Date: Mon Aug  8 14:53:07 2005
Subject: [Bioperl-l] Bioperl version string not picked up by MakeMaker
Message-ID: <200508081902.j78J29dj005819@presto.lsjs.org>

Hi,

Sadly, the Bio::Root::Version system does not play nicely with
MakeMaker. I have a WriteMakefile() routine in the gbrowse Makefile.PL
which looks like this:

WriteMakefile(
	      'NAME'	     => 'Generic-Genome-Browser',
	      'VERSION'      => $VERSION,
	      'PREREQ_PM'    => {
				 Bio::Perl         => 1.5,
				 GD                => 2.07,
				 IO::String        => 0,
				 Text::Shellwords  => 1.0,
				}, # e.g., Module::Name => 1.1
		...);


But when I run perl Makefile.PL I get:

   Warning: prerequisite Bio::Perl 1.5 not found. We have unknown
            version.

I have added a "use Bio::Root::Version '$VERSION'" to Bio/Perl.pm and
this seems to fix the problem, but someone who understands MakeMaker
better had better confirm that this is the right solution.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu
From amackey at pcbi.upenn.edu  Mon Aug  8 15:12:41 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon Aug  8 15:04:00 2005
Subject: [Bioperl-l] Bioperl version string not picked up by MakeMaker
In-Reply-To: <200508081902.j78J29dj005819@presto.lsjs.org>
References: <200508081902.j78J29dj005819@presto.lsjs.org>
Message-ID: <E3C2851B-BCF9-419E-B74F-EBD63B57221B@pcbi.upenn.edu>

One backward-compatible alternative is to direct your PREREQ_PM to  
Bio::Root::Version instead of Bio::Perl, but I like your forward- 
compatible solution as well.

-Aaron

On Aug 8, 2005, at 3:02 PM, <lstein@cshl.edu> <lstein@cshl.edu> wrote:

> Hi,
>
> Sadly, the Bio::Root::Version system does not play nicely with
> MakeMaker. I have a WriteMakefile() routine in the gbrowse Makefile.PL
> which looks like this:
>
> WriteMakefile(
>           'NAME'         => 'Generic-Genome-Browser',
>           'VERSION'      => $VERSION,
>           'PREREQ_PM'    => {
>                  Bio::Perl         => 1.5,
>                  GD                => 2.07,
>                  IO::String        => 0,
>                  Text::Shellwords  => 1.0,
>                 }, # e.g., Module::Name => 1.1
>         ...);
>
>
> But when I run perl Makefile.PL I get:
>
>    Warning: prerequisite Bio::Perl 1.5 not found. We have unknown
>             version.
>
> I have added a "use Bio::Root::Version '$VERSION'" to Bio/Perl.pm and
> this seems to fix the problem, but someone who understands MakeMaker
> better had better confirm that this is the right solution.
>
> Lincoln
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse@cshl.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email:  amackey@pcbi.upenn.edu
office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
fax:    215-746-6697
postal: Penn Genomics Institute
         Goddard Labs 212
         415 S. University Avenue
         Philadelphia, PA  19104-6017


From jason.stajich at duke.edu  Mon Aug  8 15:24:54 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Aug  8 15:17:57 2005
Subject: [Bioperl-l] Bioperl version string not picked up by MakeMaker
In-Reply-To: <200508081902.j78J29dj005819@presto.lsjs.org>
References: <200508081902.j78J29dj005819@presto.lsjs.org>
Message-ID: <3CA4E65C-60B5-4342-BE56-E5C445F727A6@duke.edu>

VERSION_FROM tries to parse the file for the version information.  I
am pretty sure the Bio::Root::Version stuff is for run-time
initialization of the $VERSION variable for each package.  I am guessing
that parsing the file is also what PREREQ_PM does to determine a version
for the dependency.

If you make it depend on Bio::Root::Version it will be able to
parse it, but I assume Bio::Perl is a clearer dependency.  Otherwise
what you've done for Bio::Perl, including the $VERSION variable,
makes the most sense.
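
The static-parsing point can be illustrated with a small sketch. It relies
on MM->parse_version from ExtUtils::MakeMaker, which as far as I know is
the routine behind VERSION_FROM. Whether the PREREQ_PM check goes through
the same routine depends on the MakeMaker version, so treat that part as an
assumption. The Demo::* packages and static_version are made up for
illustration.

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;          # provides the MM class used below
use File::Temp qw(tempfile);

# Write module source to a temporary file and ask MakeMaker to parse the
# version statically, i.e. without loading the module.
sub static_version {
    my ($source) = @_;
    my ($fh, $path) = tempfile(SUFFIX => '.pm', UNLINK => 1);
    print {$fh} $source;
    close $fh or die "cannot close $path: $!";
    return MM->parse_version($path);
}

# A module with a literal version assignment in the file.
my $literal = static_version(
    "package Demo::Literal;\nour \$VERSION = '1.5';\n1;\n");

# A module whose version would only be set at run time by a
# (hypothetical) helper module; there is nothing to parse in the file.
my $runtime = static_version(
    "package Demo::Runtime;\nuse Demo::Helper;\n1;\n");

print "literal: $literal\n";
print "runtime: $runtime\n";
```

In the MakeMaker versions I have looked at, the second call returns the
string 'undef', i.e. no version can be found without running the module.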

-jason
On Aug 8, 2005, at 3:02 PM, <lstein@cshl.edu> <lstein@cshl.edu> wrote:

> Hi,
>
> Sadly, the Bio::Root::Version system does not play nicely with
> MakeMaker. I have a WriteMakefile() routine in the gbrowse Makefile.PL
> which looks like this:
>
> WriteMakefile(
>           'NAME'         => 'Generic-Genome-Browser',
>           'VERSION'      => $VERSION,
>           'PREREQ_PM'    => {
>                  Bio::Perl         => 1.5,
>                  GD                => 2.07,
>                  IO::String        => 0,
>                  Text::Shellwords  => 1.0,
>                 }, # e.g., Module::Name => 1.1
>         ...);
>
>
> But when I run perl Makefile.PL I get:
>
>    Warning: prerequisite Bio::Perl 1.5 not found. We have unknown
>             version.
>
> I have added a "use Bio::Root::Version '$VERSION'" to Bio/Perl.pm and
> this seems to fix the problem, but someone who understands MakeMaker
> better had better confirm that this is the right solution.
>
> Lincoln
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse@cshl.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From hlapp at gnf.org  Mon Aug  8 16:04:08 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Aug  8 15:52:41 2005
Subject: [Bioperl-l] Not picking up Dbxrefs EMBL records
In-Reply-To: <1123489071.42f7152f3690e@sms.ed.ac.uk>
References: <1123489071.42f7152f3690e@sms.ed.ac.uk>
Message-ID: <23040211b0fc735fcf7c97fc97770473@gnf.org>

Are you referring to references and their PMID? These you would find in 
the Reference table, which has a foreign key to dbxref, which would 
only store the PUBMED or MEDLINE ID (not both at this time). Can you 
give an example accession that's giving you grief?

	-hilmar

On Aug 8, 2005, at 1:17 AM, SG Edwards wrote:

> Hi folks,
>
> I have a BioSQL database (PostgreSQL 7.4.3, BioPerl 1.4, bioperl-db 
> 1.2) set up
> containing protein and gene data. However, when I load gene sequence 
> records
> (EMBL or Genbank) using:
>
> perl load_seqdatabase.pl -driver Pg -safe -lookup -dbname milk -dbuser 
> s0460205
> -dbpass password -format embl /home/s0460205/file_name.txt
>
> from bioperl-db it does not pick up any dbxrefs i.e. there is no 
> dbxref_id for
> MEDLINE etc.
>
> Has anyone else come across this problem and is there a fix?
>
> Cheers,
>
> Stephen
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From markus.riester at student.uni-tuebingen.de  Mon Aug  8 13:12:36 2005
From: markus.riester at student.uni-tuebingen.de (markus.riester@student.uni-tuebingen.de)
Date: Mon Aug  8 17:21:27 2005
Subject: [Bioperl-l] new modules for sarching for patterns in fasta-files
Message-ID: <twig.1123499556.93098@mail.uni-tuebingen.de>


Hi,

I've made some modules for searching for patterns in fasta files with
different (really fast) backends like agrep and vmatch.  I don't think you
want to include this in standard bioperl. But we think it is useful code and
we'd like to share it on cpan. The main reason for this email is a discussion
about the right namespace for this module. What do you think? 

Markus

(hope the attachment reaches the mailinglist, if not, please send me a mail if
you are interested in this code)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/x-gzip
Size: 26854 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050808/69bfd899/attachment-0001.bin
From iluminati at earthlink.net  Mon Aug  8 22:01:27 2005
From: iluminati at earthlink.net (iluminati@earthlink.net)
Date: Mon Aug  8 21:51:55 2005
Subject: [Bioperl-l] Question about handling ontology files
Message-ID: <42F80E77.2000205@earthlink.net>

Here's my situation.  I have a bunch of ontologies downloaded from a 
batch run on SOURCE.  What I want to be able to do is parse these files 
so I can count the different numbers of instances of terms within all 3 
sets of ontological descriptions (biological process, cellular component 
and molecular function).  Is there something in Perl or another program 
that could help me out with this situation?  Any information that you 
have would be useful to me.  Thanks

Todd Graham

From sdavis2 at mail.nih.gov  Tue Aug  9 07:05:33 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Aug  9 06:56:02 2005
Subject: [Bioperl-l] Question about handling ontology files
In-Reply-To: <42F80E77.2000205@earthlink.net>
Message-ID: <BF1E063D.B506%sdavis2@mail.nih.gov>

On 8/8/05 10:01 PM, "iluminati@earthlink.net" <iluminati@earthlink.net>
wrote:

> Here's my situation.  I have a bunch of ontologies downloaded from a
> batch run on SOURCE.  What I want to be able to do is parse these files
> so I can count the different numbers of instances of terms within all 3
> sets of ontological descriptions (biological process, cellular component
> and molecular function).  Is there something in Perl or another program
> that could help me out with this situation?  Any information that you
> have would be useful to me.  Thanks

I think you will have to parse the SOURCE files yourself.  After that is
done, there are several options including go-perl (from
http://www.geneontology.org), Bioperl (Bio::OntologyIO and relatives), and
GO-TermFinder (on CPAN).  I'm not sure which is going to be the best option
for you.  

If you are comfortable with RDBMs, you could download the tables from the GO
mysql database and do lookups yourself using DBI.

Hope this helps,
Sean

From sdavis2 at mail.nih.gov  Tue Aug  9 07:38:53 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Aug  9 07:30:26 2005
Subject: [Bioperl-l] Question about handling ontology files
In-Reply-To: <BF1E063D.B506%sdavis2@mail.nih.gov>
Message-ID: <BF1E0E0D.B7CB%sdavis2@mail.nih.gov>

On 8/9/05 7:05 AM, "Davis, Sean (NIH/NHGRI)" <sdavis2@mail.nih.gov> wrote:

> On 8/8/05 10:01 PM, "iluminati@earthlink.net" <iluminati@earthlink.net>
> wrote:
> 
>> Here's my situation.  I have a bunch of ontologies downloaded from a
>> batch run on SOURCE.  What I want to be able to do is parse these files
>> so I can count the different numbers of instances of terms within all 3
>> sets of ontological descriptions (biological process, cellular component
>> and molecular function).  Is there something in Perl or another program
>> that could help me out with this situation?  Any information that you
>> have would be useful to me.  Thanks
> 
> I think you will have to parse the SOURCE files yourself.  After that is
> done, there are several options including go-perl (from
> http://www.geneontology.org), Bioperl (Bio::OntologyIO and relatives), and
> GO-TermFinder (on CPAN).  I'm not sure which is going to be the best option
> for you.  
> 
> If you are comfortable with RDBMs, you could download the tables from the GO
> mysql database and do lookups yourself using DBI.

I forgot to mention the EASIEST way to do this.  If you have SOURCE output,
you can include LocusLink IDs, which you can then feed into various online or
standalone ontology analysis packages.  See this link for a recent review:

http://bioinformatics.oxfordjournals.org/cgi/reprint/bti565v1


From amackey at pcbi.upenn.edu  Tue Aug  9 09:13:14 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Tue Aug  9 09:03:38 2005
Subject: [Bioperl-l] new modules for sarching for patterns in fasta-files
In-Reply-To: <twig.1123499556.93098@mail.uni-tuebingen.de>
References: <twig.1123499556.93098@mail.uni-tuebingen.de>
Message-ID: <4C10E798-EC55-4242-B573-7352B1F4FB55@pcbi.upenn.edu>

Out of curiosity, are your patterns allowed to cross newlines  
embedded in the FASTA file?  This is the typical problem with using  
grep/agrep directly with sequence files ...

-Aaron

On Aug 8, 2005, at 1:12 PM, <markus.riester@student.uni-tuebingen.de>  
<markus.riester@student.uni-tuebingen.de> wrote:

>
> Hi,
>
> I've made some modules for searching for patterns in fasta files with
> different (really fast) backends like agrep and vmatch.  I don't  
> think you
> want to include this in standard bioperl. But we think it is useful  
> code and
> we'd like to share it on cpan. The main reason for this email is a  
> discussion
> about the right namespace for this module. What do you think?
>
> Markus
>
> (hope the attachment reaches the mailinglist, if not, please send  
> me a mail if
> you are interested in this code)
>
>
> <Weigel-Search-0.03.tar.gz>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email:  amackey@pcbi.upenn.edu
office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
fax:    215-746-6697
postal: Penn Genomics Institute
         Goddard Labs 212
         415 S. University Avenue
         Philadelphia, PA  19104-6017


From markus.riester at student.uni-tuebingen.de  Tue Aug  9 15:32:09 2005
From: markus.riester at student.uni-tuebingen.de (markus.riester@student.uni-tuebingen.de)
Date: Tue Aug  9 09:25:48 2005
Subject: [Bioperl-l] new modules for sarching for patterns in fasta-files
In-Reply-To: <4C10E798-EC55-4242-B573-7352B1F4FB55@pcbi.upenn.edu>
References: <twig.1123499556.93098@mail.uni-tuebingen.de>,
	<twig.1123499556.93098@mail.uni-tuebingen.de>
Message-ID: <twig.1123594329.71507@mail.uni-tuebingen.de>

with a cheap trick, yes, split the fasta files in two files. ids in one file,
sequences -one per line- in the second. 
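
the trick can be sketched in a few lines of plain perl. this is only an
illustration, not the code from the attached module: flatten_fasta is a
made-up name, and it returns two parallel lists instead of writing the two
files.

```perl
use strict;
use warnings;

# Flatten FASTA text into two parallel lists: the IDs and the sequences,
# each sequence joined onto a single line so that a pattern can no longer
# be broken up by an embedded newline.
sub flatten_fasta {
    my ($fasta) = @_;
    my (@ids, @seqs);
    for my $line (split /\r?\n/, $fasta) {
        if ($line =~ /^>\s*(\S*)/) {        # header line: keep the first word as ID
            push @ids,  $1;
            push @seqs, '';
        }
        elsif (@seqs) {                     # sequence line: strip spaces and append
            $line =~ s/\s+//g;
            $seqs[-1] .= $line;
        }
    }
    return (\@ids, \@seqs);
}
```

the sequence at index i belongs to the id at index i, so a hit on line i of
the sequence file maps straight back to its id.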

this should be ok for cdna/protein fasta files (but I am currently writing
tests - maybe some serious problems with the chars per line limitations show
up - but it did look good in some first tests.)

we don't use agrep anymore, because vmatch is really, really good. only with
many mismatches and short query sequences, agrep seems to be a bit faster. 
 
markus

"Aaron J. Mackey" <amackey@pcbi.upenn.edu> schrieb:

> Out of curiosity, are your patterns allowed to cross newlines  
> embedded in the FASTA file?  This is the typical problem with using  
> grep/agrep directly with sequence files ...
> 
> -Aaron
> 
> On Aug 8, 2005, at 1:12 PM, <markus.riester@student.uni-tuebingen.de>  
> <markus.riester@student.uni-tuebingen.de> wrote:
> 
> >
> > Hi,
> >
> > I've made some modules for searching for patterns in fasta files with
> > different (really fast) backends like agrep and vmatch.  I don't  
> > think you
> > want to include this in standard bioperl. But we think it is useful  
> > code and
> > we'd like to share it on cpan. The main reason for this email is a  
> > discussion
> > about the right namespace for this module. What do you think?
> >
> > Markus
> >
> > (hope the attachment reaches the mailinglist, if not, please send  
> > me a mail if
> > you are interested in this code)
> >
> >
> > <Weigel-Search-0.03.tar.gz>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Aaron J. Mackey, Ph.D.
> Project Manager, ApiDB Bioinformatics Resource Center
> Penn Genomics Institute, University of Pennsylvania
> email:  amackey@pcbi.upenn.edu
> office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
> fax:    215-746-6697
> postal: Penn Genomics Institute
>          Goddard Labs 212
>          415 S. University Avenue
>          Philadelphia, PA  19104-6017
> 
> 


-- 


From reche at research.dfci.harvard.edu  Tue Aug  9 11:48:47 2005
From: reche at research.dfci.harvard.edu (Pedro Antonio Reche)
Date: Tue Aug  9 11:38:52 2005
Subject: [Bioperl-l] embl sequences 2 fasta
In-Reply-To: <20050630215553.GB13422@bioinfo.ucr.edu>
References: <434AF352F9D03C4C896782B8CC78BC7687F264@VADER.oriongenomics.com>
	<20050630215553.GB13422@bioinfo.ucr.edu>
Message-ID: <b9ddd7b481c799f67fe1544723603110@research.dfci.harvard.edu>

Hi
I am interested in finding all sequences from EMBL matching a given 
feature in subcellular location and then creating a single file for 
each of them in fasta format. Any help will be appreciated.
Regards,

pedro

On Jun 30, 2005, at 5:55 PM, Josh Lauricha wrote:

> On Thu 06/30/05 16:48, Joseph Bedell wrote:
>> You can calculate the score given the bit score (from the tabular
>> output) and Lambda (calculated from the matrix). The equation is 
>> Score =
>> (Bits)/(Lambda in bits).
>>
>> Lambda is only dependent upon the matrix. Did you use NCBI-blast or
>> WU-BLAST? Which flavor of blast (blastn, blastp, etc)? In any case, 
>> you
>> can just run a single blast and look at the stats at the bottom of the
>> report to get the value of lambda. For example, a default NCBI-blastn
>> (+1/-3) search has a lambda of 1.37
>>
>> ============================
>> Lambda     K      H
>>     1.37    0.711     1.31
>>
>> Gapped
>> Lambda     K      H
>>     1.37    0.711     1.31
>> ===============================
>>
>> But, what is difficult to discover is this lambda is in NATS. To 
>> convert
>> it to bits, divide it by the natural log of 2, or in perl:
>>
>> perl -e 'print 1.37/log(2),"\n"'
>> 1.97649220601788
>>
>> So, now you can take all of your bit scores divided by 
>> 1.97649220601788
>> to get the Score.
>>
>> HTH,
>> Joey
>
> Cool, thanks. That'll save me a bunch of time ;) This was NCBI blastp,
> so I've already got it calculated ;)
>
> Thanks.
>
> -- 
>
> ------------------------------------------------------
> | Josh Lauricha            | Ford, you're turning    |
> | laurichj@bioinfo.ucr.edu | into a penguin. Stop    |
> | Bioinformatics, UCR      | it                      |
> |----------------------------------------------------|
> | OpenPG:                                            |
> |  4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 |
> |----------------------------------------------------|
> | Geek Code: Version 3.12                            |
> | GAT/CS$/IT$ d+ s-: a-->--- C++++$ UL++++$ P++ L++++|
> | $E--- W+ N o? K? w--(---) O? M+(++) V? PS++ PE-(--)|
> | Y+ PGP+++ t--- 5+++ X+ R tv DI++ D--- G++          |
> | e++ h- r++ z?                                      |
> |----------------------------------------------------|
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From iain.m.wallace at gmail.com  Tue Aug  9 07:15:46 2005
From: iain.m.wallace at gmail.com (Iain Wallace)
Date: Tue Aug  9 11:42:08 2005
Subject: [Bioperl-l] [Bioperl -l] Problem reading EMBL format file
Message-ID: <8cff3eb805080904155c0682b9@mail.gmail.com>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: COAT_SBMV.M23021.embl
Type: application/octet-stream
Size: 10333 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050809/4b578b28/COAT_SBMV.M23021.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: COAT_SBMV.AAA46567.cds
Type: application/octet-stream
Size: 1901 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050809/4b578b28/COAT_SBMV.AAA46567.obj
From s0460205 at sms.ed.ac.uk  Tue Aug  9 12:21:07 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Tue Aug  9 12:51:10 2005
Subject: [Bioperl-l] Not picking up Dbxrefs EMBL records
In-Reply-To: <23040211b0fc735fcf7c97fc97770473@gnf.org>
References: <1123489071.42f7152f3690e@sms.ed.ac.uk>
	<23040211b0fc735fcf7c97fc97770473@gnf.org>
Message-ID: <1123604467.42f8d7f396648@sms.ed.ac.uk>

Hi,

My installation does not pick up ANY dbxrefs for gene records, e.g. Pubmed,
MEDLINE (either EMBL or Genbank formats). When I load them into the database
they go in fine but no dbxref_ids are mapped to the bioentry_id in the
bioentry_dbxref table. Therefore, nothing appears in the dbxref table either!

The system works fine for UniProt protein entries into the database. I am
currently installing BioPerl v 1.5 to see if this resolves the problem.

An example: NM_214434 from Genbank which has the dbxrefs:

Pubmed 1503277
Taxon  9823
GeneID 404088

Quoting Hilmar Lapp <hlapp@gnf.org>:

> Are you referring to references and their PMID? These you would find in
> the Reference table, which has a foreign key to dbxref, which would
> only store the PUBMED or MEDLINE ID (not both at this time). Can you
> give an example accession that's giving you grief?
>
> 	-hilmar
>
> On Aug 8, 2005, at 1:17 AM, SG Edwards wrote:
>
> > Hi folks,
> >
> > I have a BioSQL database (PostgreSQL 7.4.3, BioPerl 1.4, bioperl-db
> > 1.2) set up
> > containing protein and gene data. However, when I load gene sequence
> > records
> > (EMBL or Genbank) using:
> >
> > perl load_seqdatabase.pl -driver Pg -safe -lookup -dbname milk -dbuser
> > s0460205
> > -dbpass password -format embl /home/s0460205/file_name.txt
> >
> > from bioperl-db it does not pick up any dbxrefs i.e. there is no
> > dbxref_id for
> > MEDLINE etc.
> >
> > Has anyone else come across this problem and is there a fix?
> >
> > Cheers,
> >
> > Stephen
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>


From jason.stajich at duke.edu  Tue Aug  9 13:09:31 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Aug  9 12:59:20 2005
Subject: [Bioperl-l] Bio::DB::Taxonomy::entrez updated
Message-ID: <01B3F788-B350-4087-9B71-3E62BE16911F@duke.edu>

I've updated Bio::DB::Taxonomy::entrez to now fully parse out the XML
from the Efetch Eutils CGI script.  It can now return a fully populated
Bio::Taxonomy::Node object, most importantly with a parent_id field
filled in.  This allows the web-only implementation to work just as
the flatfile implementation does and you can walk up the taxonomy  
hierarchy.  There is currently no way to walk down the hierarchy  
unless one can construct an Entrez query to get all the nodes which  
have a particular parent.  If someone knows how to do this, please  
let me know.

I added a few fields to Bio::Taxonomy::Node to capture genetic_code,  
pub_date, update_date, create_date, mitochondrial_genetic_code from  
the database entry.

At this point I think we can think about retiring Bio::Species and
replacing it with Bio::Taxonomy::Node.  I would probably just make
Bio::Species delegate to Bio::Taxonomy::Node, or maybe someone can think
of something more clever.  There will be a bit of fiddling under the
hood to make this really work, but I think it can be done for the 1.6  
release and still be transparent to the user (i.e. API is completely  
retained for Bio::Seq->species, Bio::Species, etc however new  
functionality is now also available).

Here is how you can use the DB interface:

   use Bio::DB::Taxonomy;

   my $db = new Bio::DB::Taxonomy(-source => 'entrez');

   my $taxonid = $db->get_taxonid('Homo sapiens');
   my $node   = $db->get_Taxonomy_Node(-taxonid => $taxonid);
   print $node->binomial, "\n";

I added a script in scripts/taxa/query_entrez_taxa.PLS which  
demonstrates how to use it as well.

Where I find this module useful is in parsing a Search Result report
and classifying hits by taxonomy.  Given the GI numbers in the search
result (BLAST, FASTA, SSEARCH hits), getting the taxon ID for a GI is
just one step away now.
I added a capability to the API in Bio::DB::Taxonomy::entrez for
retrieving taxonomy info based on a GI number.  You can pass the
-gi => $ginumber option to get_Taxonomy_Node.

Demonstration of use here:

   my $gi = 71836523;
   my $node = $db->get_Taxonomy_Node(-gi => $gi, -db => 'protein');
   print $node->binomial, "\n";
   my ($species,$genus,$family) =  $node->classification;
   print "family is $family\n";

   # Can also go up 4 levels
   my $p = $node;
   for ( 1..4 ) {
     $p = $db->get_Taxonomy_Node(-taxonid => $p->parent_id);
   }
   print $p->rank, " ", ($p->classification)[0], "\n";

   # could then classify a set of BLAST hits based on their GI numbers
   # into taxonomic categories.
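
A minimal sketch of that last step, assuming the per-GI lookup is handed in
as a code reference.  The names classify_by_taxon and $lookup and the stand-in
table are hypothetical and not part of Bio::DB::Taxonomy; the stub only
replaces the network call so the grouping logic can be shown on its own.

```perl
use strict;
use warnings;

# Group GI numbers by a taxonomic label.
# $lookup is a caller-supplied code reference (hypothetical) that maps a
# GI number to a label such as a family name, or undef if it is unknown.
sub classify_by_taxon {
    my ($gis, $lookup) = @_;
    my %by_taxon;
    for my $gi (@$gis) {
        my $label = $lookup->($gi);
        $label = 'unclassified' unless defined $label;
        push @{ $by_taxon{$label} }, $gi;
    }
    return \%by_taxon;
}

# Stand-in lookup table (made-up GI numbers) so the sketch runs offline.
my %fake = (101 => 'Hominidae', 102 => 'Hominidae', 103 => 'Suidae');
my $groups = classify_by_taxon([101, 102, 103, 104], sub { $fake{ $_[0] } });

for my $label (sort keys %$groups) {
    printf "%-14s %d hit(s)\n", $label, scalar @{ $groups->{$label} };
}
```

In real use the code reference would presumably wrap the
get_Taxonomy_Node(-gi => $gi, -db => 'protein') call shown above and return
one element of $node->classification.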


I have tried to put these examples in the SYNOPSIS, t/Taxonomy.t and
the script in scripts/taxa/query_entrez_taxa.PLS.  If there are
mistakes or typos, or something is unclear, please let us know and it
can be updated.  I hope a section describing how to use these in a
SearchIO context (parsing reports) can be added when I have time.

Best,
-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From hlapp at gnf.org  Tue Aug  9 12:40:12 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Aug  9 13:27:10 2005
Subject: [Bioperl-l] Not picking up Dbxrefs EMBL records
In-Reply-To: <1123604467.42f8d7f396648@sms.ed.ac.uk>
References: <1123489071.42f7152f3690e@sms.ed.ac.uk>
	<23040211b0fc735fcf7c97fc97770473@gnf.org>
	<1123604467.42f8d7f396648@sms.ed.ac.uk>
Message-ID: <4cde9b9219492587cdb09999fb7980cc@gnf.org>

This is a RefSeq accession. In GenBank format the db_xrefs you see are 
notes for features in the feature table, not top-level db_xrefs (i.e., 
for the entry itself), although semantically of course that's what they 
are. Bioperl (i.e., the Bioperl SeqIO parser for genbank format) 
doesn't interpret that however, and leaves them where they are, namely 
as annotation for the features. The single exception to that is that 
the parser actually does look for the taxon ID in the feature table and 
sets the $seq->species->ncbi_taxon_id property accordingly.

GenBank format doesn't have top-level db_xrefs at all. You will need 
EMBL format for that. As I said before, the PUBMED line is not a 
db_xref for the entry either but the db_xref for the reference entry, 
so you will need to retrieve the references 
($seq->annotation->get_Annotations('reference')) and use its 
$ref->pubmed or $ref->medline properties.

BTW this will still hold true if you first load the sequences into 
bioperl-db and then retrieve them; there isn't really any magic being 
applied that would transform db_xrefs into a common unified picture.

I use a SequenceProcessor (see Bio::Seq::BaseSeqProcessor and the 
--pipeline option to load_seqdatabase.pl) to promote db_xref tags found 
in the feature table of genbank records to Bio::Annotation::DBLink 
annotation on the sequence object. Very easy to implement and you are 
in total control of the annotation structure.
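
A minimal sketch of such a processor, not the one actually in use here.  The
package name My::DbxrefPromoter is hypothetical, as is the choice to split
each db_xref value on the first colon and to skip repeats.

```perl
package My::DbxrefPromoter;    # hypothetical name
use strict;
use warnings;
use base qw(Bio::Seq::BaseSeqProcessor);
use Bio::Annotation::DBLink;

# Copy every feature-level db_xref tag (e.g. "GeneID:404088") onto the
# sequence itself as a Bio::Annotation::DBLink under the 'dblink' key.
sub process_seq {
    my ($self, $seq) = @_;
    my %seen;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->has_tag('db_xref');
        for my $xref ($feat->get_tag_values('db_xref')) {
            my ($db, $id) = split /:/, $xref, 2;
            next unless defined $db && defined $id;
            next if $seen{"$db:$id"}++;    # the same xref often repeats on gene and CDS
            $seq->annotation->add_Annotation(
                'dblink',
                Bio::Annotation::DBLink->new(-database   => $db,
                                             -primary_id => $id),
            );
        }
    }
    return ($seq);
}

1;
```

The class name would then be given to the --pipeline option of
load_seqdatabase.pl, assuming the module can be found in @INC.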

	-hilmar

On Aug 9, 2005, at 9:21 AM, SG Edwards wrote:

> Hi,
>
> My installation does not pick up ANY dbxrefs for gene records e.g. 
> Pubmed,
> MEDLINE(either EMBL or Genbank formats). When I load them into the 
> database
> they go in fine but no dbxref_ids are mapped to the bioentry_id in the
> bioentry_dbxref table. Therefore, nothing appears in the dbxref table 
> either!
>
> The system works fine for UniProt protein entries into the database. I 
> am
> currently installing BioPerl v 1.5 to see if this resolves the problem.
>
> An example: NM_214434 from Genbank which has the dbxrefs:
>
> Pubmed 1503277
> Taxon  9823
> GeneID 404088
>
> Quoting Hilmar Lapp <hlapp@gnf.org>:
>
>> Are you referring to references and their PMID? These you would find 
>> in
>> the Reference table, which has a foreign key to dbxref, which would
>> only store the PUBMED or MEDLINE ID (not both at this time). Can you
>> given an example accession that's giving you grief?
>>
>> 	-hilmar
>>
>> On Aug 8, 2005, at 1:17 AM, SG Edwards wrote:
>>
>>> Hi folks,
>>>
>>> I have a BioSQL database (PostgreSQL 7.4.3, BioPerl 1.4, bioperl-db
>>> 1.2) set up
>>> containing protein and gene data. However, when I load gene sequence
>>> records
>>> (EMBL or Genbank) using:
>>>
>>> perl load_seqdatabase.pl -driver Pg -safe -lookup -dbname milk 
>>> -dbuser
>>> s0460205
>>> -dbpass password -format embl /home/s0460205/file_name.txt
>>>
>>> from bioperl-db it does not pick up any dbxrefs i.e. there is no
>>> dbxref_id for
>>> MEDLINE etc.
>>>
>>> Has anyone else come across this rpoblem and is ther a fix?
>>>
>>> Cheers,
>>>
>>> Stephen
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From markus.riester at student.uni-tuebingen.de  Tue Aug  9 18:40:11 2005
From: markus.riester at student.uni-tuebingen.de (markus.riester@student.uni-tuebingen.de)
Date: Tue Aug  9 14:26:55 2005
Subject: [Bioperl-l] new modules for sarching for patterns in fasta-files
In-Reply-To: <twig.1123594329.71507@mail.uni-tuebingen.de>
References: <twig.1123499556.93098@mail.uni-tuebingen.de>,
	<twig.1123499556.93098@mail.uni-tuebingen.de>,
	<4C10E798-EC55-4242-B573-7352B1F4FB55@pcbi.upenn.edu>
Message-ID: <twig.1123605611.94453@mail.uni-tuebingen.de>

update:

the tests are written. looks good. agrep finds matches at the end of the
longest arabidopsis cdna sequence (16kb). 

(but the tests showed some serious bugs in version 0.03, the one in the first
attachment. they are all fixed in this attachment)

markus


markus.riester@student.uni-tuebingen.de wrote:

> with a cheap trick, yes, split the fasta files in two files. ids in one file,
> sequences -one per line- in the second. 
> 
> this should be ok for cdna/protein fastafiles (but I am currently writing
> tests-maybe some serious problems with the chars per line limitations show
> up-but I did look good in some first tests.)
> 
> we don't use agrep anymore, because vmatch is really, really good. only with
> many mismatches and short query sequences, agrep seems to be a bit faster. 
>  
> markus
> 
> "Aaron J. Mackey" <amackey@pcbi.upenn.edu> wrote:
> 
> > Out of curiosity, are your patterns allowed to cross newlines  
> > embedded in the FASTA file?  This is the typical problem with using  
> > grep/agrep directly with sequence files ...
> > 
> > -Aaron
> > 
> > On Aug 8, 2005, at 1:12 PM, <markus.riester@student.uni-tuebingen.de>  
> > <markus.riester@student.uni-tuebingen.de> wrote:
> > 
> > >
> > > Hi,
> > >
> > > I've made some modules for searching for patterns in fasta files with
> > > different (really fast) backends like agrep and vmatch.  I don't  
> > > think you
> > > want to include this in standard bioperl. But we think it is useful  
> > > code and
> > > we'd like to share it on cpan. The main reason for this email is a  
> > > discussion
> > > about the right namespace for this module. What do you think?
> > >
> > > Markus
> > >
> > > (hope the attachment reaches the mailinglist, if not, please send  
> > > me a mail if
> > > you are interested in this code)
> > >
> > >
> > > <Weigel-Search-0.03.tar.gz>
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > 
> > --
> > Aaron J. Mackey, Ph.D.
> > Project Manager, ApiDB Bioinformatics Resource Center
> > Penn Genomics Institute, University of Pennsylvania
> > email:  amackey@pcbi.upenn.edu
> > office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
> > fax:    215-746-6697
> > postal: Penn Genomics Institute
> >          Goddard Labs 212
> >          415 S. University Avenue
> >          Philadelphia, PA  19104-6017
> > 
> > 
> 
> 
> 
> -- 
> 
> 
> 


-- 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/x-gzip
Size: 36342 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050809/1a48a1f2/attachment-0001.bin
From akarger at CGR.Harvard.edu  Tue Aug  9 15:20:34 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue Aug  9 15:08:09 2005
Subject: [Bioperl-l] new modules for sarching for patterns in fasta-fi les
Message-ID: <339D68B133EAD311971E009027DC47970321B47E@montecarlo.cgr.harvard.edu>

> From: markus.riester@student.uni-tuebingen.de 
> "Aaron J. Mackey" <amackey@pcbi.upenn.edu> wrote:
> 
> > Out of curiosity, are your patterns allowed to cross newlines  
> > embedded in the FASTA file?  This is the typical problem 
> > with using  
> > grep/agrep directly with sequence files ...> 
>
> with a cheap trick, yes, split the fasta files in two files. 
> ids in one file,
> sequences -one per line- in the second. 


I wrote a simple one-liner to convert FASTA to three tab-separated columns:
ID (without the '>'), description, and concatenated sequence. That way you
don't have to worry about keeping the two files tied together, but agrep
should still find things only in the concatenated sequence. (Unless somebody
mean put a sequence into the description column.) As an added bonus, it means
you can throw a FASTA into Excel for sorting, filtering, etc. Or merge with a
gene list pretty easily.
It's at
http://cgr.harvard.edu/cbg/scriptome/Tools/Change.html#new__change_a_fasta_file_into_tabular_format__change_fasta_to_tab_
along with the tab-to-FASTA converter and a couple of sentences
describing potential gotchas (e.g., any tabs in the desc get lost).
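
For readers without access to that page, here is a minimal sketch of the same
idea.  It is not the Scriptome one-liner itself; the function name
fasta_to_tab is hypothetical, and turning tabs in the description into spaces
is an assumption made to keep the three columns intact.

```perl
use strict;
use warnings;

# Convert FASTA text into "ID<TAB>description<TAB>sequence" lines,
# one line per record, with the sequence lines joined together.
sub fasta_to_tab {
    my ($fasta) = @_;
    my @rows;
    for my $line (split /\n/, $fasta) {
        $line =~ s/\r$//;                  # tolerate DOS line endings
        if ($line =~ /^>(\S*)\s*(.*)$/) {
            my ($id, $desc) = ($1, $2);
            $desc =~ tr/\t/ /;             # a tab here would add a fourth column
            push @rows, [$id, $desc, ''];
        }
        elsif (@rows) {
            $line =~ s/\s+//g;
            $rows[-1][2] .= $line;         # append to the current record
        }
    }
    return map { join("\t", @$_) } @rows;
}

my $fasta = ">seq1 first test\nACGT\nAC\n>seq2\nGGG\n";
print "$_\n" for fasta_to_tab($fasta);
```

With one record per line, a pattern given to grep or agrep can no longer be
split by a newline inside the sequence.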

> 
> this should be ok for cdna/protein fastafiles (but I am 
> currently writing
> tests-maybe some serious problems with the chars per line 
> limitations show
> up-but I did look good in some first tests.)

Can you tell me what you mean by this?

-Amir Karger
From ro_phls2 at dh.gov.hk  Tue Aug  9 20:34:24 2005
From: ro_phls2 at dh.gov.hk (Andrew Leung)
Date: Tue Aug  9 20:26:11 2005
Subject: [Bioperl-l] Extract Mutation Automatically
Message-ID: <20050810003350.DPM10378.pimx07@Leungkcro>

Hi all,
Is there any module available that can allow me to extract mutation(s)
automatically? The idea is that if I submit two sequences for alignment, the
script can automatically list out all the differences between the two
sequences. I wish to know the difference at two levels, i.e. the nucleotide
and amino acid level. Any ideas?
Andrew

From jason.stajich at duke.edu  Tue Aug  9 22:35:59 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Aug  9 22:26:00 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <20050810003350.DPM10378.pimx07@Leungkcro>
References: <20050810003350.DPM10378.pimx07@Leungkcro>
Message-ID: <10BF3FBD-CF50-4A12-9C3A-C1289D13C85E@duke.edu>

I guess it comes down to what you want to do with the mutations once  
you've found them.

The seq_inds method in Bio::Search::HSP::HSPI is something you
can call on HSP objects you've gotten out of pairwise alignment
searches. seq_inds will give you the locations of the identical,
conserved, or mismatched columns from a pairwise alignment.  I would
suggest using FASTA or SSEARCH for the alignment.

If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa'

Here is how I would get the pairwise SW alignment and get the  
mutations out.

If you wanted a global alignment you can use the EMBOSS tool 'needle'  
and generate an MSF alignment which can be parsed with Bio::AlignIO.

Some simple code to print out the bases which have mismatches:

use strict;
use warnings;
use Bio::SearchIO;

# Run the aligner and read its report through a pipe.  Keep the -format
# below in step with the program: 'fasta' for fasta34/ssearch, 'blast'
# for bl2seq.
#open(my $fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!;
open(my $fh, "fasta34 seq1.fa seq2.fa |") || die $!;
my $parser = Bio::SearchIO->new(-format => 'fasta',
                                -fh     => $fh);

# single result and single hit, so use if instead of while
if( my $result = $parser->next_result ) {
    if( my $hit = $result->next_hit ) {
        # single HSP from FASTA; would need to consider more if using BLAST
        if( my $hsp = $hit->next_hsp ) {
            my @hmismatches = $hsp->seq_inds('hit', 'nomatch');
            # if this is protein and you want to treat the conservative
            # matches as mismatches, run the same method asking for
            # 'conserved' and then combine the two lists

            for my $base ( @hmismatches ) {
                print "base $base of the hit sequence is a mismatch\n";
            }
        }
    }
}


The Bio::PopGen::Utilities module can also take an alignment and  
extract the positions with variation for use in polymorphism analyses.

-jason

On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote:

> Hi all,
> Is there any module available that can allow me to extract mutation(s)
> automatically? The idea is that if I submit two sequences for  
> alignment, the
> script can automatically list out all the differences between the two
> sequences. I wish to know the difference at two levels, i.e. the  
> nucleotide
> and amino acid level. Any ideas?
> Andrew
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From darren.obbard at ed.ac.uk  Wed Aug 10 03:54:51 2005
From: darren.obbard at ed.ac.uk (Darren Obbard)
Date: Wed Aug 10 03:45:08 2005
Subject: [Bioperl-l] calculating the Ka/Ks ratio
References: <20050810003350.DPM10378.pimx07@Leungkcro>
	<10BF3FBD-CF50-4A12-9C3A-C1289D13C85E@duke.edu>
Message-ID: <003501c59d80$ca0e80d0$9ebfd781@DarrenObbard>

Hi all,

Is there a module that will take a pair of aligned (coding) sequences, and 
report the Ka/Ks ratio? (non-synonymous mutations per non-synonymous site / 
synonymous mutations per synonymous site).

I appreciate that PAML will give me an ML estimate of Ka/Ks, but I'm aiming 
to do a sliding-window analysis and don't wish to send each window to PAML 
individually, - I  wondered whether there may be a quicker alternative.

Thanks,

Darren
--
Darren Obbard
Institute of Evolutionary Biology
University of Edinburgh, UK
darren.obbard@ed.ac.uk

From hrh at sanger.ac.uk  Wed Aug 10 04:18:14 2005
From: hrh at sanger.ac.uk (Hans Rudolf Hotz)
Date: Wed Aug 10 04:13:45 2005
Subject: [Bioperl-l] [Bioperl -l] Problem reading EMBL format file
In-Reply-To: <8cff3eb805080904155c0682b9@mail.gmail.com>
References: <8cff3eb805080904155c0682b9@mail.gmail.com>
Message-ID: <Pine.OSF.4.58.0508100903390.1337825@cbi1b.internal.sanger.ac.uk>

Iain

This is one of the features of SRS. If you search EMBL with a ProteinID,
you don't search EMBL but you search EMBL_features. Hence, the output is
only one feature. And depending on your SRS installation this might look
more or less like an EMBL entry, but is not an EMBL entry.

In order to get an EMBL entry (with all the features, of course) you
can do:


getz "[EMBL-ProteinID:AAA46567] > embl" -e

or

getz "[EMBL-ProteinID:AAA46567] > parent" -e


Then you get the proper embl entry (M23021) which you can feed into SeqIO

Hope this helps,

Hans


On Tue, 9 Aug 2005, Iain Wallace wrote:

> Hi all,
>
> Hope fully somebody will be able to help me, I am having some difficulty
> reading a file that looks to me very much like EMBL format.
>
> I am trying to read some sequence files using SeqIO. Both files are obtained
> using the getz program with the following commands
> getz "[EMBL-ProteinID:AAA46567]" -e >COAT_SBMV.AAA46567.cds
> getz "[EMBL-Acc:M23021]" -e > COAT_SBMV.M23021.embl
>
> The embl file is read fine, and I am able to extract the features I want. I
> am having problems with the CDS file; it doesn't appear to be read properly.
> I guess the CDS file isn't a proper EMBL format. Does anyone know what
> format it is or how I could convert it to a proper EMBL format or
> alternatively how to make getz return the file in the proper format. The two
> files look very similar to me
>
> I tried the following little conversion program which worked fine on the
> EMBL file, but failed on the cds file with the error: No whitespace allowed
> in EMBL display id [unknown id]
>
> use Bio::SeqIO;
>
> $filename = $ARGV[0];
> $in = Bio::SeqIO->new(-file => $filename ,
> -format => 'EMBL');
> $out = Bio::SeqIO->new(-file => ">outputfilename" ,
> -format => 'EMBL');
>
> while ( my $seq = $in->next_seq() ) {
> $out->write_seq($seq);
> }
>
>
> Thanks for all your help
>
> Iain
>
From avilella at gmail.com  Wed Aug 10 04:26:53 2005
From: avilella at gmail.com (Albert Vilella)
Date: Wed Aug 10 04:17:45 2005
Subject: [Bioperl-l] calculating the Ka/Ks ratio
In-Reply-To: <003501c59d80$ca0e80d0$9ebfd781@DarrenObbard>
References: <20050810003350.DPM10378.pimx07@Leungkcro>
	<10BF3FBD-CF50-4A12-9C3A-C1289D13C85E@duke.edu>
	<003501c59d80$ca0e80d0$9ebfd781@DarrenObbard>
Message-ID: <1123662413.8228.3.camel@localhost.localdomain>

On Wed, 10 Aug 2005 at 08:54 +0100, Darren Obbard wrote:
> Hi all,
> 
> Is there a module that will take a pair of aligned (coding) sequences, and 
> report the Ka/Ks ratio? (non-synonymous mutations per non-synonymous site / 
> synonymous mutations per synonymous site).
> 
> I appreciate that PAML will give me an ML estimate of Ka/Ks, but I'm aiming 
> to do a sliding-window analysis and don't wish to send each window to PAML 
> individually, - I  wondered whether there may be a quicker alternative.

There is a calc_KaKs_pair method in Bio::Align::DNAStatistics
(Nei-Gojobori method).

From the synopsis:

  use Bio::AlignIO;
  use Bio::Align::DNAStatistics;

  my $stats = Bio::Align::DNAStatistics->new();
  my $in = Bio::AlignIO->new(-format => 'fasta',
                             -file   => 't/data/nei_gojobori_test.aln');
  my $alnobj = $in->next_aln;
  my ($seq1id,$seq2id) = map { $_->display_id } $alnobj->each_seq;

  # Ka/Ks for one named pair of sequences
  my $results = $stats->calc_KaKs_pair($alnobj, $seq1id, $seq2id);
  print "comparing ".$results->[0]{'Seq1'}." and ".$results->[0]{'Seq2'}."\n";
  for (sort keys %{$results->[0]} ){
      next if /Seq/;
      printf("%-9s %.4f \n",$_ , $results->[0]{$_});
  }

  # Ka/Ks for every pair in the alignment
  my $results2 = $stats->calc_all_KaKs_pairs($alnobj);
  for my $an (@$results2){
      print "comparing ". $an->{'Seq1'}." and ". $an->{'Seq2'}. " \n";
      for (sort keys %$an ){
          next if /Seq/;
          printf("%-9s %.4f \n",$_ , $an->{$_});
      }
      print "\n\n";
  }

  # average Ka/Ks over the alignment, with 1000 bootstrap replicates
  my $result3 = $stats->calc_average_KaKs($alnobj, 1000);
  for (sort keys %$result3 ){
      next if /Seq/;
      printf("%-9s %.4f \n",$_ , $result3->{$_});
  }
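
For the sliding-window part of the question, a minimal sketch of how the
window coordinates could be generated so that every window starts and ends on
a codon boundary.  The helper name codon_windows is hypothetical, and the
sketch assumes the alignment is in frame from column 1 with gaps in multiples
of three.

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) for codon-aligned windows.
# Window size and step are given in codons, the alignment length in columns.
sub codon_windows {
    my ($aln_len, $win_codons, $step_codons) = @_;
    my ($win, $step) = ($win_codons * 3, $step_codons * 3);
    my @windows;
    for (my $start = 1; $start + $win - 1 <= $aln_len; $start += $step) {
        push @windows, [$start, $start + $win - 1];
    }
    return @windows;
}

# 30 columns, windows of 5 codons, moving 2 codons at a time
for my $w (codon_windows(30, 5, 2)) {
    print "window $w->[0]-$w->[1]\n";
}
```

Each pair could then be passed to $alnobj->slice($start, $end) and the
resulting sub-alignment to calc_KaKs_pair, which keeps everything inside one
Perl process.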

Hope it helps,

    Albert.

-- 
Albert J. Vilella    avilella_at_ub_edu
--------------------------------------------
Departament de Genetica
Universitat de Barcelona
Diagonal 645 08028, Barcelona
Tel: +34 934035306 Fax: +34 934034420
--------------------------------------------
avilella_at_ebi_ac_uk
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambs. CB10 1SD, United Kingdom
--------------------------------------------------

From csaba.ortutay at uta.fi  Wed Aug 10 04:37:50 2005
From: csaba.ortutay at uta.fi (Csaba Ortutay)
Date: Wed Aug 10 04:36:33 2005
Subject: [Bioperl-l] calculating the Ka/Ks ratio
In-Reply-To: <003501c59d80$ca0e80d0$9ebfd781@DarrenObbard>
References: <20050810003350.DPM10378.pimx07@Leungkcro>
	<10BF3FBD-CF50-4A12-9C3A-C1289D13C85E@duke.edu>
	<003501c59d80$ca0e80d0$9ebfd781@DarrenObbard>
Message-ID: <200508101137.50388.csaba.ortutay@uta.fi>


> Is there a module that will take a pair of aligned (coding) sequences, and
> report the Ka/Ks ratio? (non-synonymous mutations per non-synonymous site /
> synonymous mutations per synonymous site).

See the Bio::Align::DNAStatistics module.

That's working nicely.

 Csaba

-- 
Csaba Ortutay PhD
Institute of Medical Technology
University of Tampere
e-mail: csaba.ortutay@uta.fi
From birney at ebi.ac.uk  Wed Aug 10 05:00:29 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Wed Aug 10 04:59:53 2005
Subject: [Bioperl-l] Bio::SeqFeature::OntologyTypedI Proposal
Message-ID: <42F9C22D.4030603@ebi.ac.uk>


Hi guys...


In my spare time (read... train time) I'm back on a little
bit of bioperl. I hope to in the future set up an
Ensembl->Bioperl Bridge (ie, seeing Ensembl objects as
fully compliant bioperl objects) but before I did that
I wanted to do my bit for 1.6


So... following on from Chris' proposal of sorting
out SeqFeature typing, here is my proposal:


Bio::SeqFeature::OntologyTypedI - extends Bio::SeqFeatureI
and has a method $sf->ontology_term() which returns a
Bio::Ontology::TermI compliant object.

ie, the synopsis would look like:


=head1 NAME

Bio::SeqFeature::OntologyTypedI - a strongly typed SeqFeature

=head1 SYNOPSIS


    # get Sequence Features in some manner, eg
    # from a Sequence object

     foreach $sf ( $seq->get_SeqFeatures() ) {
         # all sequence features must have primary_tag() return a string
         $type_as_string = $sf->primary_tag();

         # ontologytyped seqfeatures have an ontology term
         if( $sf->isa("Bio::SeqFeature::OntologyTypedI") ) {
             $ot = $sf->ontology_term();
             print "Ontology identifier:",$ot->identifier()," name:",$ot->name()," Description:",$ot->description(),"\n";
         } else {
             print "Sequence Feature does not have an ontology type - tag is $type_as_string\n";
         }

     }


I would then implement this in

    Bio::SeqFeature::OntologyCompliant

which would inherit its implementation from Bio::SeqFeature::Generic, but
chain primary_tag to

    $sf->ontology_term()->name();
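
A minimal sketch of what that class might look like.  The class is only
proposed here and does not exist yet; the '_ontology_term' storage key and the
fallback to the plain string when no term is set are assumptions.

```perl
package Bio::SeqFeature::OntologyCompliant;    # proposed class, not yet in Bioperl
use strict;
use warnings;
use base qw(Bio::SeqFeature::Generic);

# Get or set the Bio::Ontology::TermI object that types this feature.
sub ontology_term {
    my $self = shift;
    $self->{'_ontology_term'} = shift if @_;
    return $self->{'_ontology_term'};
}

# Report the term name as the primary tag when a term is set; otherwise
# fall back to the plain string kept by Bio::SeqFeature::Generic.
sub primary_tag {
    my $self = shift;
    my $term = $self->ontology_term;
    return $term->name if $term && !@_;
    return $self->SUPER::primary_tag(@_);
}

1;
```

Existing code that only calls primary_tag() would keep working, while newer
code can ask for the term object itself.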


Having done this I don't know how much "magic" I should put into
SeqIO to automatically promote things into Ontology compliant terms,
or perhaps we should have a converter - which one can register
with a SeqIO EMBL or GenBank stream being something like

     $new_sf = $converter->convert($old_sf,$seq);


This might conflict with an unflattener or something.


What do people think about this proposal? What else do I need
to do to tidy this up?


From amackey at pcbi.upenn.edu  Wed Aug 10 08:24:48 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Wed Aug 10 08:16:31 2005
Subject: [Bioperl-l] Bio::SeqFeature::OntologyTypedI Proposal
In-Reply-To: <42F9C22D.4030603@ebi.ac.uk>
References: <42F9C22D.4030603@ebi.ac.uk>
Message-ID: <6D5DA1C7-11E2-4A0C-9EF6-8A4B6ED4D388@pcbi.upenn.edu>

Isn't this akin to Bio::Factory::SequenceProcessorI functionality?

otherwise it all sounds good to me.

-Aaron

On Aug 10, 2005, at 5:00 AM, Ewan Birney wrote:

> perhaps we should have a converter

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email:  amackey@pcbi.upenn.edu
office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
fax:    215-746-6697
postal: Penn Genomics Institute
         Goddard Labs 212
         415 S. University Avenue
         Philadelphia, PA  19104-6017


From cjm at fruitfly.org  Wed Aug 10 17:03:29 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed Aug 10 16:53:37 2005
Subject: [Bioperl-l] Re: Bio::SeqFeature::OntologyTypedI Proposal
In-Reply-To: <42F9C22D.4030603@ebi.ac.uk>
References: <42F9C22D.4030603@ebi.ac.uk>
Message-ID: <Pine.OSX.4.58.0508101353570.26537@skerryvore.dhcp.lbl.gov>


Sounds like the beginnings of a plan! Perhaps we can come up with a
shorter/catchier name but I'm not that bothered.

The plan below will naturally extend to tag_values as well, with
OntologyCompliant delegating the existing methods.

We should also figure out how this ties in with
Bio::SeqFeature::{Gene,Transcript,Exon} etc - if at all. In many ways,
they are different ways of achieving the same thing, namely stronger
typing of features. One scenario is that the class-types
piggyback off of the ontology-typed classes. The other is that they are
completely independent.

Regarding 'magic' in SeqIO - not sure this is required. You can already
plug in your own factories here, we just need to extend this with feature
factories. The default method will continue to produce relatively light
SF::Generics?

On Wed, 10 Aug 2005, Ewan Birney wrote:

>
> Hi guys...
>
>
> In my spare time (read... train time) I'm back on a little
> bit of bioperl. I hope to in the future set up an
> Ensembl->Bioperl Bridge (ie, seeing Ensembl objects as
> fully compliant bioperl objects) but before I did that
> I wanted to do my bit for 1.6
>
>
> So... following on from Chris' proposal of sorting
> out SeqFeature typing, here is my proposal:
>
>
> Bio::SeqFeature::OntologyTypedI - extends Bio::SeqFeatureI
> and has a method $sf->ontology_term() which returns a
> Bio::Ontology::TermI compliant object.
>
> ie, the synopsis would look like:
>
>
> =head1 NAME
>
> Bio::SeqFeature::OntologyTypedI - a strongly typed SeqFeature
>
> =head1 SYNOPSIS
>
>
>     # get Sequence Features in some manner, eg
>     # from a Sequence object
>
>      foreach $sf ( $seq->get_SeqFeatures() ) {
>          # all sequence features must have primary_tag() return a string
>          $type_as_string = $sf->primary_tag();
>
>          # ontologytyped seqfeatures have an ontology term
>          if( $sf->isa("Bio::SeqFeature::OntologyTypedI") ) {
>              $ot = $sf->ontology_term();
>              print "Ontology identifier:",$ot->identifier()," name:",$ot->name()," Description:",$ot->description(),"\n";
>          } else {
>              print "Sequence Feature does not have an ontology type - tag is $type_as_string\n";
>          }
>
>      }
>
>
> I would then implement this in
>
>     Bio::SeqFeature::OntologyCompliant
>
> which would inheriet its implementation from Bio::SeqFeature::Generic, but
> chain primary_tag to
>
>     $sf->ontology_term()->name();
>
>
> Having done this I don't know how much "magic" I should put into
> SeqIO to automatically promote things into Ontology compliant terms,
> or perhaps we should have a converter - which one can register
> with a SeqIO EMBL or GenBank stream being something like
>
>      $new_sf = $converter->convert($old_sf,$seq);
>
>
>
> This might conflict with an unflattener or something.
>
>
>
> What do people think about this proposal? What else do I need
> to do to tidy this up?
>
>
>
>
>
>
>
>
>
>
>


From ro_phls2 at dh.gov.hk  Wed Aug 10 20:42:11 2005
From: ro_phls2 at dh.gov.hk (Andrew Leung)
Date: Wed Aug 10 20:30:50 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <10BF3FBD-CF50-4A12-9C3A-C1289D13C85E@duke.edu>
Message-ID: <20050811004135.HPW10378.pimx07@Leungkcro>

Hi Jason,
Thank you for the advice. I will try the various approaches suggested. My
ultimate goal is to extract something like this: A267G, Z786-, L898Y etc.
for aa and A162T, G339A, A388N, etc. for nt. Preferably, the nomenclature
for annotating mutations would be a standardized one. But it appears that
there is no such ready-to-use module in Bioperl.
Andrew


-----Original Message-----
From: Jason Stajich [mailto:jason.stajich@duke.edu] 
Sent: Wednesday, August 10, 2005 10:36 AM
To: andrew_leung@dh.gov.hk
Cc: bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] Extract Mutation Automatically

I guess it comes down to what you want to do with the mutations once  
you've found them.

The seq_inds method in Bio::Search::HSP::HSPI  which is something you  
can call on hsp objects you've gotten out of pairwise alignment  
searches. seq_inds will give you the location of the identical,  
conserved, mismatched columns from a pairwise alignment.  I would  
suggest using FASTA or SSEARCH and

If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa'

Here is how I would get the pairwise SW alignment and get the  
mutations out.

If you wanted a global alignment you can use the EMBOSS tool 'needle'  
and generate an MSF alignment which can be parsed with Bio::AlignIO.

some simple code to print out the bases which have mismatches
use Bio::SearchIO;
use strict;
my $fh;
#open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!;
open($fh, "fasta34 seq1.fa seq2.fa  |") || die $!;
#my $parser = Bio::SearchIO->new(-format => 'fasta',
#                -fh     => $fh);
my $parser = Bio::SearchIO->new(-format => 'blast',
                                                                - 
fh        => $fh);

if( my $result = $parser->next_result ) { # single result so use if  
instead of while
     if( my $hit = $result->next_hit ) {    # ditto, want single  
result...
     if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would  
need to consider more if using BLAST

         my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch');
         # if this is protein and you want to treat the conservative  
matches as mismatches
         # you'll need to run the same method but asking for  
'conserved' and then combing the two lists

         for my $base ( @qmismatches ) {
            print "base $base of the hit sequence is a mismatch \n",
        }
     }
     }
}


The Bio::PopGen::Utilities module can also take an alignment and  
extract the positions with variation for use in polymorphism analyses.

-jason

On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote:

> Hi all,
> Is there any module available that can allow me to extract mutation(s)
> automatically? The idea is that if I submit two sequences for alignment,
> the script can automatically list out all the differences between the
> two sequences. I wish to know the difference at two levels, i.e. the
> nucleotide and amino acid level. Any ideas?
> Andrew
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Wed Aug 10 21:24:02 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Aug 10 21:16:28 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <20050811004135.HPW10378.pimx07@Leungkcro>
References: <20050811004135.HPW10378.pimx07@Leungkcro>
Message-ID: <3FD6FD70-7FF7-480B-8E9F-07F2D9C3D207@duke.edu>


On Aug 10, 2005, at 8:42 PM, Andrew Leung wrote:

> Hi Jason,
> Thank you for the advice. I will try the various approaches suggested. My
> ultimate goal is to extract something like this: A267G, Z786-, L898Y, etc.
> for aa and A162T, G339A, A388N, etc. for nt. Preferably, the nomenclature
> for annotating mutations would be a standardized one. But it appears that
> there is no such ready-to-use module in Bioperl.

Don't despair, you could be the one to do it!  This would probably  
just be a subroutine and not necessarily a whole module.

That nomenclature assumes a reference sequence and just getting the  
bases you are interested in.  A few substr or subseq calls and you  
would be right there.
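
As a rough illustration of the substr approach, here is a minimal sketch. It assumes you already have the two gapped alignment rows as plain strings of equal length (for an HSP, query_string and hit_string return these). The helper name mutation_labels is made up for this example, positions are counted along the ungapped reference, and insertions relative to the reference are simply skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two gapped, equal-length alignment rows and
# return labels such as A162T, numbered along the ungapped reference row.
sub mutation_labels {
    my ($ref_row, $var_row) = @_;
    die "alignment rows differ in length\n"
        unless length($ref_row) == length($var_row);

    my @labels;
    my $ref_pos = 0;    # 1-based position in the ungapped reference
    for my $col ( 0 .. length($ref_row) - 1 ) {
        my $r = uc substr($ref_row, $col, 1);
        my $v = uc substr($var_row, $col, 1);

        # A gap in the reference is an insertion; it has no reference
        # coordinate, so this sketch does not report it.
        next if $r eq '-';

        $ref_pos++;
        push @labels, "$r$ref_pos$v" if $r ne $v;
    }
    return @labels;
}

# Toy alignment rows, not real data
print join(", ", mutation_labels("ACGTAC-GT", "ACCTA-TGA")), "\n";
```

A deleted base comes out in the same style as the Z786- example above, with '-' in place of the new residue.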

-jason


> Andrew

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From brian_osborne at cognia.com  Thu Aug 11 10:57:51 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Aug 11 10:55:28 2005
Subject: [Bioperl-l] Re: Feature-Annotation HOWTO
In-Reply-To: <db9f3d7210dd693687e402e54648348e@research.dfci.harvard.edu>
Message-ID: <BF20DFAF.356C%brian_osborne@cognia.com>

Pedro,

First, make sure to write to bioperl-l with questions, there are certainly
people there who know as much, or more, about Features and Annotations as
me.

With regard to references, I believe this bug has been fixed in the latest
Bioperl, bioperl-live.

With regard to the ids ("db_xref"), you'll have to show us what the source
file is and what @ids looks like, I'm afraid I didn't exactly understand the
problem.

With regard to SeqIO, your code looks fine but you've only shown part of it
so I can't be sure. Here's another rendition:

>perl -e 'use Bio::DB::GenBank; $db = new Bio::DB::GenBank; $seq =
$db->get_Seq_by_id(2); use Bio::SeqIO; $out = Bio::SeqIO->new(-fh => \*STDERR,
-format => "fasta"); $out->write_seq($seq);'
>A00002 B.taurus DNA sequence 1 from patent application EP0238993.
AATTCATGCGTCCGGACTTCTGCCTCGAGCCGCCGTACACTGGGCCCTGCAAAGCTCGTA
TCATCCGTTACTTCTACAATGCAAAGGCAGGCCTGTGTCAGACCTTCGTATACGGCGGTT
GCCGTGCTAAGCGTAACAACTTCAAATCCGCGGAAGACTGCGAACGTACTTGCGGTGGTC
CTTAGTAAAGCTTG

Generally speaking, show the entire script as well as any related files so
nothing is left to the imagination.


Brian O.


On 8/11/05 9:42 AM, "Pedro Antonio Reche" <reche@research.dfci.harvard.edu>
wrote:

> Dear Brian,
> I have tried your code from the HOWTO
> 
> 
> my @annotations = $anno_collection->get_Annotations('reference');
> if ($value->tagname eq "reference") {
> my $hash_ref = $value->hash_tree;
> for my $key (keys %{$hash_ref}) {
> print $key,": ",$hash_ref->{$key},"\n";
>  }
> 
> on the gb record attached in this e-mail and unfortunately I am
> unable to get the medline record. I have also tried
> 
> my @annotations = $anno_collection->get_Annotations('reference');
> 
> print "author: ",$value->authors(), "\n";
> print "Title: ",$value->title(), "\n";
> print "Medline: ",$value->medline(), "\n";
> print "PubMed: ",$value->pubmed(), "\n";
> print "Database: ",$value->database(), "\n";
> 
> with the same result: I cannot print the medline record. I have also
> found that the code:
> for my $feat_object ($seq_object->get_SeqFeatures) {
>           push @ids,$feat_object->get_tag_values("db_xref")
>                if ($feat_object->has_tag("db_xref"));
>        }
> 
> does not populate @ids properly with the unique values under
> "db_xref" but with repeated concatenated values.
> Finally, given that
> 
> $seq_object = $feat_object->entire_seq;
> 
> returns  a  Bio::PrimarySeq I tried to define
> my $out = new Bio::SeqIO(-fh => \*STDERR, -format => 'fasta');
> 
> to print the sequences as
> 
> $out->write_seq($seq_object )
> 
> but it did not work.
> 
> 
> Any help to solve these problems will be appreciated. I am using
> bioperl 1.4
> Regards,
> pedro


From MEC at Stowers-Institute.org  Thu Aug 11 12:38:06 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Thu Aug 11 12:32:29 2005
Subject: [Bioperl-l] Extract Mutation Automatically
Message-ID: <200508111628.j7BGS4Tv022396@portal.open-bio.org>

re: standardized nomenclature for mutations, see

"Recommendations for a nomenclature system for human gene mutations",
a copy of which can be found at
http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A//mecp2.chw.edu.au/mecp2/info/mutation_nomenclature_1.pdf&ei=6H37QsvRGo34igGj4vBS&sig2=eYVDZb467rYOBf-0sCtctQ

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Andrew Leung
Sent: Wednesday, August 10, 2005 7:42 PM
To: 'Jason Stajich'
Cc: bioperl-l@bioperl.org
Subject: RE: [Bioperl-l] Extract Mutation Automatically


Hi Jason,
Thank you for the advice. I will try the various approaches suggested. My
ultimate goal is to extract something like this: A267G, Z786-, L898Y, etc.
for aa and A162T, G339A, A388N, etc. for nt. Preferably, the nomenclature
for annotating mutations would be a standardized one. But it appears that
there is no such ready-to-use module in Bioperl.
Andrew


From hlapp at gnf.org  Thu Aug 11 15:52:32 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Aug 11 15:42:37 2005
Subject: [Bioperl-l] Re: Bio::SeqFeature::OntologyTypedI Proposal
In-Reply-To: <Pine.OSX.4.58.0508101353570.26537@skerryvore.dhcp.lbl.gov>
References: <42F9C22D.4030603@ebi.ac.uk>
	<Pine.OSX.4.58.0508101353570.26537@skerryvore.dhcp.lbl.gov>
Message-ID: <644796f4db2eff94029490616b548f48@gnf.org>


On Aug 10, 2005, at 2:03 PM, Chris Mungall wrote:

> Regarding 'magic' in SeqIO - not sure this is required. You can already
> plug in your own factories here, we just need to extend this with 
> feature
> factories. The default method will continue to produce relatively light
> SF::Generics?

Right, this is exactly what I was thinking. A feature factory that 
creates ontology-compliant features will also probably need to have 
something like an OntologyTermResolver, in order to check a given 
feature type (primary_tag) against an ontology that sits somewhere 
(local file, local database, remote database, or even remote file).

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From birney at ebi.ac.uk  Thu Aug 11 17:09:07 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Thu Aug 11 16:59:26 2005
Subject: [Bioperl-l] Re: Bio::SeqFeature::OntologyTypedI Proposal
In-Reply-To: <644796f4db2eff94029490616b548f48@gnf.org>
References: <42F9C22D.4030603@ebi.ac.uk>	<Pine.OSX.4.58.0508101353570.26537@skerryvore.dhcp.lbl.gov>
	<644796f4db2eff94029490616b548f48@gnf.org>
Message-ID: <42FBBE73.5080906@ebi.ac.uk>


Hilmar Lapp wrote:
> 
> On Aug 10, 2005, at 2:03 PM, Chris Mungall wrote:
> 
>> Regarding 'magic' in SeqIO - not sure this is required. You can already
>> plug in your own factories here, we just need to extend this with feature
>> factories. The default method will continue to produce relatively light
>> SF::Generics?
> 
> 
> Right, this is exactly what I was thinking. A feature factory that 
> creates ontology-compliant features will also probably need to have 
> something like an OntologyTermResolver, in order to check a given 
> feature type (primary_tag) against an ontology that sits somewhere 
> (local file, local database, remote database, or even remote file).
> 

Ok - I'll hold off the magic for the moment, but I think it would
be nice to have just-enough of SO in-built into Bioperl so one
could do something like:

   $seqio = Bio::SeqIO->new( -file => "-", -format => 'EMBL', -feature_converter => 'SO');

and the "right thing" happens.


Does anyone want to propose an alternative name to

   Bio::SeqFeature::OntologyTypedI?

Apart from that, is it OK for me to implement and commit?


>     -hilmar
From cjm at fruitfly.org  Thu Aug 11 17:29:50 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Thu Aug 11 17:23:35 2005
Subject: [Bioperl-l] Re: Bio::SeqFeature::OntologyTypedI Proposal
In-Reply-To: <42FBBE73.5080906@ebi.ac.uk>
References: <42F9C22D.4030603@ebi.ac.uk>
	<Pine.OSX.4.58.0508101353570.26537@skerryvore.dhcp.lbl.gov>
	<644796f4db2eff94029490616b548f48@gnf.org> <42FBBE73.5080906@ebi.ac.uk>
Message-ID: <Pine.OSX.4.58.0508111425240.29049@skerryvore.dhcp.lbl.gov>


On Thu, 11 Aug 2005, Ewan Birney wrote:

>
>
> Hilmar Lapp wrote:
> >
> > On Aug 10, 2005, at 2:03 PM, Chris Mungall wrote:
> >
> >> Regarding 'magic' in SeqIO - not sure this is required. You can already
> >> plug in your own factories here, we just need to extend this with feature
> >> factories. The default method will continue to produce relatively light
> >> SF::Generics?
> >
> >
> > Right, this is exactly what I was thinking. A feature factory that
> > creates ontology-compliant features will also probably need to have
> > something like an OntologyTermResolver, in order to check a given
> > feature type (primary_tag) against an ontology that sits somewhere
> > (local file, local database, remote database, or even remote file).
> >
>
> Ok - I'll hold off the magic for the moment, but I think it would
> be nice to have just-enough of SO in-built into Bioperl so one
> could do something like:
>
>    $seqio = Bio::SeqIO->new( -file => "-", -format => 'EMBL', -feature_converter => 'SO');
>
> and the "right thing" happens.

Actually, Bio::SeqFeature::Tools::TypeMapper already does this. Well, you
still have to wrap it to have the above work, but the mapping is there.

Of course, you can always provide your own mapping as a hash (which could
come from an ontology, a database, whatever). But like you say, the gb->SO
type mapping is so common that it's good to have a default hardcoded
here.
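
To make the "own mapping as a hash" idea concrete, here is a minimal plain-Perl sketch. The three entries are illustrative only and are not TypeMapper's actual table; check term names against the ontology before relying on them. map_type is a made-up helper name.

```perl
use strict;
use warnings;

# Illustrative feature-table keys and the SO-style names they might map to.
my %type_map = (
    "5'UTR"      => 'five_prime_UTR',
    "3'UTR"      => 'three_prime_UTR',
    misc_feature => 'region',
);

# Hypothetical helper: return the mapped type, or the tag unchanged
# when the hash has no entry for it.
sub map_type {
    my ($tag, $map) = @_;
    return exists $map->{$tag} ? $map->{$tag} : $tag;
}

for my $tag ( "5'UTR", 'misc_feature', 'CDS' ) {
    printf "%-12s -> %s\n", $tag, map_type($tag, \%type_map);
}
```

With real feature objects the same lookup could be applied through the primary_tag getter/setter, e.g. $feat->primary_tag( map_type($feat->primary_tag, \%type_map) ).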

> Does anyone want to propose an alt name to
>
>    Bio::SeqFeature::OntologyTypedI?
>
> But for that is it ok for me to implement and commit?
>
>
> >     -hilmar
>


From hlapp at gnf.org  Thu Aug 11 17:43:07 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Aug 11 17:35:28 2005
Subject: [Bioperl-l] Re: Bio::SeqFeature::OntologyTypedI Proposal
In-Reply-To: <42FBBE73.5080906@ebi.ac.uk>
References: <42F9C22D.4030603@ebi.ac.uk>	<Pine.OSX.4.58.0508101353570.26537@skerryvore.dhcp.lbl.gov>
	<644796f4db2eff94029490616b548f48@gnf.org>
	<42FBBE73.5080906@ebi.ac.uk>
Message-ID: <d81b3ae65bd50698f8928c9cd4000b30@gnf.org>


On Aug 11, 2005, at 2:09 PM, Ewan Birney wrote:

> Does anyone want to propose an alt name to
>
>   Bio::SeqFeature::OntologyTypedI?

Frankly I'd just call it Bio::SeqFeature::TypedI, or in the 
unabbreviated style (which I'd personally much prefer) 
Bio::SeqFeature::TypedSeqFeatureI. (We also have Bio::Seq::RichSeqI, 
not Bio::Seq::RichI.)

I.e., the important cue that the name should give is that this is a 
strongly typed feature. As Chris mentioned earlier, there are different 
ways to achieve typing, but I don't think we will eventually want those 
different ways to be distinct from each other in Bioperl; the choice 
between untyped scruffy and typed tidy should suffice.

>
> But for that is it ok for me to implement and commit?

I don't know why I would want to stop you :-)

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From ro_phls2 at dh.gov.hk  Fri Aug 12 03:53:32 2005
From: ro_phls2 at dh.gov.hk (Andrew Leung)
Date: Fri Aug 12 03:42:30 2005
Subject: [Bioperl-l] Return $hit->name by Score Bit value when parsing blast
	result
Message-ID: <20050812075259.CMX1864.pimx07@Leungkcro>

Hi,

I ran StandAloneBlast, which gave me a BLAST result object. When I use
the obj->next_result and obj->next_hit methods to list the hit names
(hit->name), I find that they are not returned in the same order as in a
standard BLAST report, where hits are ordered by bit score. With Bioperl,
how can I list the hits by bit score? Should I manually extract every
hit's bit score and then sort a hash? Or is there a better way to achieve
it?

Andrew

From ro_phls2 at dh.gov.hk  Fri Aug 12 06:50:04 2005
From: ro_phls2 at dh.gov.hk (Andrew Leung)
Date: Fri Aug 12 06:39:05 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <3FD6FD70-7FF7-480B-8E9F-07F2D9C3D207@duke.edu>
Message-ID: <20050812104931.DJB1864.pimx07@Leungkcro>

Hi Jason,
I have tried the seq_inds method in Bio::Search::HSP::HSPI. But, other
than identical and conserved, there is no "mismatched" option.

http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/HSP/HSPI.html#POD15

I am still thinking of how to get the mismatch details. Working from
identical/conserved seq_inds values seems to be very complicated.
Andrew 


From jason.stajich at duke.edu  Fri Aug 12 07:58:31 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Aug 12 07:50:51 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <20050812104931.DJB1864.pimx07@Leungkcro>
References: <20050812104931.DJB1864.pimx07@Leungkcro>
Message-ID: <6758370F-0C05-4AE5-A4DC-30F9013C10AB@duke.edu>

'nomatch'

On Aug 12, 2005, at 6:50 AM, Andrew Leung wrote:

> Hi Jason,
> I have tried the seq_inds method in Bio::Search::HSP::HSPI. But, other
> than identical and conserved, there is no "mismatched" option.
>
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/HSP/HSPI.html#POD15
>
> I am still thinking of how to get the mismatch details. Working from
> identical/conserved seq_inds values seems to be very complicated.
> Andrew

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From jason.stajich at duke.edu  Fri Aug 12 08:06:07 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Aug 12 07:56:21 2005
Subject: [Bioperl-l] Return $hit->name by Score Bit value when parsing
	blast result
In-Reply-To: <20050812075259.CMX1864.pimx07@Leungkcro>
References: <20050812075259.CMX1864.pimx07@Leungkcro>
Message-ID: <13D5095C-7906-416A-B733-B0494B94AF6D@duke.edu>

They are supposed to be returned in the order they are found in the  
report, although I remember there may be something inconsistent  
with the code added to handle PSI-BLAST parsing too.  I've not yet  
investigated this, so I don't know whether or not it is a bug.


At any rate, you can always collect all the Hits into an array and  
sort them:

my @hits = $result->hits;
# highest bit score first, as in a standard BLAST report
for my $hit ( sort { $b->bits <=> $a->bits } @hits ) {
    print $hit->name, "\t", $hit->bits, "\n";
}

If you read the documentation for Bio::Search::Result::ResultI you'll  
see a 'sort_hits' function which should also allow you to provide a  
sorting function to control the order of the hits.
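
For trying out the ordering without running a search, here is a small self-contained sketch. MockHit is a made-up stand-in that only provides the name and bits accessors, and hits_by_bits_desc is a hypothetical helper; real hit objects from a parsed report could be passed to it in the same way.

```perl
use strict;
use warnings;

# Stand-in for a hit object: only the two accessors used below.
package MockHit;
sub new  { my ($class, %args) = @_; return bless {%args}, $class }
sub name { $_[0]{name} }
sub bits { $_[0]{bits} }

package main;

# Highest bit score first; ties are broken by name so the order is stable.
sub hits_by_bits_desc {
    my @hits = @_;
    return sort { $b->bits <=> $a->bits || $a->name cmp $b->name } @hits;
}

# Invented scores for illustration
my @hits = (
    MockHit->new(name => 'hitB', bits => 52.8),
    MockHit->new(name => 'hitA', bits => 310),
    MockHit->new(name => 'hitC', bits => 97.1),
);

for my $hit ( hits_by_bits_desc(@hits) ) {
    printf "%-6s %6.1f\n", $hit->name, $hit->bits;
}
```

This lists hitA, then hitC, then hitB.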

-jason
On Aug 12, 2005, at 3:53 AM, Andrew Leung wrote:

> Hi,
>
> I ran StandAloneBlast, which gave me a BLAST result object. When I use
> the obj->next_result and obj->next_hit methods to list the hit names
> (hit->name), I find that they are not returned in the same order as in a
> standard BLAST report, where hits are ordered by bit score. With Bioperl,
> how can I list the hits by bit score? Should I manually extract every
> hit's bit score and then sort a hash? Or is there a better way to achieve
> it?
>
> Andrew

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From markus.riester at student.uni-tuebingen.de  Fri Aug 12 14:21:55 2005
From: markus.riester at student.uni-tuebingen.de (markus.riester@student.uni-tuebingen.de)
Date: Fri Aug 12 08:23:27 2005
Subject: [Bioperl-l] where to discuss namespaces for modules?
Message-ID: <twig.1123849315.19435@mail.uni-tuebingen.de>

hi,

Sorry for writing again. Is this the right place to discuss namespaces?

http://www.weigelworld.org/resources/software/perl_modules/

Weigel::Search was only a temporary namespace, and I don't think we should
upload this to CPAN under it. Maybe Bio::Search? Or
Bio::Pat(tern)Search? It would be very nice to hear some feedback from you!


Best regards,
Markus
From ram at i122server.vu-wien.ac.at  Fri Aug 12 04:51:38 2005
From: ram at i122server.vu-wien.ac.at (Rambabu Gudavalli)
Date: Fri Aug 12 09:14:41 2005
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 28, Issue 6
In-Reply-To: <200508112105.j7BL19Tx026719@portal.open-bio.org>
References: <200508112105.j7BL19Tx026719@portal.open-bio.org>
Message-ID: <22b738fa01916301dcc6c48b289277e1@i122server.vu-wien.ac.at>

Dear all,
I have a question: how can I download a popset file using Bioperl?
I know the id [gi:22724863]. I can do it manually, but I need more
files, so I want to do it with Bioperl.

Here is the URL for one file that I need to download:

http://www.ncbi.nlm.nih.gov/entrez/batchseq.cgi?db=popset&view=ps&val=22724863

Thank you,
Ram
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 436 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050812/c0f06520/attachment.bin
From ram at i122server.vu-wien.ac.at  Fri Aug 12 05:09:59 2005
From: ram at i122server.vu-wien.ac.at (Rambabu Gudavalli)
Date: Fri Aug 12 09:14:48 2005
Subject: [Bioperl-l] download popset file using bioperl
Message-ID: <515c08d6d1253e609971d2a84a47ec25@i122server.vu-wien.ac.at>

Dear all,

I have a question: how can I download a popset file using BioPerl? I know
the ID [gi:22724863]. I can do it manually, but I need more files, so I
want to do it with BioPerl.

Here is the URL for one file that I need to download:

http://www.ncbi.nlm.nih.gov/entrez/batchseq.cgi?db=popset&view=ps&val=22724863

Thank you,
Ram
From nel at birc.dk  Sun Aug 14 23:37:55 2005
From: nel at birc.dk (Niels Larsen)
Date: Sun Aug 14 23:28:03 2005
Subject: [Bioperl-l] get_Seq_by_id question
In-Reply-To: <1123119449.11338.3.camel@bacp4>
References: <Pine.LNX.4.58.0508031754310.17629@sumo.ctrl.ucla.edu>
	<1123119449.11338.3.camel@bacp4>
Message-ID: <1124077075.43000e1358638@webmail.daimi.au.dk>

Greetings,

When I do

require Bio::DB::EMBL;

$embl = new Bio::DB::EMBL();
$entry = $embl->get_Seq_by_id( "AF222686" );

Then I get one entry, EMBL:AY883858. Am I doing something wrong?
get_Seq_by_acc returns the same. That entry AY883858, btw, is the
first in the list one gets when searching with "AF222686" at the EBI
front page (http://www.ebi.ac.uk).

Niels L


From heikki at ebi.ac.uk  Mon Aug 15 06:41:39 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Aug 15 06:36:26 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <20050811004135.HPW10378.pimx07@Leungkcro>
References: <20050811004135.HPW10378.pimx07@Leungkcro>
Message-ID: <200508151141.39713.heikki@ebi.ac.uk>


Andrew,

Once you have extracted the information, you can create Bio::Variation objects 
which know how to stringify the description according to human mutation 
nomenclature rules.

In practice, you create a Bio::Variation::SeqDiff object, add to it the 
appropriate Bio::Variation::{DNAMutation|RNAChange|AAChange} objects and call 
the methods sysname() for the nucleotide descriptor or trivname() for the 
amino acid descriptor.

The nomenclature used is not the most recent, more complex suggestion from den 
Dunnen et al. but the original one (identical in basic cases) from Antonarakis 
et al.

 -Heikki


On Thursday 11 August 2005 01:42, Andrew Leung wrote:
> Hi Jason,
> Thank you for the advice. I will try the various approaches suggested. My
> ultimate goal is to extract something like this: A267G, Z786-, L898Y etc.
> for aa and A162T, G339A, A388N, etc. for nt. Preferably, the nomenclature
> for annotating mutations should be a standardized one. But it appears that
> there is no such ready-to-use module in Bioperl.
> Andrew
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: Wednesday, August 10, 2005 10:36 AM
> To: andrew_leung@dh.gov.hk
> Cc: bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Extract Mutation Automatically
>
> I guess it comes down to what you want to do with the mutations once
> you've found them.
>
> The seq_inds method in Bio::Search::HSP::HSPI is something you
> can call on hsp objects you've gotten out of pairwise alignment
> searches. seq_inds will give you the location of the identical,
> conserved, or mismatched columns from a pairwise alignment.  I would
> suggest using FASTA or SSEARCH.
>
> If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa',
> here is how I would get the pairwise SW alignment and get the
> mutations out.
>
> If you wanted a global alignment you can use the EMBOSS tool 'needle'
> and generate an MSF alignment which can be parsed with Bio::AlignIO.
>
> Some simple code to print out the bases which have mismatches:
>
> use strict;
> use Bio::SearchIO;
>
> # Run the search and read its report through a pipe. The -format given
> # to Bio::SearchIO has to match the program that produced the report.
> open(my $fh, "fasta34 seq1.fa seq2.fa |") || die $!;
> my $parser = Bio::SearchIO->new(-format => 'fasta',
>                                 -fh     => $fh);
> # BLAST alternative:
> #open(my $fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!;
> #my $parser = Bio::SearchIO->new(-format => 'blast',
> #                                -fh     => $fh);
>
> # A single result, hit and HSP are expected from FASTA, so use if
> # instead of while; you would need to consider more HSPs if using BLAST.
> if( my $result = $parser->next_result ) {
>     if( my $hit = $result->next_hit ) {
>         if( my $hsp = $hit->next_hsp ) {
>             # positions in the hit sequence that are mismatched
>             my @mismatches = $hsp->seq_inds('hit', 'nomatch');
>             # if this is protein and you want to treat the conservative
>             # matches as mismatches, you'll need to run the same method
>             # asking for 'conserved' and then combine the two lists
>
>             for my $base ( @mismatches ) {
>                 print "base $base of the hit sequence is a mismatch\n";
>             }
>         }
>     }
> }
>
>
> The Bio::PopGen::Utilities module can also take an alignment and
> extract the positions with variation for use in polymorphism analyses.
>
> -jason
>
> On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote:
> > Hi all,
> > Is there any module available that can allow me to extract mutation(s)
> > automatically? The idea is that if I submit two sequences for
> > alignment, the
> > script can automatically list out all the differences between the two
> > sequences. I wish to know the difference at two levels, i.e. the
> > nucleotide
> > and amino acid level. Any ideas?
> > Andrew
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From heikki at ebi.ac.uk  Mon Aug 15 07:39:19 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Aug 15 07:32:08 2005
Subject: [Bioperl-l] get_Seq_by_id question
In-Reply-To: <1124077075.43000e1358638@webmail.daimi.au.dk>
References: <Pine.LNX.4.58.0508031754310.17629@sumo.ctrl.ucla.edu>
	<1123119449.11338.3.camel@bacp4>
	<1124077075.43000e1358638@webmail.daimi.au.dk>
Message-ID: <200508151239.19410.heikki@ebi.ac.uk>


Niels,

There is something funny going on with the underlying SRS engine. I'll get to 
the bottom of it and report back.

 -Heikki

From heikki at ebi.ac.uk  Mon Aug 15 09:22:49 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Aug 15 09:18:01 2005
Subject: [Bioperl-l] new modules for searching for patterns in fasta-files
In-Reply-To: <339D68B133EAD311971E009027DC47970321B47E@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC47970321B47E@montecarlo.cgr.harvard.edu>
Message-ID: <200508151422.49280.heikki@ebi.ac.uk>

On Tuesday 09 August 2005 20:20, Amir Karger wrote:
>
> I wrote a simple one-liner to convert fasta to three tab-separated
> columns: ID (without '>'), desc, and concatenated sequence. That way you
> don't have to worry about keeping the two files tied together, but agrep
> should still find things only in the concatenated sequence. (Unless
> somebody mean put a sequence into the description column.) As an added
> bonus, it means you can throw a FASTA into Excel for sorting, filtering,
> etc. Or merge with a gene list pretty easily.
> It's at
> http://cgr.harvard.edu/cbg/scriptome/Tools/Change.html#new__change_a_fasta_file_into_tabular_format__change_fasta_to_tab_
> along with the tab-to-FASTA converter and a couple of sentences
> describing potential gotchas (e.g., any tabs in the desc get lost).
>


Amir,

FYI, this is already implemented as the 'tab' format in Bio::SeqIO.

 -Heikki
From cain at cshl.edu  Mon Aug 15 10:35:10 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug 15 10:25:17 2005
Subject: [Bioperl-l] Windows bug in Bio::DB::Fasta?
Message-ID: <1124116511.2891.9.camel@localhost.localdomain>

Hello all,

I am investigating a bug in GBrowse that seems to surface only when
people are using the memory (i.e., file) adaptor on Windows systems.
Here's the bug report:

https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169&group_id=27707

I've tracked the problem down to Bio::DB::Fasta. When the file is DOS
formatted (that is, lines end with both a carriage return and a line
feed), BDF returns the wrong string when a subsequence is requested, but
when the file is Unix formatted (i.e., LF only), it returns the right
string.  I wrote the very simple test script below and stepped it
through the perl debugger.  It looks like the bug is in the caloffset
method, as it returns the same offsets regardless of the file type,
which then makes the subsequent seek into the file go to the wrong
coordinates in DOS-formatted files.

Unfortunately, I don't really know what is going on in caloffset, so I
don't know how to fix it, but it presumably has to check the format of
the file somewhere and take that into account.

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From heikki at ebi.ac.uk  Mon Aug 15 11:00:42 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Aug 15 10:50:14 2005
Subject: [Bioperl-l] get_Seq_by_id question
In-Reply-To: <200508151239.19410.heikki@ebi.ac.uk>
References: <Pine.LNX.4.58.0508031754310.17629@sumo.ctrl.ucla.edu>
	<1124077075.43000e1358638@webmail.daimi.au.dk>
	<200508151239.19410.heikki@ebi.ac.uk>
Message-ID: <200508151600.42365.heikki@ebi.ac.uk>

Ok. This should be fixed soon.

A couple of releases ago EMBL introduced accession number ranges for 
situations where there is a long list of secondary accession numbers 
(GenBank has used them a bit longer), e.g.

AC   AY883861; AF333345-AF333346; AH010225;

The code that expanded this range was broken in the EBI SRS server. It was 
fixed yesterday, but with the huge size of the database it takes a while to 
propagate the fix to the public server.
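
To make the range notation concrete, here is a minimal sketch of what expanding such an AC line involves. It is not the SRS code; the function names are hypothetical, and it assumes both ends of a range share the same letter prefix and the same number of digits.

```perl
use strict;
use warnings;

# Expand one accession range such as "AF333345-AF333346" into the
# individual accession numbers it covers. A plain accession is
# returned unchanged.
sub expand_accession_range {
    my ($range) = @_;
    my ($first, $last) = split /-/, $range, 2;
    return ($first) unless defined $last;

    my ($prefix1, $num1) = $first =~ /^([A-Z]+)(\d+)$/
        or die "bad accession: $first";
    my ($prefix2, $num2) = $last =~ /^([A-Z]+)(\d+)$/
        or die "bad accession: $last";
    die "prefix mismatch in $range" unless $prefix1 eq $prefix2;
    die "width mismatch in $range"  unless length($num1) == length($num2);

    # Force numeric context so that leading zeros do not trigger Perl's
    # string-increment behaviour, then pad back to the original width.
    my $width = length($num1);
    return map { sprintf '%s%0*d', $prefix1, $width, $_ }
           ($num1 + 0) .. ($num2 + 0);
}

# Expand every entry of an EMBL "AC" line.
sub expand_ac_line {
    my ($line) = @_;
    $line =~ s/^AC\s+//;
    return map  { expand_accession_range($_) }
           grep { length }
           split /;\s*/, $line;
}

print join(' ', expand_ac_line('AC   AY883861; AF333345-AF333346; AH010225;')), "\n";
```

A lookup for any accession inside the range should then find the entry, which is what went wrong while the expansion was broken.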

Yours,
 -Heikki

From golharam at umdnj.edu  Mon Aug 15 11:38:46 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon Aug 15 11:29:45 2005
Subject: [Bioperl-l] Bio::Align::AlignI for EMBOSS:needle
Message-ID: <011301c5a1af$6cf54810$2f01a8c0@GOLHARMOBILE1>

EMBOSS needle reports percent identity and percent similarity; however,
Bio::Align::AlignI has no method for obtaining the percent similarity.

My code is essentially:

my $in = new Bio::AlignIO(-format => 'emboss', -fh => new
IO::String($output));
my $aln = $in->next_aln;
$fepct = $aln->overall_percentage_identity;

I tried the different percentage_identity methods to see if any of them
work, but they don't give the similarity number.  Is there a way to get
the percent similarity through bioperl?

Also, the description part of the documentation for
overall_percentage_identity has a typo in the Title.

Ryan

From jason.stajich at duke.edu  Mon Aug 15 12:04:24 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Aug 15 11:55:32 2005
Subject: [Bioperl-l] Bio::Align::AlignI for EMBOSS:needle
In-Reply-To: <011301c5a1af$6cf54810$2f01a8c0@GOLHARMOBILE1>
References: <011301c5a1af$6cf54810$2f01a8c0@GOLHARMOBILE1>
Message-ID: <5B6F1C45-4B48-40AC-BC0A-BF50173DD40E@duke.edu>

You could sort of figure it out by processing the match_line output  
and counting the number of ':', '.', and '*' items and dividing by  
the aln length.

If we did parse it, there isn't really anywhere to put that sort of  
field right now in SimpleAlign.

I generally just have a simple Perl parser that I run on needle/water  
output to get the percent similar/identical stats, and if I need the  
alignment I then parse it again with AlignIO.

Something like:
my %stats;
while(<$io>) {
  # matches header lines of the form "# Identity: <n>/<m> (<p>%)"
  if(/^\#\s+(Identity|Similarity|Gaps):\s+(\d+)\/(\d+)\s+\(\s*(\d+\.\d+)\s*%\s*\)/ ) {
   $stats{$1} = [$2,$3,$4];   # count, alignment length, percent
  }
}

seek($io, 0, 0);   # rewind; seek needs both a position and a whence
# process with AlignIO....
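
For anyone who wants to try the idea without running needle first, here is a self-contained sketch of the same parsing step. The header lines and their numbers are invented for illustration and only imitate the layout of a needle/water report; the in-memory filehandle stands in for the real output file.

```perl
use strict;
use warnings;

# Sample header in the style printed above a needle/water alignment.
# The values are made up for this example.
my $report = <<'END';
# Length: 149
# Identity:     112/149 (75.2%)
# Similarity:   130/149 (87.2%)
# Gaps:           3/149 ( 2.0%)
# Score: 545.0
END

open my $io, '<', \$report or die "cannot open in-memory report: $!";

my %stats;
while (my $line = <$io>) {
    if ($line =~ m{^\#\s+(Identity|Similarity|Gaps):\s+(\d+)/(\d+)\s+\(\s*(\d+\.\d+)\s*%\s*\)}) {
        $stats{$1} = [$2, $3, $4];    # [count, alignment length, percent]
    }
}

# Rewind so the same handle could be handed on to an alignment parser.
seek $io, 0, 0 or die "cannot rewind: $!";

printf "%s: %s%%\n", $_, $stats{$_}[2] for sort keys %stats;
```

Keeping the raw counts next to the percentage means the similarity can be recomputed against a different denominator later, for example the alignment length without gaps.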


-jason


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cain at cshl.edu  Mon Aug 15 13:22:29 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug 15 13:12:52 2005
Subject: [Bioperl-l] Windows bug in Bio::DB::Fasta?
In-Reply-To: <1124116511.2891.9.camel@localhost.localdomain>
References: <1124116511.2891.9.camel@localhost.localdomain>
Message-ID: <1124126549.2868.2.camel@localhost.localdomain>

Just to follow up on my own email with a little more information: in
Fasta.pm, line 697:

  $termination_length ||= /\r\n$/ ? 2 : 1;  # account for crlf-terminated Windows files

The pattern match is failing on DOS formatted files; I don't know why.
Does anyone else?
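
One possible explanation, offered as an assumption rather than a diagnosis of Fasta.pm: if the file is read through a text-mode (CRLF-translating) layer, which is the default on Windows, each "\r\n" arrives in Perl as a plain "\n". The pattern then never sees the carriage return, while seek still counts the raw bytes on disk. The sketch below reproduces that effect on any platform by choosing the PerlIO layer explicitly; the helper name is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small CRLF-terminated FASTA record byte-for-byte.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} ">seq1\r\nACGT\r\n";
close $tmp_fh or die "close failed: $!";

# Report the line-terminator length as seen through a given PerlIO layer,
# using the same test as the line quoted from Fasta.pm.
sub termination_length {
    my ($file, $layer) = @_;
    open my $fh, "<$layer", $file or die "cannot open $file: $!";
    my $line = <$fh>;
    close $fh;
    return $line =~ /\r\n$/ ? 2 : 1;
}

print "raw:  ", termination_length($path, ':raw'),  "\n";   # bytes as stored
print "crlf: ", termination_length($path, ':crlf'), "\n";   # text-mode view
```

If this is what is happening, calling binmode on the handle before the offsets are computed would be the obvious thing to test.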


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From golharam at umdnj.edu  Mon Aug 15 13:59:22 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon Aug 15 13:49:03 2005
Subject: [Bioperl-l] Bio::Align::AlignI for EMBOSS:needle
In-Reply-To: <5B6F1C45-4B48-40AC-BC0A-BF50173DD40E@duke.edu>
Message-ID: <013301c5a1c3$10c6b600$2f01a8c0@GOLHARMOBILE1>

That's exactly what I'm doing now....just regex parsing the similarity
line...didn't know if it was built into bioperl and I was just missing
it...


From jason.stajich at duke.edu  Mon Aug 15 18:11:45 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Aug 15 18:02:54 2005
Subject: [Bioperl-l] GuessSeqFormat problems
Message-ID: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu>


Albert -

I think the new guessing changes for phylip are causing havoc.   Lots  
of tests are failing in t/GuessSeqFormat.t.  Can you take a look?

I was looking over this module - it seems like we probably want to  
run the tests in a particular order, as some matches are ambiguous and  
we probably need to have a preferred order. At least then we'll know,  
when something fails, what the order was.

Another thing is that it uses open directly instead of allowing Root::IO  
to open a filehandle.  If we went to using Root::IO, it would allow  
peeking at not only a file but also a filehandle/stream: use  
_pushback after we have peeked over the first few lines and guessed the  
format, then pass it along to the SeqIO/AlignIO handle appropriately.
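
The peek-then-pushback idea can be sketched in plain Perl as follows. This is only an illustration: PeekReader and guess_format are hypothetical names, not BioPerl code, and the two format tests are deliberately crude stand-ins for the real guessing rules.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: wraps any filehandle and keeps
# a buffer of lines that were read ahead and then handed back.
package PeekReader;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh, buffer => [] }, $class;
}

# Return the next line, preferring lines that were pushed back.
sub next_line {
    my ($self) = @_;
    return shift @{ $self->{buffer} } if @{ $self->{buffer} };
    return scalar readline($self->{fh});
}

# Hand lines back so the next reader sees them again, in order.
sub pushback {
    my ($self, @lines) = @_;
    unshift @{ $self->{buffer} }, @lines;
    return;
}

package main;

# Peek at up to $n lines, guess a format, then restore the stream.
sub guess_format {
    my ($reader, $n) = @_;
    my @seen;
    while (@seen < $n) {
        my $line = $reader->next_line;
        last unless defined $line;
        push @seen, $line;
    }
    $reader->pushback(@seen);
    return 'fasta'  if @seen && $seen[0] =~ /^>/;
    return 'phylip' if @seen && $seen[0] =~ /^\s*\d+\s+\d+\s*$/;
    return 'unknown';
}

my $data = ">seq1 demo\nACGT\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my $reader = PeekReader->new($fh);
print guess_format($reader, 2), "\n";   # the guess
print $reader->next_line;               # the first line is still available
```

Because nothing is consumed, this works on pipes and STDIN as well as on files, which is the point of not calling open on a file name directly.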

Anyways, just thoughts...

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From ro_phls2 at dh.gov.hk  Mon Aug 15 20:49:52 2005
From: ro_phls2 at dh.gov.hk (Andrew Leung)
Date: Mon Aug 15 20:38:47 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <200508151141.39713.heikki@ebi.ac.uk>
Message-ID: <20050816004925.KUF1864.pimx07@Leungkcro>

Hi Heikki,
Thank you for your note.
I now have the two sequences obtained from an hsp and an array of
mutation positions returned by seq_inds() with the 'mismatch'
option. Do you mean that I can put these data into Bio::Variation and generate
a mutation list as desired? I am quite new to Bioperl. Can you explain in
greater detail? I've read the documentation for Bio::Variation, but it
appears to me that its methods are mainly for "setting", not for "reading",
mutations.
Andrew



From avilella at gmail.com  Tue Aug 16 04:50:03 2005
From: avilella at gmail.com (Albert Vilella)
Date: Tue Aug 16 04:40:28 2005
Subject: [Bioperl-l] Re: GuessSeqFormat problems
In-Reply-To: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu>
Message-ID: <1124182203.8208.14.camel@localhost.localdomain>

On Mon, 15 Aug 2005 at 18:11 -0400, Jason Stajich wrote:
> Albert -
>
> I think the new guessing changes for phylip are causing havoc.   Lots
> of tests are failing in t/GuessSeqFormat.t.  Can you take a look?

Oops, sorry about that.

I was trying to make the match for phylip more generic at $lineno=2. In
my case it was returning a nonexistent Bio::AlignIO::pir.

I have fixed it and it now passes all the tests.

> I was looking over this module - it seems like we probably want to run
> the tests in a particular order as some matches are ambiguous and we
> probably need to have a preferred order. At least we'll know, when
> something fails, what the order was.

As I understand from the DESCRIPTION, the more lines one checks, the
better the format is determined, isn't it?

Maybe it would help to add more line checks for some of the formats that
are only loosely constrained in their first lines.

> Another thing is it uses open directly instead of allowing Root::IO to
> open a filehandle.  If we went to using Root::IO, it would allow peeking
> at not only a file but a filehandle/stream and then use _pushback
> after we have peeked over the first few lines, guess the format, then
> pass it along to the SeqIO/AlignIO handle appropriately.
>
> Anyways, just thoughts...
>
> -jason


From heikki at ebi.ac.uk  Tue Aug 16 05:06:07 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Aug 16 04:56:31 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <20050816004925.KUF1864.pimx07@Leungkcro>
References: <20050816004925.KUF1864.pimx07@Leungkcro>
Message-ID: <200508161006.08243.heikki@ebi.ac.uk>

Andrew,

You are right, Bio::Variation objects only store and format the findings.
This same question popped up a couple of months ago. See:

http://portal.open-bio.org/pipermail/bioperl-l/2005-June/019242.html

I wonder if Julio got round to writing the code?
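
In the meantime, the "reading" step can be approximated with a few lines of plain Perl. This is a rough sketch under stated assumptions, not a Bio::Variation feature: it takes two already-aligned strings of equal length with '-' for gaps, the function name is hypothetical, and the labels follow the simple A162T / Z786- style from the earlier message rather than any formal nomenclature. An insertion is labelled with the position of the preceding reference residue.

```perl
use strict;
use warnings;

# List substitutions and gaps between two aligned sequences, labelled as
# <reference residue><position><variant residue>. Positions count
# residues of the reference, so gap columns in the reference do not
# advance the counter.
sub list_mutations {
    my ($ref, $var) = @_;
    die "aligned sequences must have equal length"
        unless length($ref) == length($var);

    my @mutations;
    my $pos = 0;
    for my $i (0 .. length($ref) - 1) {
        my $r = uc(substr($ref, $i, 1));
        my $v = uc(substr($var, $i, 1));
        $pos++ unless $r eq '-';
        next if $r eq $v;
        push @mutations, "$r$pos$v";
    }
    return @mutations;
}

print join(', ', list_mutations('ATGGCA-TT', 'ATGACAGT-')), "\n";
```

The resulting positions and residues are the values one would then feed into the Bio::Variation objects to get properly formatted names.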

 -Heikki

On Tuesday 16 August 2005 01:49, Andrew Leung wrote:
> Hi Heikki,
> Thank you for your note.
> I now have two strands of sequences obtained from a hsp and an array of
> mutation position information resulted from seq_inds() with 'mismatch'
> option. Do you mean that I can put these data to Bio::Variation and
> generate a mutation list as desired? I am quite new to Bioperl. Can you
> explain in greater details? I've read the documentation for Bio::Variation,
> but it appears to me that its methods are mainly for "set", but not for
> "reading" mutation.
> Andrew
>
>
>
> = = = = = = = = = =
> Andrew,
>
> Once you have extracted the information, you can create Bio::Variation
> objects
> which know how to stringify the description according to human mutation
> nomenclature rules.
>
> In practise, you create a Bio::Variation::SeqDiff object, add to it the
> appropriate Bio::Variation::{DNAMutation|RNAChange|AAChange} objects and
> call
> methods sysname() for nucleotides descriptor or trivname() for amino acid
> descriptor.
>
> The nomenclature used is not the most recent complex suggestion from den
> Dunnen et al but original (and in basic cases identical) from Antonorakis
> et
>
> al.
>
>  -Heikki
>
> On Thursday 11 August 2005 01:42, Andrew Leung wrote:
> > Hi Jason,
> > Thank you for advice. I will try the various approaches suggested. My
> > ultimate goal is to extract something like this: A267G, Z786-, L898Y etc.
> > for aa and A162T, G339A, A388N, etc. for nt. Preferably, the nomenclature
> > for annotating mutations is a standardized one. But, it appears that
> > there no such a ready to use module from Bioperl.
> > Andrew
> >
> >
> > -----Original Message-----
> > From: Jason Stajich [mailto:jason.stajich@duke.edu]
> > Sent: Wednesday, August 10, 2005 10:36 AM
> > To: andrew_leung@dh.gov.hk
> > Cc: bioperl-l@bioperl.org
> > Subject: Re: [Bioperl-l] Extract Mutation Automatically
> >
> > I guess it comes down to what you want to do with the mutations once
> > you've found them.
> >
> > The seq_inds method in Bio::Search::HSP::HSPI  which is something you
> > can call on hsp objects you've gotten out of pairwise alignment
> > searches. seq_inds will give you the location of the identical,
> > conserved, mismatched columns from a pairwise alignment.  I would
> > suggest using FASTA or SSEARCH and
> >
> > If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa'
> >
> > Here is how I would get the pairwise SW alignment and get the
> > mutations out.
> >
> > If you wanted a global alignment you can use the EMBOSS tool 'needle'
> > and generate an MSF alignment which can be parsed with Bio::AlignIO.
> >
> > Some simple code to print out the bases which have mismatches:
> >
> > use strict;
> > use Bio::SearchIO;
> > my $fh;
> > #open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!;
> > open($fh, "fasta34 seq1.fa seq2.fa |") || die $!;
> > # parse as 'fasta' to match fasta34; use 'blast' with the bl2seq line
> > #my $parser = Bio::SearchIO->new(-format => 'blast',
> > #                                -fh     => $fh);
> > my $parser = Bio::SearchIO->new(-format => 'fasta',
> >                                 -fh     => $fh);
> >
> > # single result, hit and HSP from FASTA, so use if instead of while;
> > # you would need to consider more HSPs if using BLAST
> > if( my $result = $parser->next_result ) {
> >     if( my $hit = $result->next_hit ) {
> >         if( my $hsp = $hit->next_hsp ) {
> >             my @mismatches = $hsp->seq_inds('hit', 'nomatch');
> >             # if this is protein and you want to treat the conservative
> >             # matches as mismatches, run the same method asking for
> >             # 'conserved' and then combine the two lists
> >
> >             for my $base ( @mismatches ) {
> >                 print "base $base of the hit sequence is a mismatch\n";
> >             }
> >         }
> >     }
> > }
> >
> >
> > The Bio::PopGen::Utilities module can also take an alignment and
> > extract the positions with variation for use in polymorphism analyses.
> >
> > -jason
> >
> > On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote:
> > > Hi all,
> > > Is there any module available that can allow me to extract mutation(s)
> > > automatically? The idea is that if I submit two sequences for
> > > alignment, the
> > > script can automatically list out all the differences between the two
> > > sequences. I wish to know the difference at two levels, i.e. the
> > > nucleotide
> > > and amino acid level. Any ideas?
> > > Andrew
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > Duke University
> > http://www.duke.edu/~jes12
> >
> >

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From andreas.kahari at ebi.ac.uk  Tue Aug 16 05:42:20 2005
From: andreas.kahari at ebi.ac.uk (Andreas Kahari)
Date: Tue Aug 16 05:33:22 2005
Subject: [Bioperl-l] Re: GuessSeqFormat problems
In-Reply-To: <1124182203.8208.14.camel@localhost.localdomain>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu>
	<1124182203.8208.14.camel@localhost.localdomain>
Message-ID: <20050816094220.GB17612@ebi.ac.uk>

On Tue, Aug 16, 2005 at 10:50:03AM +0200, Albert Vilella wrote:
> On Mon, 15 Aug 2005 at 18:11 -0400, Jason Stajich wrote:
> > 
> > Albert - 
> > 
> > I think the new guessing changes for phylip are causing havoc.   Lots
> > of tests are failing t/GuessSeqFeature.t.  Can you take a look?
> 
> Oops, sorry about that.
> 
> I was trying to make the match for phylip more generic in $lineno=2. In
> my case it was returning a nonexistent Bio::AlignIO::pir.
> 
> I have fixed it and now passes all the tests.
> 
> > I was looking over this module - it seems like we probably want to run
> > the tests in a particular order as some matches are ambiguous and we
> > probably need to have preferred order. At least we'll know when
> > something fails, what the order.
>
> As I understand from the DESCRIPTION, the more lines one checks, the
> better the format is determined, isn't it?
>
> Maybe it would help to add more line checks for some of the formats that
> are only loosely constrained in their first lines.

Yes, in some cases.  For a format to "win", its test needs to
be the "last one standing" after all the others have failed.
This naturally means that adding more formats will make the
guessing more uncertain, and the test rules need to be more and
more specific for them to be really useful.  On the other hand,
adding rules (or-parts to the if-statement) might make the test
push other tests out of the competition even though they might
be more indicative of the actual format.

I was playing around with some "scoring" of the formats, so that
one could write a format test that would be allowed to sometimes
fail one rule without disqualifying that format as a possible
candidate.  This was too elaborate at the time and I settled for
a simple pass/fail system.

(Disclaimer :-) My aim in writing the module was to have a
*guessing* facility, not a routine that *determines* the format
of the input data.  I hope that this has been made clear.

> > Another thing is it uses open directly instead of allowing Root::IO to
> > open a filehandle.  If went to using Root::IO, it would allow peeking
> > at not only a file but a filehandle/stream and then use _pushback
> > after we have peeked over the first few lines, guess the format, then
> > pass it along to the SeqIO/AlignIO handle appropriately.

This is a good suggestion.  I will not have time to do this now
though, so if no-one else wants to supply this patch I'll look
at it at a later stage.


Regards,
Andreas


-- 
Andreas Kähäri
EMBL-EBI/ensembl

---{ www.embl.org }---{ www.ebi.ac.uk }---{ www.ensembl.org }---
From andreas.kahari at ebi.ac.uk  Tue Aug 16 05:56:25 2005
From: andreas.kahari at ebi.ac.uk (Andreas Kahari)
Date: Tue Aug 16 05:46:46 2005
Subject: [Bioperl-l] Re: GuessSeqFormat problems
In-Reply-To: <20050816094220.GB17612@ebi.ac.uk>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu>
	<1124182203.8208.14.camel@localhost.localdomain>
	<20050816094220.GB17612@ebi.ac.uk>
Message-ID: <20050816095625.GC17612@ebi.ac.uk>

On Tue, Aug 16, 2005 at 10:42:20AM +0100, Andreas Kahari wrote:
[cut]
> Yes, in some cases.  For a format to "win", its test needs to
> be the "last one standing" after all the others have failed.

Looking at the code for the first time in some time, I realize
this is not how it is actually done, but almost.  If any one
line (one at a time, from the start and onwards) from the
input data matches only one format, then the guesser returns
that format as the format of the data.

Maybe it would be better if tests were ticked off the list as
they failed and never re-run?


Andreas

-- 
Andreas Kähäri
EMBL-EBI/ensembl

---{ www.embl.org }---{ www.ebi.ac.uk }---{ www.ensembl.org }---
From avilella at gmail.com  Tue Aug 16 06:17:14 2005
From: avilella at gmail.com (Albert Vilella)
Date: Tue Aug 16 06:07:51 2005
Subject: [Bioperl-l] Re: GuessSeqFormat problems
In-Reply-To: <20050816095625.GC17612@ebi.ac.uk>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu>
	<1124182203.8208.14.camel@localhost.localdomain>
	<20050816094220.GB17612@ebi.ac.uk> <20050816095625.GC17612@ebi.ac.uk>
Message-ID: <1124187435.8208.32.camel@localhost.localdomain>

On Tue, 16 Aug 2005 at 10:56 +0100, Andreas Kahari wrote:
> On Tue, Aug 16, 2005 at 10:42:20AM +0100, Andreas Kahari wrote:
> [cut]
> > Yes, in some cases.  For a format to "win", its test needs to
> > be the "last one standing" after all the others have failed.
> 
> Looking at the code for the first time in some time, I realize
> this is not how it is actually done, but almost.  If any one
> line (one at a time, from the start and onwards) from the
> input data matches only one format, then the guesser returns
> that format as the format of the data.
> 
> Maybe it would be better if tests were ticked off the list as
> they failed and never re-run?

GuessSeqFormat would tick off the format if the next $lineno regex
fails:

- Look at line 1, tick off the formats that won't comply with the
($lineno == 1 && $line =~/regex/)
- Look at line 2, further eliminate the formats that won't comply with
the ($lineno == 2 && $line =~/regex/).
- and so on for line 3, 4 (and presumably not much more).

This should eliminate cases where a format passes the regex for line 2
although line 1 indicates it is not that format. I suppose this is what
was happening in my pir/phylip case.
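
The tick-off scheme listed above could be sketched roughly as follows.
This is only an illustration, not the code in GuessSeqFormat: the %tests
table, its simplified regexes and the guess_by_elimination() name are all
assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical per-format tests: format name => { line number => regex }.
# The regexes are simplified placeholders, not those used in GuessSeqFormat.
my %tests = (
    fasta  => { 1 => qr/^>\S/ },
    pir    => { 1 => qr/^>(?:P1|F1|DL|DC|RL|RC|XX);/ },
    phylip => { 1 => qr/^\s*\d+\s+\d+\s*$/, 2 => qr/^\S+\s+[A-Za-z-]/ },
);

# Return the formats still standing after checking the lines in order.
# A format that fails the rule for some line is removed and never re-tested.
sub guess_by_elimination {
    my @lines  = @_;
    my %alive  = map { $_ => 1 } keys %tests;
    my $lineno = 0;
    for my $line (@lines) {
        $lineno++;
        for my $fmt (keys %alive) {
            my $re = $tests{$fmt}{$lineno} or next;   # no rule for this line
            delete $alive{$fmt} unless $line =~ $re;  # ticked off for good
        }
        last if keys(%alive) <= 1;
    }
    return sort keys %alive;
}

my @left = guess_by_elimination(" 2 10", "seq1      ACGTACGTAC");
print "candidates: @left\n";
```

With this layout a format that fails on line 1 can never come back on
line 2.  More than one candidate left at the end means the input is
ambiguous under the given rules.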

    Albert.


From akarger at CGR.Harvard.edu  Tue Aug 16 08:17:19 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue Aug 16 08:03:48 2005
Subject: [Bioperl-l] new modules for sarching for patterns in fasta-fi les
Message-ID: <339D68B133EAD311971E009027DC47970321B6EA@montecarlo.cgr.harvard.edu>

> -----Original Message-----
> From: Heikki Lehvaslaiho [mailto:heikki@ebi.ac.uk] 
> 
> On Tuesday 09 August 2005 20:20, Amir Karger wrote:
> >
> > I wrote a simple one-liner to convert fasta to three, tab-separated
> > columns: ID (without '>') desc, and concatenated sequence. 
> 
> FYI, this is  already implemented as 'tab' format in Bio::SeqIO.
> 
>  -Heikki

I decided to write a separate translator for two reasons. First, I thought
people might want the desc in a separate column. (SeqIO::tab just takes the
entire desc line in one shot, right?) Second, I believe that some people who
use the Scriptome toolbox might not have Bioperl installed, and I don't want
to force them to have Bioperl just to parse some FASTAs. (OTOH, I was Lazy
enough to steal Bio::SeqIO to do most format conversions.)  
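
For anyone curious, a Bioperl-free conversion of this kind can be sketched
as below.  This is not the actual Scriptome one-liner, just a minimal
illustration; the fasta_to_tab() name and the sample data are made up for
the example.

```perl
use strict;
use warnings;

# Convert FASTA text read from a filehandle into rows of
# "ID<TAB>desc<TAB>concatenated sequence" (ID without the leading '>').
sub fasta_to_tab {
    my ($in) = @_;
    my (@rows, $id, $desc, $seq);
    my $flush = sub {
        push @rows, join("\t", $id, $desc, $seq) if defined $id;
    };
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;
        if ($line =~ /^>(\S*)\s*(.*)/) {
            $flush->();                      # finish the previous record
            ($id, $desc, $seq) = ($1, $2, '');
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;               # drop whitespace inside sequence
            $seq .= $line;
        }
    }
    $flush->();                              # finish the last record
    return @rows;
}

my $fasta = ">s1 first test seq\nACGT\nTTGA\n>s2\nGGCC\n";
open my $fh, '<', \$fasta or die $!;
print "$_\n" for fasta_to_tab($fh);
```

The description column is simply empty when the header line has no text
after the ID.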

-Amir
From heikki at ebi.ac.uk  Tue Aug 16 08:47:30 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Aug 16 08:48:51 2005
Subject: [Bioperl-l] Announce: Bio::Seq::Quality
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA62F5440@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA62F5440@ANTARESIA.be.devgen.com>
Message-ID: <200508161347.30875.heikki@ebi.ac.uk>

Marc,

See if the new version of Bio::Seq::Quality works the way you like.

On Thursday 14 July 2005 16:54, Marc Logghe wrote:
> Personally I'd use that optionally by setting/resetting a padding flag
> or something. I'd more be interested in having a way to validate your
> Bio::Seq::Quality one way or another. In the case padding is switched
> off, I'd like to know whether my sequence length is exactly the same as
> my quality array. Does that make sense ?

> In conclusion I'd opt for an inconsistency check and an optional padding
> feature.


I've finished restructuring the Bio::Seq::MetaI classes so that they no
longer automatically pad with empty values or truncate meta values to
sequence length. The older behaviour can be activated by setting
force_flush() to true.


These new methods have been added to Bio::Seq::MetaI:

force_flush()
meta_length()
named_meta_length()
is_flush()


Since Bio::Seq::Quality has two meta sets with explicit names
('quality', 'trace'), these new methods are in place, too:

quality_is_flush()
quality_length()
trace_is_flush()
trace_length()


Enjoy,

 -Heikki
From mayagao1999 at yahoo.com  Tue Aug 16 12:30:00 2005
From: mayagao1999 at yahoo.com (Alex Zhang)
Date: Tue Aug 16 12:22:09 2005
Subject: [Bioperl-l] A question about the perl code
Message-ID: <20050816163000.95901.qmail@web53501.mail.yahoo.com>

Dear all,

I made a group A which includes 16 combinations of any
two nucleotides like: AA,AC,AG,AT,
CA,CC,CG,CT,
GA,GC,GG,GT,
TA,TC,TG,TT          

If  I randomly got a pair like AC, I want to exclude
AC, AT, AG, AA, TC, CC, GC. In other words, I want to
exclude the pairs in group A which has the same
nucleotide with the pair randomly selected. Can
anybody suggest me how to approach this using Perl?

Thanks!
   Alex


 
From johan.viklund at gmail.com  Tue Aug 16 13:09:07 2005
From: johan.viklund at gmail.com (Johan Viklund)
Date: Tue Aug 16 12:58:50 2005
Subject: [Bioperl-l] A question about the perl code
In-Reply-To: <20050816163000.95901.qmail@web53501.mail.yahoo.com>
References: <20050816163000.95901.qmail@web53501.mail.yahoo.com>
Message-ID: <5e924f0a0508161009786c819f@mail.gmail.com>

Hi,

If you have all the pairs in an array, say @nucleotide_pairs, and the
pair you randomly selected in the scalar $pair this will work:

@selected_pairs = grep { not /[$pair]/ } @nucleotide_pairs;

For a description of what grep does, look in the perlfunc perldoc page
(on the web: <http://www.perldoc.com/perl5.8.4/pod/func/grep.html>).

On 8/16/05, Alex Zhang <mayagao1999@yahoo.com> wrote:
> Dear all,
> 
> I made a group A which includes 16 combinations of any
> two nucleotides like: AA,AC,AG,AT,
> CA,CC,CG,CT,
> GA,GC,GG,GT,
> TA,TC,TG,TT
> 
> If  I randomly got a pair like AC, I want to exclude
> AC, AT, AG, AA, TC, CC, GC. In other words, I want to
> exclude the pairs in group A which has the same
> nucleotide with the pair randomly selected. Can
> anybody suggest me how to approach this using Perl?
> 
> Thanks!
>    Alex
> 
> 
> 
> 


-- 
Johan Viklund
E-post: <johan.viklund@gmail.com>
-----------------
perl -we '$,=" ";$_=bless sub{shift;print
split(/::/,ref)},Just::Another::Perl::Hacker;&$_'

From taerwin at tpg.com.au  Wed Aug 17 03:15:03 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Wed Aug 17 03:45:14 2005
Subject: [Bioperl-l] Another GuessSeqFormat question
In-Reply-To: <1124187435.8208.32.camel@localhost.localdomain>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu>
	<1124182203.8208.14.camel@localhost.localdomain>
	<20050816094220.GB17612@ebi.ac.uk> <20050816095625.GC17612@ebi.ac.uk>
	<1124187435.8208.32.camel@localhost.localdomain>
Message-ID: <1124262903.10144.71.camel@bacp4>

Hi,

Is there a way to determine which parser to use based on the guess from
Bio::Tools::GuessSeqFormat without hard coding a hash? I am interested
in parsing and storing various files to a database.

I was wondering if it would be a good idea to add some extra functions so
that files could be parsed automatically.

i.e. for a fasta file:

my $obj = new Bio::Tools::GuessSeqFormat( -file => $filename );
my $format = $obj->guess;
my $parser = $obj->parser;              #RETURNS Bio::SeqIO
my $next_method = $obj->next_method;    #RETURNS next_seq
my $write_method = $obj->write_method;  #RETURNS write_seq

#PARSE FILE
my $infile = new $parser(-file => $filename, -format => $format);
while (my $result = $infile->$next_method) {

  #DO STUFF HERE
  #ADD $result TO DATABASE

}

Perhaps there is a better way to do this? Any suggestions would be great.

Regards,

Tim

From heikki at ebi.ac.uk  Wed Aug 17 05:03:02 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Aug 17 05:20:14 2005
Subject: [Bioperl-l] Another GuessSeqFormat question
In-Reply-To: <1124262903.10144.71.camel@bacp4>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu>
	<1124187435.8208.32.camel@localhost.localdomain>
	<1124262903.10144.71.camel@bacp4>
Message-ID: <200508171003.02240.heikki@ebi.ac.uk>


Tim,

Bio::Tools::GuessSeqFormat is not meant to be used directly. It is called 
automatically by the constructor (new() method) of Bio::SeqIO:

 my $format = $param{'-format'} ||
     $class->_guess_format( $param{-file} || $ARGV[0] );

 if( ! $format ) {
     if ($param{-file}) {
         $format = Bio::Tools::GuessSeqFormat->new(-file => $param{-file} ||
                                                   $ARGV[0] )->guess;
     } elsif ($param{-fh}) {
         $format = Bio::Tools::GuessSeqFormat->new(-fh => $param{-fh} ||
                                                   $ARGV[0] )->guess;
     }
 }
 # ... code removed
 return "Bio::SeqIO::$format"->new(@args);

The logic from the above code is as follows:

1. _guess_format() tries to determine the format of the file based on the 
filename extension.

2. Only if that fails does it try looking into the file/stream to guess the
format using the Bio::Tools::GuessSeqFormat code.

3. The returned object is not a Bio::SeqIO but a Bio::SeqIO::$format object, 
which has the correct next_seq() and write_seq() methods. You can therefore 
use ref($seqio_object) to find out what parser is being used.
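
The two-stage logic in steps 1 and 2 can be sketched in plain Perl as
follows.  This is a simplified illustration only: the extension table,
the content tests and all three function names are assumptions for the
example, not what Bio::SeqIO actually contains.

```perl
use strict;
use warnings;

# Hypothetical extension table; the real one in Bio::SeqIO is much longer.
my %format_for_ext = (
    fa => 'fasta', fasta => 'fasta', gb => 'genbank', embl => 'embl',
);

# Stage 1: guess from the filename extension only.
sub guess_from_name {
    my ($file) = @_;
    return unless defined $file && $file =~ /\.([^.\/]+)\z/;
    return $format_for_ext{ lc $1 };
}

# Stage 2: placeholder content sniffing on the first line only.
sub guess_from_content {
    my ($first_line) = @_;
    return 'fasta'   if $first_line =~ /^>/;
    return 'genbank' if $first_line =~ /^LOCUS\s/;
    return 'embl'    if $first_line =~ /^ID\s/;
    return;
}

# The content is consulted only when the name gives no answer.
sub pick_format {
    my ($file, $first_line) = @_;
    return guess_from_name($file) || guess_from_content($first_line);
}

my $format = pick_format('output.out', 'LOCUS       AB000001');
print "format: $format\n";
```

Note that with this ordering a misleading extension wins over the file
content, which is one reason to pass -format explicitly when you know it.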


The standard code for doing this should contain all the automation needed:
  
foreach my $inputfilename (@all_files) {
    my $in  = Bio::SeqIO->new(-file => $inputfilename);
    while ( my $seq = $in->next_seq() ) {
     # do something
    }
}


Yours,
       -Heikki


On Wednesday 17 August 2005 08:15, Tim Erwin wrote:
> Hi,
>
> Is there a way to determine which parser to use based on the guess from
> Bio::Tools::GuessSeqFormat without hard coding a hash? I am interested
> in parsing and storing various files to a database.
>
> I was wondering if it is a good idea to make a some extra functions so that
> files could be parsed automatically.
>
> i.e for a fasta file
>
> my $obj = new Bio::Tools::GuessSeqFormat( -file => $filename );
> my $format = $obj->guess;
> my $parser = $obj->parser;              #RETURNS Bio::SeqIO
> my $next_method = $obj->next_method;    #RETURNS next_seq
> my $write_method = $obj->write_method;  #RETURNS write_seq
>
> #PARSE FILE
> my $infile = new $parser(-file => $filename, -format => $format);
> while (my $result = $infile->$next_method) {
>
>   #DO STUFF HERE
>   #ADD $result TO DATABASE
>
> }
>
> Perhaps there is a better way to do this? Any suggestions would be great.
>
> Regards,
>
> Tim
>

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From jason.stajich at duke.edu  Wed Aug 17 12:21:03 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Aug 17 12:13:55 2005
Subject: [Bioperl-l] thanks for the hardwork on HOWTO changeover
Message-ID: <386446B4-C2E6-4413-85E2-259B090EEC92@duke.edu>


I just want to publicly thank Brian Osborne for all the work to get  
the docbook and bioperl HOWTOs working more smoothly.

Brian has spent a lot of time recently figuring out how to get the XML ->
HTML and XML -> PDF conversions really working correctly.  The point of
writing things in docbook instead of latex, POD, plain text, or HTML is
that docbook is intended to provide fairly easy transformation of the
document text into a number of different formats (RTF, plain text,
HTML, PDF), once you get the tools working of course.

The website should now have up-to-date versions of the documentation  
here: http://bioperl.org/HOWTOs and reflect the latest version of  
these documents that are in CVS.

In the future the website HOWTOs will be kept up to date more closely  
with the versions in the CVS repository instead of the last official  
release.

Brian has taken care of a lot of behind-the-scenes things in terms of
project documentation and deserves a lot of credit for moving us
forward in trying to make the toolkit more accessible to different
levels of programmers.  So I'm sending out a big thank you!

Please give these HOWTOs a try, print them out, frame them on your  
walls, etc.  If you spot inconsistencies or weaknesses please try and  
help out by suggesting changes or adding text.

We'd of course encourage other people to help write HOWTOs about  
particular aspects of Bioperl or uses of Bioperl.  You don't need to  
be an ubercoder to write one.  If the XML format scares you, ask  
questions and have a look at the existing documents in doc/howto/xml.


--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From brian_osborne at cognia.com  Wed Aug 17 13:00:02 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Aug 17 12:51:29 2005
Subject: [Bioperl-l] Re: thanks for the hardwork on HOWTO changeover
In-Reply-To: <386446B4-C2E6-4413-85E2-259B090EEC92@duke.edu>
Message-ID: <BF28E552.3805%brian_osborne@cognia.com>

Jason,

My pleasure. You also wrote:

> We'd of course encourage other people to help write HOWTOs about particular
> aspects of Bioperl or uses of Bioperl.
> You don't need to be an ubercoder to write one.

That's right. There's certainly more to be done on these HOWTOs. For
example, you could imagine an Align HOWTO, a bioperl-db HOWTO, or a
Structure HOWTO (though the code may be lagging here). I could also see
HOWTOs on Ontology, Graph, Biblio, and so on. You could also imagine ones I
can't imagine, I'm sure. There are also missing sections in the Beginners,
Feature-Annotation, and Graphics HOWTOs, to name a few.

I'd like to make a particular appeal to those of you who would like to
contribute but aren't sure how. The great thing about writing these sorts of
things is that you end up knowing quite a bit about the subject matter, so
choosing a topic that interests you but that you don't know well is a good
thing. You delve into the modules, you write and test code, you think of new
methods; it's a great way to learn Bioperl.

Brian O.

On 8/17/05 12:21 PM, "Jason Stajich" <jason.stajich@duke.edu> wrote:

>
> I just want to publicly thank Brian Osborne for all the work to get the
> docbook and bioperl HOWTOs working more smoothly.
>
> Brian has spent a lot of time recently figuring out how to get the XML ->
> HTML and XML -> PDF conversions really working correctly.  The point of
> writing things in docbook instead of latex, POD, plain text, or HTML is
> that docbook is intended to provide fairly easy transformation of the
> document text into a number of different formats (RTF, plain text,
> HTML, PDF), once you get the tools working of course.
>
> The website should now have up-to-date versions of the documentation here:
> http://bioperl.org/HOWTOs and reflect the latest version of these documents
> that are in CVS.
>
> In the future the website HOWTOs will be kept up to date more closely with the
> versions in the CVS repository instead of the last official release.
>
> Brian has taken care of a lot of behind-the-scenes things in terms of
> project documentation and deserves a lot of credit for moving us forward in
> trying to make the toolkit more accessible to different levels of
> programmers.  So I'm sending out a big thank you!
>
> Please give these HOWTOs a try, print them out, frame them on your walls,
> etc.  If you spot inconsistencies or weaknesses please try and help out by
> suggesting changes or adding text.
>
> We'd of course encourage other people to help write HOWTOs about particular
> aspects of Bioperl or uses of Bioperl.  You don't need to be an ubercoder to
> write one.  If the XML format scares you, ask questions and have a look at the
> existing documents in doc/howto/xml.
>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/


From akarger at CGR.Harvard.edu  Wed Aug 17 16:30:39 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed Aug 17 16:17:31 2005
Subject: [Bioperl-l] A question about the perl code
Message-ID: <339D68B133EAD311971E009027DC47970354B9BB@montecarlo.cgr.harvard.edu>

> -----Original Message-----
> From: Johan Viklund [mailto:johan.viklund@gmail.com] 
> On 8/16/05, Alex Zhang <mayagao1999@yahoo.com> wrote:
> > Dear all,
> > 
> > I made a group A which includes 16 combinations of any
> > two nucleotides like: AA,AC,AG,AT,
> > CA,CC,CG,CT,
> > GA,GC,GG,GT,
> > TA,TC,TG,TT
> > 
> > If  I randomly got a pair like AC, I want to exclude
> > AC, AT, AG, AA, TC, CC, GC. In other words, I want to
> > exclude the pairs in group A which has the same
> > nucleotide with the pair randomly selected. Can
> > anybody suggest me how to approach this using Perl?
>
> Hi,
> 
> If you have all the pairs in an array, say @nucleotide_pairs, and the
> pair you randomly selected in the scalar $pair this will work:
> 
> @selected_pairs = grep { not /[$pair]/ } @nucleotide_pairs;

I don't think that's true.  The above excludes anything with an A or C in
either position. (Btw, I used @pairs, not @nucleotide_pairs, for brevity.)

>perl -le 'foreach $i (qw(A C G T)) {foreach $j (qw (A C G T)) { push
@pairs, "$i$j"}} print join " ", @pairs'
AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
>perl -le 'foreach $i (qw(A C G T)) {foreach $j (qw (A C G T)) { push
@pairs, "$i$j"}} $pair = "AC"; @selected_pairs = grep { not /[$pair]/ }
@pairs; print join " ", @selected_pairs'
GG GT TG TT

I believe the requirement is that it can't have an A in position 0 or a C in
position 1. One way to do it (not a particularly pretty way):

>perl -le 'foreach $i (qw(A C G T)) {foreach $j (qw (A C G T)) { push
@pairs, "$i$j"}} ($n1, $n2) = split //, "AC"; @selected_pairs = grep {
/[^$n1][^$n2]/ } @pairs; print join " ", @selected_pairs'
CA CG CT GA GG GT TA TG TT

The easiest way might really just be something like
"grep { substr($_, 0, 1) ne substr($pair, 0, 1) &&
substr($_, 1, 1) ne substr($pair, 1, 1) } @nucleotide_pairs"
(string comparison with "ne", since "!=" compares numerically).
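
Spelled out as a small script, the substr approach might look like this.
It is just a sketch; the exclude_shared() name is made up for the example.

```perl
use strict;
use warnings;

# Keep only the pairs that differ from $pair at both positions,
# i.e. drop any pair sharing the first or the second nucleotide with it.
sub exclude_shared {
    my ($pair, @pairs) = @_;
    my ($n1, $n2) = split //, $pair;
    return grep { substr($_, 0, 1) ne $n1 && substr($_, 1, 1) ne $n2 } @pairs;
}

# Build all 16 dinucleotides: AA AC AG AT CA ... TT
my @nucleotide_pairs =
    map { my $i = $_; map { "$i$_" } qw(A C G T) } qw(A C G T);

print join(' ', exclude_shared('AC', @nucleotide_pairs)), "\n";
# prints: CA CG CT GA GG GT TA TG TT
```

This gives the same nine pairs as the regex version above, without
interpolating the pair into a pattern.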

-Amir Karger
From taerwin at tpg.com.au  Wed Aug 17 19:18:33 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Wed Aug 17 19:15:18 2005
Subject: [Bioperl-l] Another GuessSeqFormat question
In-Reply-To: <200508171003.02240.heikki@ebi.ac.uk>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu>
	<1124187435.8208.32.camel@localhost.localdomain>
	<1124262903.10144.71.camel@bacp4> <200508171003.02240.heikki@ebi.ac.uk>
Message-ID: <1124320713.10144.80.camel@bacp4>

Thanks, Heikki, but I am working with files for different IO systems such
as AlignIO, SeqIO and SearchIO. What I am trying to do is guess the
format of any such file and then use the appropriate parser.

i.e. if I have an unknown file output.out, I want to guess the format and
then the appropriate IO parser to use. Is there a way to do this, or
should I just test all the IO parsers with an eval block?

Regards,

Tim


On Wed, 2005-08-17 at 10:03 +0100, Heikki Lehvaslaiho wrote:
> 
> Tim,
> 
> Bio::Tools::GuessSeqFormat is not meant to be used directly. It is called 
> automatically by the constructor (new() method) of Bio::SeqIO:
> 
>  my $format = $param{'-format'} ||
>      $class->_guess_format( $param{-file} || $ARGV[0] );
> 
>  if( ! $format ) { 
>      if ($param{-file}) {
>   $format = Bio::Tools::GuessSeqFormat->new(-file => $param{-file}||
>                     $ARGV[0] )->guess;
>      } elsif ($param{-fh}) {
>   $format = Bio::Tools::GuessSeqFormat->new(-fh => $param{-fh}||
>                     $ARGV[0] )->guess;
>      }
>  }
>         # ... code removed
>  return "Bio::SeqIO::$format"->new(@args);
> 
> The logic from the above code is as follows:
> 
> 1. _guess_format() tries to determine the format of the file based on the 
> filename extension.
> 
> 2. Only if that fails try looking into the file/stream to guess the format 
> using the Bio::Tools::GuessSeqFormat code.
> 
> 3. The returned object is not a Bio::SeqIO but a Bio::SeqIO::$format object, 
> which has the correct next_seq() and write_seq() methods. You can therefore 
> use ref($seqio_object) to find out what parser is being used.
> 
> 
> 
> The standard code for doing this should contain all the automation needed:
>   
> foreach my $inputfilename (@all_files) {
>     my $in  = Bio::SeqIO->new(-file => $inputfilename);
>     while ( my $seq = $in->next_seq() ) {
>      # do something
>     }
> }
> 
> 
> Yours,
>        -Heikki
> 
> 
> On Wednesday 17 August 2005 08:15, Tim Erwin wrote:
> > Hi,
> >
> > Is there a way to determine which parser to use based on the guess from
> > Bio::Tools::GuessSeqFormat without hard coding a hash? I am interested
> > in parsing and storing various files to a database.
> >
> > I was wondering if it is a good idea to make a some extra functions so that
> > files could be parsed automatically.
> >
> > i.e for a fasta file
> >
> > my $obj = new Bio::Tools::GuessSeqFormat( -file => $filename );
> > my $format = $obj->guess;
> > my $parser = $obj->parser;              #RETURNS Bio::SeqIO
> > my $next_method = $obj->next_method;    #RETURNS next_seq
> > my $write_method = $obj->write_method;  #RETURNS write_seq
> >
> > #PARSE FILE
> > my $infile = new $parser(-file => $filename, -format => $format);
> > while (my $result = $infile->$next_method) {
> >
> >   #DO STUFF HERE
> >   #ADD $result TO DATABASE
> >
> > }
> >
> > Perhaps there is a better way to do this? Any suggestions would be great.
> >
> > Regards,
> >
> > Tim
> >
> 

From b_corbomite at hotmail.com  Wed Aug 17 22:58:21 2005
From: b_corbomite at hotmail.com (Bryan Yi)
Date: Wed Aug 17 22:48:15 2005
Subject: [Bioperl-l] Problems using Bio::Ext::Align and
	Bio::SeqIO::staden::read
Message-ID: <BAY19-F215725BF4DA3ADF2C189CE94B20@phx.gbl>

I was attempting to do pairwise alignments for 2 DNA sequences, so I tried to
use the Align module. When I ran into problems such as not having all the
.h files, I was able to solve them by reading the mailing list
archives, and I even got the newest version of dpAlign.pm from CVS. However,
I'm now getting this problem.

Had problems bootstrapping Inline module 'Bio::SeqIO::staden::read'

Can't load 
'/usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi/auto/Bio/SeqIO/staden/read/read.so' 
for module Bio::SeqIO::staden::read: 
/usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi/auto/Bio/SeqIO/staden/read/read.so: 
undefined symbol: deflateInit2_ at 
/usr/lib/perl5/5.8.6/i586-linux-thread-multi/DynaLoader.pm line 230, <DATA> 
line 1.
at /usr/lib/perl5/site_perl/5.8.6/Inline.pm line 500


at aligntest.pl line 0
INIT failed--call queue aborted, <DATA> line 1.

I'm sure that everything is in its place, and I even installed the
Bio::SeqIO::staden::read module personally. Can anybody help me with this
problem? Also, is there another way to write a script that does pairwise
alignments without having to code everything from scratch?


From heikki at ebi.ac.uk  Thu Aug 18 05:54:37 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Thu Aug 18 05:44:58 2005
Subject: [Bioperl-l] Another GuessSeqFormat question
In-Reply-To: <1124320713.10144.80.camel@bacp4>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu>
	<200508171003.02240.heikki@ebi.ac.uk>
	<1124320713.10144.80.camel@bacp4>
Message-ID: <200508181054.38138.heikki@ebi.ac.uk>

Tim,

I thought there must be something in your problem I did not catch!

In principle it could be done, but in practice it would be really difficult; 
these text-based formats just vary too much, as the most recent GuessSeqFormat 
shows well. I would suggest that you try to determine ways to separate 
AlignIO, SeqIO and SearchIO files from each other and then call the 
appropriate one. Once you have the heuristics together, you might want to 
think of putting the logic into a module.

Fasta files pose a big problem here. There is no general way to know if a 
fasta file represents an alignment or not. For your specific case, you 
might find a heuristic that tells them apart, e.g. the ratio of gap characters 
to residues, but that is highly unlikely to hold on someone else's data.
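
The gap-ratio idea above could be sketched roughly as follows. This is only
an illustration in plain Perl, not part of BioPerl: the helper names
gap_ratio and looks_like_alignment and the 0.01 cutoff are assumptions made
up for this example, and a cutoff tuned on one data set may well fail on
another.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of gap characters ('-' or '.') among all
# sequence characters in a chunk of fasta text. Header lines are ignored.
sub gap_ratio {
    my ($fasta_text) = @_;
    my ($gaps, $total) = (0, 0);
    for my $line (split /\n/, $fasta_text) {
        next if $line =~ /^>/;          # skip fasta header lines
        $line =~ s/\s+//g;              # drop stray whitespace
        $total += length $line;
        $gaps  += ($line =~ tr/-.//);   # tr with empty replacement only counts
    }
    return $total ? $gaps / $total : 0;
}

# Hypothetical helper: guess "alignment" when the gap ratio exceeds a cutoff.
# The default cutoff of 0.01 is an arbitrary assumption.
sub looks_like_alignment {
    my ($fasta_text, $cutoff) = @_;
    $cutoff = 0.01 unless defined $cutoff;
    return gap_ratio($fasta_text) > $cutoff ? 1 : 0;
}

my $sample = ">a\nACGT--AC\n>b\nAC-TGGAC\n";
printf "gap ratio %.4f, alignment guess: %d\n",
    gap_ratio($sample), looks_like_alignment($sample);
```

A file guessed to be an alignment would then go to AlignIO and anything else
to SeqIO; unaligned sequences that happen to contain gap characters would be
misclassified, which is the weakness already pointed out above.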

Good luck,

 -Heikki

On Thursday 18 August 2005 00:18, Tim Erwin wrote:
> Thanks, Heikki, but I am trying to parse different IO objects such as
> AlignIO, SeqIO and SearchIO, but what I am trying to do is guess the
> format of any IO object and then use the appropriate parser.
>
> i.e If I have a unknown file output.out I want to guess the format and
> then the appropriate IO parser to use. Is there a way to do this or
> should I just test all the IO parsers with an eval block.
>
> Regards,
>
> Tim
>
> On Wed, 2005-08-17 at 10:03 +0100, Heikki Lehvaslaiho wrote:
> > Tim,
> >
> > Bio::Tools::GuessSeqFormat is not meant to be used directly. It is called
> > automatically by the constructor (new() method) of Bio::SeqIO:
> >
> >  my $format = $param{'-format'} ||
> >      $class->_guess_format( $param{-file} || $ARGV[0] );
> >
> >  if( ! $format ) {
> >      if ($param{-file}) {
> >   $format = Bio::Tools::GuessSeqFormat->new(-file => $param{-file}||
> >                     $ARGV[0] )->guess;
> >      } elsif ($param{-fh}) {
> >   $format = Bio::Tools::GuessSeqFormat->new(-fh => $param{-fh}||
> >                     $ARGV[0] )->guess;
> >      }
> >  }
> >         # ... code removed
> >  return "Bio::SeqIO::$format"->new(@args);
> >
> > The logic from the above code is as follows:
> >
> > 1. _guess_format() tries to determine the format of the file based on the
> > filename extension.
> >
> > 2. Only if that fails try looking into the file/stream to guess the
> > format using the Bio::Tools::GuessSeqFormat code.
> >
> > 3. The returned object is not a Bio::SeqIO but a Bio::SeqIO::$format
> > object, which has the correct next_seq() and write_seq() methods. You can
> > therefore use ref($seqobject) to find out what parser is being used.
> >
> >
> >
> > The standard code for doing this should contain all the automation
> > needed:
> >
> > foreach my $inputfilename (@all_files) {
> >     my $in  = Bio::SeqIO->new(-file => $inputfilename);
> >     while ( my $seq = $in->next_seq() ) {
> >      # do something
> >     }
> > }
> >
> >
> > Yours,
> >        -Heikki
> >
> > On Wednesday 17 August 2005 08:15, Tim Erwin wrote:
> > > Hi,
> > >
> > > Is there a way to determine which parser to use based on the guess from
> > > Bio::Tools::GuessSeqFormat without hard coding a hash? I am interested
> > > in parsing and storing various files to a database.
> > >
> > > I was wondering if it is a good idea to make some extra functions so
> > > that files could be parsed automatically.
> > >
> > > i.e for a fasta file
> > >
> > > my $obj = new Bio::Tools::GuessSeqFormat( -file => $filename );
> > > my $format = $obj->guess;
> > > my $parser = $obj->parser;              #RETURNS Bio::SeqIO
> > > my $next_method = $obj->next_method;    #RETURNS next_seq
> > > my $write_method = $obj->write_method;  #RETURNS write_seq
> > >
> > > #PARSE FILE
> > > my $infile = new $parser(-file => $filename, -format => $format);
> > > while (my $result = $infile->$next_method) {
> > >
> > >   #DO STUFF HERE
> > >   #ADD $result TO DATABASE
> > >
> > > }
> > >
> > > Perhaps there is a better way to do this? Any suggestions would be
> > > great.
> > >
> > > Regards,
> > >
> > > Tim
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From michael.watson at bbsrc.ac.uk  Thu Aug 18 12:03:02 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu Aug 18 11:59:46 2005
Subject: [Bioperl-l] Trouble with Bio::Graphics
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95024C4D1F@iahce2knas1.iah.bbsrc.reserved>

Hi

This is going to sound like a rather hard bug to track down but maybe
someone can shed some light...

I have a rather complicated script that takes a sequence, aligns it with
other sequences, does some blast searching, then creates a whole load of
features of the result for drawing with Bio::Graphics.

I've used the script to create 2498 images... but two fail,
both with the same error message, and this is it in its entirety (there
is no stack trace):

Can't locate object method "primary _tag" via package
"Bio::Location::Simple" at Bio/Graphics/processed_transcript.pm line 13,
<GEN12> line 56.

But of course it can't find that object method, nor should it be able to...
"processed_transcript" is my glyph of choice and, as I said, it has worked
for 2498 of these jobs... But for some reason, on 2 of them, it's trying
to find the method primary_tag not on a feature object but on a location
object.

I am bemused.

Any help appreciated.

Mick

From cain at cshl.edu  Thu Aug 18 13:44:39 2005
From: cain at cshl.edu (Scott Cain)
Date: Thu Aug 18 13:34:24 2005
Subject: [Bioperl-l] Trouble with Bio::Graphics
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95024C4D1F@iahce2knas1.iah.bbsrc.reserved>
References: <8975119BCD0AC5419D61A9CF1A923E95024C4D1F@iahce2knas1.iah.bbsrc.reserved>
Message-ID: <1124387080.3368.15.camel@localhost.localdomain>

Mick,

I don't really know, but I just wanted to clarify the error message that
you put below.  Is that a copy and paste, or did you retype it?  The
reason I ask is there are a few things that are odd about it that might
point to a problem:

- it mentions "primary _tag" with a space between the y and _.  Of
course there isn't a "primary _tag" method, as it is called
"primary_tag".

- It also references the file Bio/Graphics/processed_transcript.pm, but
that is not the right file path for that file; it should be in
Bio/Graphics/Glyph/processed_transcript.pm.

Now, since you said this script works most of the time, these can't be
fatal problems, but perhaps these are related to what the problem is.

Scott


On Thu, 2005-08-18 at 17:03 +0100, michael watson (IAH-C) wrote:
> Hi
> 
> This is going to sound like a rather hard bug to track down but maybe
> someone can shed some light...
> 
> I have a rather complicated script that takes a sequence, aligns it with
> other sequences, does some blast searching, then creates a whole load of
> features of the result for drawing with Bio::Graphics.
> 
> I've used the script to create images of 2498 images.... But two fail...
> Both with the same error message, and this is it in it's entirety (there
> is no stack trace):
> 
> Can't locate object method "primary _tag" via package
> "Bio::Location::Simple" at Bio/Graphics/processed_transcript.pm line 13,
> <GEN12> line 56.
> 
> But of course it can't find that object method, and nor should it be...
> "processed_transcript" is my glyph of choice and as I said it has worked
> for 2498 of these jobs... But for some reason, on 2 of them, it's trying
> to find method primary_tag not on a feature object but on a location
> object.
> 
> I am bemused.
> 
> Any help appreciated.
> 
> Mick
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From michael.watson at bbsrc.ac.uk  Thu Aug 18 13:48:04 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu Aug 18 13:37:41 2005
Subject: [Bioperl-l] Trouble with Bio::Graphics
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9502067B1A@iahce2knas1.iah.bbsrc.reserved>

Actually, more than likely they are transcription errors - I was running it on linux, my mail is on windows, sorry!

Somewhere "->primary_tag" is being called on a Location object...


-----Original Message-----
From:	Scott Cain [mailto:cain@cshl.edu]
Sent:	Thu 18/08/2005 6:44 PM
To:	michael watson (IAH-C)
Cc:	bioperl-l
Subject:	Re: [Bioperl-l] Trouble with Bio::Graphics

Mick,

I don't really know, but I just wanted to clarify the error message that
you put below.  Is that a copy and paste, or did you retype it?  The
reason I ask is there are a few things that are odd about it that might
point to a problem:

- it mentions "primary _tag" with a space between the y and _.  Of
course there isn't a "primary _tag" method, as it is called
"primary_tag".

- It also references the file Bio/Graphics/processed_transcript.pm, but
that is not the right file path for that file; it should be in
Bio/Graphics/Glyph/processed_transcript.pm.

Now, since you said this script works most of the time, these can't be
fatal problems, but perhaps these are related to what the problem is.

Scott


On Thu, 2005-08-18 at 17:03 +0100, michael watson (IAH-C) wrote:
> Hi
> 
> This is going to sound like a rather hard bug to track down but maybe
> someone can shed some light...
> 
> I have a rather complicated script that takes a sequence, aligns it with
> other sequences, does some blast searching, then creates a whole load of
> features of the result for drawing with Bio::Graphics.
> 
> I've used the script to create images of 2498 images.... But two fail...
> Both with the same error message, and this is it in it's entirety (there
> is no stack trace):
> 
> Can't locate object method "primary _tag" via package
> "Bio::Location::Simple" at Bio/Graphics/processed_transcript.pm line 13,
> <GEN12> line 56.
> 
> But of course it can't find that object method, and nor should it be...
> "processed_transcript" is my glyph of choice and as I said it has worked
> for 2498 of these jobs... But for some reason, on 2 of them, it's trying
> to find method primary_tag not on a feature object but on a location
> object.
> 
> I am bemused.
> 
> Any help appreciated.
> 
> Mick
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From tembe at bioanalysis.org  Thu Aug 18 16:57:32 2005
From: tembe at bioanalysis.org (Waibhav Tembe)
Date: Thu Aug 18 16:47:20 2005
Subject: [Bioperl-l] Multiline Query Name in Blast Output
Message-ID: <4304F63C.3040108@bioanalysis.org>

Hello List,

Is there any way to read a multi-line query name from BLAST output using 
SearchIO?

E.g., for the first few lines of BLAST output (attached at the end) the 
code :

 while($result = $in->next_result ) {
     print $result->query_name, "\n";
}

prints only :
Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4

How can I print the entire name, which spans multiple lines? i.e.,
Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_41200:End_41274_Section1To75_Start:12_Length:36

Thanks.

------------------------

BLASTN 2.2.10 [Oct-19-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4
1200:End_41274_Section1To75_Start:12_Length:36
         (36 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           2,718,617 sequences; 12,254,801,043 total letters

Searching..................................................done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|21956769|gb|AE013611.1| Yersinia pestis KIM section 11 of 415...    59   1e-07
gi|45434720|gb|AE017127.1| Yersinia pestis biovar Medievalis str...    59   1e-07
gi|15978115|emb|AJ414141.1| Yersinia pestis strain CO92 complete...    59   1e-07


From jason.stajich at duke.edu  Thu Aug 18 18:00:40 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Aug 18 17:50:37 2005
Subject: [Bioperl-l] Multiline Query Name in Blast Output
In-Reply-To: <4304F63C.3040108@bioanalysis.org>
References: <4304F63C.3040108@bioanalysis.org>
Message-ID: <0F83D319-1999-4541-89A0-200C894E70B5@duke.edu>

Does the rest show up in $r->query_description?

-jason
On Aug 18, 2005, at 4:57 PM, Waibhav Tembe wrote:

> Hello List,
>
> Is there any way to read multi-line query name from BLAST output  
> using SearchIO?
>
> E.g., for the first few lines of BLAST output (attached at the end)  
> the code :
>
> while($result = $in->next_result ) {
>     print $result->query_name, "\n";
> }
>
> prints only :
> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4
>
> How to print the entire name that is in multiple lines? i.e.,
> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_41200:End_41274_Section1To75_Start:12_Length:36
>
> Thanks.
>
> ------------------------
>
> BLASTN 2.2.10 [Oct-19-2004]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
> Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database  
> search
> programs",  Nucleic Acids Res. 25:3389-3402.
>
> Query= Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4
> 1200:End_41274_Section1To75_Start:12_Length:36
>         (36 letters)
>
> Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
> GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
>           2,718,617 sequences; 12,254,801,043 total letters
>
> Searching..................................................done
>
>                                                                  
> Score    E
> Sequences producing significant alignments:                       
> (bits) Value
>
> gi|21956769|gb|AE013611.1| Yersinia pestis KIM section 11 of  
> 415...    59   1e-07
> gi|45434720|gb|AE017127.1| Yersinia pestis biovar Medievalis  
> str...    59   1e-07
> gi|15978115|emb|AJ414141.1| Yersinia pestis strain CO92  
> complete...    59   1e-07
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From jason.stajich at duke.edu  Sun Aug 21 22:04:28 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Aug 21 21:53:48 2005
Subject: [Bioperl-l] Bio::DB::GFF::Adaptor::berkeleydb
Message-ID: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu>

Lincoln -

I'm getting these warnings when using the memory or berkeleydb adaptors:

Use of uninitialized value in join or string at [MYPATH]/Bio/DB/GFF/ 
Adaptor/memory/feature_serializer.pm line 17.

This is the line:
return join $;,@a;

$; is not defined in my scripts; if I define it the warnings vanish,  
but did you mean ';' or do you want to provide a particular  
customizable separator?

Also I notice in the SYNOPSIS of berkeleydb you have this
# do queries
   my $segment  = $db->segment(Chromosome => '1R');
   my $subseg   = $segment->subseq(5000..6000);
   my @features = $subseg->features('gene');


Is there a reason to introduce a new subseq API since we already have  
$seq->subseq($start,$end) for Bio::PrimarySeqI?  I didn't check to  
see, but I assume this is a convention you are using throughout  
Bio::DB::GFF? Hopefully start,end will work and I assume your usual  
start => end works too.

Thanks,
-jason
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From chen_li3 at yahoo.com  Sun Aug 21 23:20:01 2005
From: chen_li3 at yahoo.com (chen li)
Date: Mon Aug 22 08:17:10 2005
Subject: [Bioperl-l] write sequence into file after stream query to database
Message-ID: <20050822032001.20433.qmail@web30814.mail.mud.yahoo.com>

Dear all,

I am new to Bioperl. I wonder if anyone could help me
out. After I run a stream query for nucleotide sequences, how
should I write all the sequences into a file?

Thanks,

Li 


From jason.stajich at duke.edu  Mon Aug 22 08:31:45 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Aug 22 08:21:26 2005
Subject: [Bioperl-l] Bio::DB::GFF::Adaptor::berkeleydb
In-Reply-To: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu>
References: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu>
Message-ID: <866DB003-A797-4E88-968D-69C9CF511127@duke.edu>


On Aug 21, 2005, at 10:04 PM, Jason Stajich wrote:

> Lincoln -
>
> I'm getting these warning when using the memory or berkeleydb  
> adaptors:
>
> Use of uninitialized value in join or string at [MYPATH]/Bio/DB/GFF/ 
> Adaptor/memory/feature_serializer.pm line 17.
>
> This the line
> return join $;,@a;
>
> $; is not defined in my scripts, if I defining it warnings vanish,  
> but did you mean ';' or do you want to provide a particular  
> customizeable separator?
>
Or perhaps the undef warnings are due to empty fields in @a, since  
features seem to be extracted out from the db later w/o incident.

BTW - I think the 'memory' and 'berkeleydb' implementations  
effectively replace Bio::SeqFeature::Collection which also used a BDB  
Btree to store features/locations and make range queries, but not the  
full Bio::DB::GFF & DAS APIs.

> Also I notice in the SYNOPSIS of berkeleydb you have this
> # do queries
>   my $segment  = $db->segment(Chromosome => '1R');
>   my $subseg   = $segment->subseq(5000..6000);
>   my @features = $subseg->features('gene');
>
>
> Is there a reason to introduce a new subseq API since we already  
> have $seq->subseq($start,$end) for Bio::PrimarySeqI?  I didn't  
> check to see, but I assume this is a convention you are using  
> throughout Bio::DB::GFF? Hopefully start,end will work and I assume  
> your usual start => end works too.
>
> Thanks,
> -jason
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From lstein at cshl.edu  Mon Aug 22 10:59:42 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Aug 22 10:50:00 2005
Subject: [Bioperl-l] Bio::DB::GFF::Adaptor::berkeleydb
In-Reply-To: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu>
References: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu>
Message-ID: <200508221059.44303.lstein@cshl.edu>

On Sunday 21 August 2005 10:04 pm, Jason Stajich wrote:
> Lincoln -
>
> I'm getting these warning when using the memory or berkeleydb adaptors:
>
> Use of uninitialized value in join or string at [MYPATH]/Bio/DB/GFF/
> Adaptor/memory/feature_serializer.pm line 17.
>
> This the line
> return join $;,@a;

$; is a legacy Perl global variable that was used to separate elements of 
multidimensional arrays in the Perl 4 days. It contains an infrequently used 
control character, and since nobody is likely to change it I adopted it for 
quick serialization (much faster than freeze/thaw, I found in my benchmarks). 
Your warnings are probably coming from undefined values in the @a array, and 
I think the best thing to do is to localize $^W around this area. I'll do 
that.
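
For readers following along, here is a minimal stand-alone sketch of the
serialization trick described above. It is not the code in
feature_serializer.pm: serialize_fields and deserialize_fields are
hypothetical names, and the warning is silenced with the lexical
"no warnings 'uninitialized'" pragma rather than by localizing $^W, which is
a swapped-in technique chosen because the sketch runs under "use warnings".

```perl
use strict;
use warnings;

# Copy of the subscript separator; "\034" unless something has changed it.
my $SEP = $;;

# Hypothetical helper: join fields on the separator. Undefined fields are
# what trigger "Use of uninitialized value in join or string", so that one
# warning category is switched off inside this sub only.
sub serialize_fields {
    my @fields = @_;
    no warnings 'uninitialized';
    return join $SEP, @fields;
}

# Hypothetical helper: split the string back into fields. The -1 limit
# keeps trailing empty fields instead of silently dropping them.
sub deserialize_fields {
    my ($string) = @_;
    return split /\Q$SEP\E/, $string, -1;
}

my $packed = serialize_fields('gene', undef, 5000, 6000);
my @fields = deserialize_fields($packed);
printf "%d fields, separator is chr(%d)\n", scalar @fields, ord $SEP;
```

Note that an undefined field comes back as an empty string, so this
round trip does not distinguish undef from ''.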

> $; is not defined in my scripts, if I defining it warnings vanish,
> but did you mean ';' or do you want to provide a particular
> customizeable separator?
>
> Also I notice in the SYNOPSIS of berkeleydb you have this
> # do queries
>    my $segment  = $db->segment(Chromosome => '1R');
>    my $subseg   = $segment->subseq(5000..6000);
>    my @features = $subseg->features('gene');
>
>
> Is there a reason to introduce a new subseq API since we already have
> $seq->subseq($start,$end) for Bio::PrimarySeqI?  I didn't check to
> see, but I assume this is a convention you are using throughout
> Bio::DB::GFF? Hopefully start,end will work and I assume your usual
> start => end works too.

This is badness on my part. I'll fix that. My old habits from AcePerl keep 
sneaking in.

Lincoln

>
> Thanks,
> -jason
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu

From lstein at cshl.edu  Mon Aug 22 11:01:03 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Aug 22 10:51:37 2005
Subject: [Bioperl-l] Bio::DB::GFF::Adaptor::berkeleydb
In-Reply-To: <866DB003-A797-4E88-968D-69C9CF511127@duke.edu>
References: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu>
	<866DB003-A797-4E88-968D-69C9CF511127@duke.edu>
Message-ID: <200508221101.04634.lstein@cshl.edu>

On Monday 22 August 2005 08:31 am, Jason Stajich wrote:
> On Aug 21, 2005, at 10:04 PM, Jason Stajich wrote:
> > Lincoln -
> >
> > I'm getting these warning when using the memory or berkeleydb
> > adaptors:
> >
> > Use of uninitialized value in join or string at [MYPATH]/Bio/DB/GFF/
> > Adaptor/memory/feature_serializer.pm line 17.
> >
> > This the line
> > return join $;,@a;
> >
> > $; is not defined in my scripts, if I defining it warnings vanish,
> > but did you mean ';' or do you want to provide a particular
> > customizeable separator?
>
> Or perhaps the undef warnings are due to empty fields in @a, since
> features seem to be extracted out from the db later w/o incident.
>
> BTW - I think the 'memory' and 'berkeleydb' implementations
> effectively replace Bio::SeqFeature::Collection which also used a BDB
> Btree to store features/locations and make range queries, but not the
> full Bio::DB::GFF & DAS APIs.

Ouch! I apologize if I stepped on your (and anyone else's) foot!

Lincoln

>
> > Also I notice in the SYNOPSIS of berkeleydb you have this
> > # do queries
> >   my $segment  = $db->segment(Chromosome => '1R');
> >   my $subseg   = $segment->subseq(5000..6000);
> >   my @features = $subseg->features('gene');
> >
> >
> > Is there a reason to introduce a new subseq API since we already
> > have $seq->subseq($start,$end) for Bio::PrimarySeqI?  I didn't
> > check to see, but I assume this is a convention you are using
> > throughout Bio::DB::GFF? Hopefully start,end will work and I assume
> > your usual start => end works too.
> >
> > Thanks,
> > -jason
> > --
> > Jason Stajich
> > Duke University
> > http://www.duke.edu/~jes12
> >
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu

From lstein at cshl.edu  Mon Aug 22 11:04:37 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Aug 22 10:54:12 2005
Subject: [Bioperl-l] Bio::DB::GFF::Adaptor::berkeleydb
In-Reply-To: <866DB003-A797-4E88-968D-69C9CF511127@duke.edu>
References: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu>
	<866DB003-A797-4E88-968D-69C9CF511127@duke.edu>
Message-ID: <200508221104.38633.lstein@cshl.edu>


> >   my $subseg   = $segment->subseq(5000..6000);

Actually this is just a typo. The .. should be a comma. Fixing.
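
To see why the typo matters: in list context the range operator expands to
every integer in the range, so a method receives 1001 arguments instead of
two. The sketch below uses a hypothetical stand-in, describe_args, rather
than the real subseq() method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method such as subseq(); it only reports
# what it was handed.
sub describe_args {
    my @args = @_;
    return sprintf "%d argument(s), first=%s, last=%s",
        scalar @args, $args[0], $args[-1];
}

print describe_args(5000..6000), "\n";   # range: expands to 5000, 5001, ..., 6000
print describe_args(5000, 6000), "\n";   # comma: just start and end
```

A method that unpacks its arguments as ($start, $end) would therefore see
5000 and 5001 when called with the range form.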

Lincoln

> >   my @features = $subseg->features('gene');
> >
> >
> > Is there a reason to introduce a new subseq API since we already
> > have $seq->subseq($start,$end) for Bio::PrimarySeqI?  I didn't
> > check to see, but I assume this is a convention you are using
> > throughout Bio::DB::GFF? Hopefully start,end will work and I assume
> > your usual start => end works too.
> >
> > Thanks,
> > -jason
> > --
> > Jason Stajich
> > Duke University
> > http://www.duke.edu/~jes12
> >
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu

From tembe at bioanalysis.org  Mon Aug 22 11:50:39 2005
From: tembe at bioanalysis.org (Waibhav Tembe)
Date: Mon Aug 22 11:40:16 2005
Subject: [Bioperl-l] Multiline Query Name in Blast Output
In-Reply-To: <0F83D319-1999-4541-89A0-200C894E70B5@duke.edu>
References: <4304F63C.3040108@bioanalysis.org>
	<0F83D319-1999-4541-89A0-200C894E70B5@duke.edu>
Message-ID: <4309F44F.9050701@bioanalysis.org>

Thanks Jason. The rest showed up in $r->description. Is there any reason 
for this?

Does BioPerl assume that the first whitespace character separates the query 
name and description?


Jason Stajich wrote:

> does the rest show up in $r->query_description ?
>
> -jason
> On Aug 18, 2005, at 4:57 PM, Waibhav Tembe wrote:
>
>> Hello List,
>>
>> Is there any way to read multi-line query name from BLAST output  
>> using SearchIO?
>>
>> E.g., for the first few lines of BLAST output (attached at the end)  
>> the code :
>>
>> while($result = $in->next_result ) {
>>     print $result->query_name, "\n";
>> }
>>
>> prints only :
>> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4
>>
>> How to print the entire name that is in multiple lines? i.e.,
>> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_41200:End_41274_Section1To75_Start:12_Length:36
>>
>> Thanks.
>>
>> ------------------------
>>
>> BLASTN 2.2.10 [Oct-19-2004]
>>
>>
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>> Schaffer,
>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>> "Gapped BLAST and PSI-BLAST: a new generation of protein database  
>> search
>> programs",  Nucleic Acids Res. 25:3389-3402.
>>
>> Query= Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4
>> 1200:End_41274_Section1To75_Start:12_Length:36
>>         (36 letters)
>>
>> Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
>> GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
>>           2,718,617 sequences; 12,254,801,043 total letters
>>
>> Searching..................................................done
>>
>>                                                                  
>> Score    E
>> Sequences producing significant alignments:                       
>> (bits) Value
>>
>> gi|21956769|gb|AE013611.1| Yersinia pestis KIM section 11 of  
>> 415...    59   1e-07
>> gi|45434720|gb|AE017127.1| Yersinia pestis biovar Medievalis  
>> str...    59   1e-07
>> gi|15978115|emb|AJ414141.1| Yersinia pestis strain CO92  
>> complete...    59   1e-07
>>
>
> -- 
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>

From jason.stajich at duke.edu  Mon Aug 22 11:57:20 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Aug 22 11:46:59 2005
Subject: [Bioperl-l] Multiline Query Name in Blast Output
In-Reply-To: <4309F44F.9050701@bioanalysis.org>
References: <4304F63C.3040108@bioanalysis.org>
	<0F83D319-1999-4541-89A0-200C894E70B5@duke.edu>
	<4309F44F.9050701@bioanalysis.org>
Message-ID: <4C522C85-E177-43BE-B38B-76ABA4A5D5F0@duke.edu>

Exactly.  The line wrap (\n) counts as whitespace as well, hence the  
separation.
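
A rough sketch of that behaviour, and of one way to glue a wrapped name back
together, is below. It is plain Perl and does not reproduce the SearchIO
parser: split_query_line and rejoin_wrapped_name are hypothetical helpers,
and the rejoin step assumes the query has no real description, so that
everything after the first whitespace is just the wrapped tail of the name.

```perl
use strict;
use warnings;

# Hypothetical helper: split the text after "Query=" at the first run of
# whitespace (a newline counts), giving a name and a description.
sub split_query_line {
    my ($raw) = @_;
    $raw =~ s/^\s+|\s+$//g;                      # trim both ends
    my ($name, $desc) = split /\s+/, $raw, 2;
    return ($name, defined $desc ? $desc : '');
}

# Hypothetical helper: undo the line wrap by removing all whitespace from
# the "description" and appending it to the name. Only valid under the
# assumption stated above.
sub rejoin_wrapped_name {
    my ($name, $desc) = @_;
    (my $tail = $desc) =~ s/\s+//g;
    return $name . $tail;
}

my $raw = "Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4\n"
        . "1200:End_41274_Section1To75_Start:12_Length:36";
my ($name, $desc) = split_query_line($raw);
print "name:     $name\n";
print "desc:     $desc\n";
print "rejoined: ", rejoin_wrapped_name($name, $desc), "\n";
```

With a SearchIO result, the two pieces would come from
$result->query_name and $result->query_description instead of the raw text.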

On Aug 22, 2005, at 11:50 AM, Waibhav Tembe wrote:

> Thanks Jason. The rest showed up in $r->description. Is there any  
> reason for this?
>
> Does BioPerl assume that first whit-space character separates the  
> query name and description?
>
>
>
> Jason Stajich wrote:
>
>
>> does the rest show up in $r->query_description ?
>>
>> -jason
>> On Aug 18, 2005, at 4:57 PM, Waibhav Tembe wrote:
>>
>>
>>> Hello List,
>>>
>>> Is there any way to read multi-line query name from BLAST output   
>>> using SearchIO?
>>>
>>> E.g., for the first few lines of BLAST output (attached at the  
>>> end)  the code :
>>>
>>> while($result = $in->next_result ) {
>>>     print $result->query_name, "\n";
>>> }
>>>
>>> prints only :
>>> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4
>>>
>>> How to print the entire name that is in multiple lines? i.e.,
>>> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_41200:End_41274_Section1To75_Start:12_Length:36
>>>
>>> Thanks.
>>>
>>> ------------------------
>>>
>>> BLASTN 2.2.10 [Oct-19-2004]
>>>
>>>
>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.   
>>> Schaffer,
>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>> "Gapped BLAST and PSI-BLAST: a new generation of protein  
>>> database  search
>>> programs",  Nucleic Acids Res. 25:3389-3402.
>>>
>>> Query=  
>>> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4
>>> 1200:End_41274_Section1To75_Start:12_Length:36
>>>         (36 letters)
>>>
>>> Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
>>> GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
>>>           2,718,617 sequences; 12,254,801,043 total letters
>>>
>>> Searching..................................................done
>>>
>>>                                                                   
>>> Score    E
>>> Sequences producing significant alignments:                        
>>> (bits) Value
>>>
>>> gi|21956769|gb|AE013611.1| Yersinia pestis KIM section 11 of   
>>> 415...    59   1e-07
>>> gi|45434720|gb|AE017127.1| Yersinia pestis biovar Medievalis   
>>> str...    59   1e-07
>>> gi|15978115|emb|AJ414141.1| Yersinia pestis strain CO92   
>>> complete...    59   1e-07
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> -- 
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>>
>
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From hlapp at gnf.org  Mon Aug 22 14:18:30 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Aug 22 14:09:37 2005
Subject: [Bioperl-l] Re: [BioSQL-l] loading fasta records with
	load_seqdatabase.pl - correct fasta headers
In-Reply-To: <3cfaa40405082207574597e9f9@mail.gmail.com>
References: <3cfaa40405082207574597e9f9@mail.gmail.com>
Message-ID: <bf6e3b7e0e872811f56f97261e4ccdeb@gnf.org>

Amit,

this is a problem inherent with the fasta format as there is no precise  
definition of what to put as identifier and/or accession. The Bioperl  
fasta parser doesn't set the accession and so it defaults to "unknown"  
(it cannot be undef). Since fasta format also doesn't have the version  
in a defined place, the version will be undef (i.e., zero for biosql)  
for every entry, so that all your sequences will have the same unique  
key of (accession,version,namespace) which violates the constraint  
after the first sequence was stored.

The easiest way to deal with this is to write your own  
SequenceProcessor (see Bio::Factory::SequenceProcessorI and  
Bio::Seq::BaseSeqProcessor) and then pipeline it using the --pipeline  
argument to load_seqdatabase.pl.
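
A rough sketch of such a processor, assuming display IDs of the form
"gi|51459331|ref|XM_498785.1|" as in the warning quoted below.  The
package name SeqProcessor::GiAccession and the helper parse_gi_id are
invented for illustration, and the eval around the base class is only
there so the sketch compiles on a machine without BioPerl.

```perl
package SeqProcessor::GiAccession;    # hypothetical module name
use strict;
use warnings;

# Inherit from the BioPerl base class when BioPerl is installed; the eval
# only lets this sketch compile on a machine without it.
our @ISA;
BEGIN {
    @ISA = ('Bio::Seq::BaseSeqProcessor')
        if eval { require Bio::Seq::BaseSeqProcessor; 1 };
}

# Split an NCBI-style ID such as "gi|51459331|ref|XM_498785.1|" into
# (accession, version). Returns an empty list if the ID looks different.
sub parse_gi_id {
    my ($id) = @_;
    return unless defined $id
        && $id =~ /^gi\|\d+\|[^|]+\|([^|.]+)(?:\.(\d+))?/;
    return ($1, $2);
}

# Called once per sequence by the pipeline; returns the list of
# sequences to pass on to the next stage.
sub process_seq {
    my ($self, $seq) = @_;
    my ($acc, $version) = parse_gi_id($seq->display_id);
    if (defined $acc) {
        $seq->accession_number($acc);
        $seq->version($version) if defined $version;
    }
    return ($seq);
}

1;
```

With the module somewhere in @INC, its name would then be given to the
--pipeline argument, so that each record gets its own accession and
version before it is stored.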

Simple examples for how to write your own SeqProcessor have been posted  
before, e.g., by Marc Logghe:

http://portal.open-bio.org/pipermail/bioperl-l/2005-February/018158.html

and by myself

http://portal.open-bio.org/pipermail/bioperl-l/2003-June/012369.html

	-hilmar

On Aug 22, 2005, at 7:57 AM, Amit Indap wrote:

> Hi,
>
> I am new to using the biosql. I am trying to load fasta formatted
> RefSeq records into the biosql schema. When I try to use the
> load_seqdatabase.pl script I get the following error
>
> load_seqdatabase.pl --host 127.0.0.1 --port 2022 --dbname testbiosql
> --namespace refseq --format fasta refseq.fa
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values
> were  
> ("gi|51459331|ref|XM_498785.1|","gi|51459331|ref|XM_498785.1|","unknown 
> ","PREDICTED:
> Homo sapiens LOC440641 (LOC440641), mRNA","0","") FKs (1,<NULL>)
> Duplicate entry 'unknown-1-0' for key 2
> ---------------------------------------------------
> Could not store unknown:
> ------------- EXCEPTION  -------------
> MSG: You're trying to lie about the length: is 1316 but you say 6474
> STACK Bio::PrimarySeq::length
> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:418
> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: 
> 553
> STACK Bio::Seq::length /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:612
> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: 
> 553
> STACK Bio::DB::BioSQL::BiosequenceAdaptor::populate_from_row
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BiosequenceAdaptor.pm:236
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:1310
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:976
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:855
> STACK Bio::DB::BioSQL::PrimarySeqAdaptor::attach_children
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:284
> STACK Bio::DB::BioSQL::SeqAdaptor::attach_children
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/SeqAdaptor.pm:279
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:1341
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:976
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:855
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:205
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:254
> STACK Bio::DB::Persistent::PersistentObject::store
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: 
> 272
> STACK (eval) ./load_seqdatabase.pl:542
> STACK toplevel ./load_seqdatabase.pl:525
>
> --------------------------------------
>  at ./load_seqdatabase.pl line 555
>
> I think my fasta headers are incorrect since it says it cannot store
> unknown. The first fasta record in my refseq.fa is this:
>
>> gi|6912649|ref|NM_012431.1| Homo sapiens sema domain, immunoglobulin
> domain (Ig), short basic domain, secreted, (semaphorin) 3E (SEMA3E),
> mRNA
>
> Do I need to reformat that header? I downloaded the NM series of
> Refseqs in fasta form from NCBI's ftp site and wanted to load them
> into the biosql schema.
>
> Thanks,
>
> Amit Indap
> Dept. of Biological Statistics and Computational Biology
> Cornell University
>
>
> (error message)
> Loading refseq.fa ...
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From kynn at panix.com  Mon Aug 22 15:30:10 2005
From: kynn at panix.com (kynn@panix.com)
Date: Mon Aug 22 15:20:18 2005
Subject: [Bioperl-l] [OT] ISO guide to human genomics DBs
Message-ID: <200508221930.j7MJUAd02770@panix3.panix.com>


Hi!  Does anyone know of a high-level guide to the various Human
genomics databases?  Basically, I want to make my software as flexible
as possible regarding the kinds of human gene identifiers it will
recognize, but I'm having a hard time figuring out the various naming
schemes.  I'm (vaguely) aware of

  Entrez
  IPI
  Ensembl
  SwissProt/UniProt
  RefSeq
  LocusLink
  "gene symbols" (e.g. GHC1_HUMAN)
  "gi numbers"

though I'm not sure these are strictly comparable.  (Am I missing any
major ones?)

From what I've seen at sites like http://harvester.embl.de it looks
like a huge mess, but maybe this is only a reflection of my ignorance.
Any pointers would be much appreciated!

kj

From skirov at utk.edu  Mon Aug 22 16:28:19 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Mon Aug 22 16:19:19 2005
Subject: [Bioperl-l] [OT] ISO guide to human genomics DBs
In-Reply-To: <200508221930.j7MJUAd02770@panix3.panix.com>
References: <200508221930.j7MJUAd02770@panix3.panix.com>
Message-ID: <430A3563.6010601@utk.edu>

You certainly have missed HGNC (http://www.gene.ucl.ac.uk/nomenclature/) 
and HUGO in particular. By the way, EntrezGene replaced LocusLink a few 
months ago... Affy mapping is also useful for some things. Aceview 
(http://www.ncbi.nih.gov/IEB/Research/Acembly/index.html?human), MIM 
(Mendelian Inheritance in Man, though it is not just human anymore...) 
and GeneRIF (Gene Reference Into Function) are also of some relevance, I 
guess, though you can access these through EntrezGene. There are a bunch 
of other, more specialized DBs out there as well; it depends on what you 
want to do.
By the way, you can search for most of the relationships you have 
mentioned through EnsMART and BioMART, or through GeneKeyDB 
(genereg.ornl.gov/gkdb), the last being developed by me, so this is not 
unbiased :-) ...
Stefan

kynn@panix.com wrote:

>Hi!  Does anyone know of a high-level guide to the various Human
>genomics databases?  Basically, I want to make my software as flexible
>as possible regarding the kinds of human gene identifiers it will
>recognize, but I'm having a hard time figuring out the various naming
>schemes.  I'm (vaguely) aware of
>
>  Entrez
>  IPI
>  Ensembl
>  SwissProt/UniProt
>  RefSeq
>  LocusLink
>  "gene symbols" (e.g. GHC1_HUMAN)
>  "gi numbers"
>
>though I'm not sure these are strictly comparable.  (Am I missing any
>major ones?)
>
>From what I've seen at sites like http://harvester.embl.de it looks
>like a huge mess, but maybe this is only a reflection of my ignorance.
>Any pointers would be much appreciated!
>
>kj
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>  
>

From horvathm at niehs.nih.gov  Mon Aug 22 17:32:53 2005
From: horvathm at niehs.nih.gov (Horvath, Monica (NIH/NIEHS))
Date: Mon Aug 22 17:23:07 2005
Subject: [Bioperl-l] [OT] ISO guide to human genomics DBs
Message-ID: <8ECE8333845072439B4CBE35D1814ACC04F19DA3@nihexchange22.nih.gov>

If I were you, I would find a copy of the database issue of NAR, which
usually comes out in January of each year.

Also, I would check out all of the gene symbol/mappings provided as options
(e.g. ensmart) or tables within the ensembl and ucsc genome browser
systems-- this would give you an idea of the type of mappings typically
desired by biologists.

I would make your application as flexible as possible to accept additional
code portions to accommodate new identification schemes because this stuff
changes constantly.
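
One way to keep that flexibility, sketched under assumptions: hold the
identifier schemes in an ordered table of patterns so that a new scheme
is one more row.  The patterns below are rough guesses for illustration,
not official definitions, and guess_scheme is a made-up name.  Purely
numeric IDs (gi numbers, Entrez Gene IDs) are left as unknown because
they cannot be told apart by shape alone.

```perl
use strict;
use warnings;

# Ordered table of [scheme, pattern] pairs; the first match wins.
# The patterns are rough assumptions for illustration only.
my @schemes = (
    [ 'RefSeq'                       => qr/^[A-Z]{2}_\d+(?:\.\d+)?$/ ],
    [ 'Ensembl'                      => qr/^ENS[GTP]\d{11}$/ ],
    [ 'IPI'                          => qr/^IPI\d{8}(?:\.\d+)?$/ ],
    [ 'SwissProt/UniProt entry name' => qr/^[A-Z0-9]{1,5}_[A-Z]{3,5}$/ ],
);

# Return the name of the first scheme whose pattern matches, else 'unknown'.
sub guess_scheme {
    my ($id) = @_;
    for my $scheme (@schemes) {
        my ($name, $pattern) = @$scheme;
        return $name if $id =~ $pattern;
    }
    return 'unknown';
}

printf "%-16s %s\n", $_, guess_scheme($_)
    for qw(NM_012431.1 ENSG00000139618 GHC1_HUMAN 12345);
```

Note that an ID such as GHC1_HUMAN has the shape of a SwissProt entry
name rather than a gene symbol, which is one reason a lookup table tends
to be easier to correct than hard-coded branches.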

Monica M. Horvath, Ph.D.
Laboratory of Molecular Genetics
Environmental Genomics Group
111 T.W. Alexander Drive
P.O. Box 12233 MD C3-03
Research Triangle Park, NC 27709-2333
+1 919-541-3266

-----Original Message-----
From: kynn@panix.com [mailto:kynn@panix.com] 
Sent: Monday, August 22, 2005 3:30 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] [OT] ISO guide to human genomics DBs


Hi!  Does anyone know of a high-level guide to the various Human
genomics databases?  Basically, I want to make my software as flexible
as possible regarding the kinds of human gene identifiers it will
recognize, but I'm having a hard time figuring out the various naming
schemes.  I'm (vaguely) aware of

  Entrez
  IPI
  Ensembl
  SwissProt/UniProt
  RefSeq
  LocusLink
  "gene symbols" (e.g. GHC1_HUMAN)
  "gi numbers"

though I'm not sure these are strictly comparable.  (Am I missing any
major ones?)

From what I've seen at sites like http://harvester.embl.de it looks
like a huge mess, but maybe this is only a reflection of my ignorance.
Any pointers would be much appreciated!

kj

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l
From lstein at cshl.edu  Mon Aug 22 18:18:07 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Aug 22 18:07:45 2005
Subject: [Bioperl-l] Windows bug in Bio::DB::Fasta?
In-Reply-To: <1124126549.2868.2.camel@localhost.localdomain>
References: <1124116511.2891.9.camel@localhost.localdomain>
	<1124126549.2868.2.camel@localhost.localdomain>
Message-ID: <200508221818.08032.lstein@cshl.edu>

I've just looked into this. The bug occurs when Windows opens the FASTA file 
in text mode rather than binary mode; when in text mode the "\r\n" sequence 
is invisibly mapped to "\n" during readline operations, so Bio::DB::Fasta 
thinks that it is dealing with a Unix-format file; then when the module tries 
to seek() to the proper line number, Windows doesn't do the line end mapping, 
so it seeks to the wrong offset.  (sound of hairs being pulled)

I've fixed the problem by explicitly calling binmode() on all filehandles that 
Bio::DB::Fasta calls. The new version of Fasta.pm is in both bioperl CVS and 
the gbrowse 1.63 CVS version. It ought to fix Chris' GC content weirdness.
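
For anyone who wants to see the effect without a Windows box, here is a
small sketch using only core Perl.  It assumes that the ":crlf" PerlIO
layer is a fair stand-in for Windows text mode and ":raw" for what
binmode() gives you; termination_length is a made-up helper that applies
the same test as the Fasta.pm line quoted below.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny DOS-formatted FASTA file (CRLF line endings).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} ">seq1\r\nACGT\r\nTTGA\r\n";
close $out or die "close failed: $!";

# Read the first line through the given PerlIO layer and report how long
# the line terminator appears to be (2 for CRLF, 1 otherwise).
sub termination_length {
    my ($file, $layer) = @_;
    open my $in, "<$layer", $file or die "Can't open $file: $!";
    my $line = <$in>;
    close $in;
    return $line =~ /\r\n$/ ? 2 : 1;
}

# ":crlf" stands in for Windows text mode; ":raw" is the binmode() case.
print "text mode:   ", termination_length($path, ':crlf'), "\n";
print "binary mode: ", termination_length($path, ':raw'),  "\n";
```

Through the text-mode layer the terminator looks one byte long although
two bytes sit on disk, so any byte offset computed from it drifts by one
per line; in binary mode the two agree.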

Lincoln

On Monday 15 August 2005 01:22 pm, Scott Cain wrote:
> Just to follow up on my own email with a little more information: in
> Fasta.pm, line 697:
>
>   $termination_length ||= /\r\n$/ ? 2 : 1;  # account for crlf-terminated Windows files
>
> The pattern match is failing on DOS formatted files; I don't know why.
> Does anyone else?
>
> On Mon, 2005-08-15 at 10:35 -0400, Scott Cain wrote:
> > Hello all,
> >
> > I am investigating a bug in GBrowse that seems to only surface when
> > people are using the memory (ie, file) adaptor on Windows systems.
> > Here's the bug report:
> >
> > https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169&group_id=27707
> >
> > I've tracked the problem down to Bio::DB::Fasta when the file is dos
> > formatted (that is, it has both line feeds and carriage returns), BDF
> > returns the wrong string when a subsequence is requested, but when the
> > file is unix formatted (ie only CR (or is it only LF?)), it returns the
> > right string.  I wrote the very simple test script below and stepped it
> > through the perl debugger.  It looks like the bug is in the caloffset
> > method, as it returns the same offsets regardless of the file type,
> > which then makes the subsequent seek into the file go to the wrong
> > coordinates of dos formatted files.
> >
> > Unfortunately, I don't really know what is going on caloffset, so I
> > don't know how to fix it, but it presumably has to check the format of
> > the file somewhere and take that into account.
> >
> > Thanks,
> > Scott

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu
From ureddi at emich.edu  Mon Aug 22 21:32:32 2005
From: ureddi at emich.edu (Usha Rani Reddi)
Date: Mon Aug 22 21:22:02 2005
Subject: [Bioperl-l] bl2seq
Message-ID: <ed9d68ed3ff9.ed3ff9ed9d68@emich.edu>

Hi,
I am trying to compare two hundred thousand probes(each one of them) to
another genome. Format of the file containing probes is like this
SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG
NC_004116	7	37	GAAAACTTTATTTTTTTATGGTACAATATAAC
NC_004116	9	49	TTTTTATGGTACAATATAACAATAATTATCCA
NC_004116	11	61	AATATAACAATAATTATCCACAAGACAATAAG
NC_004116	13	73	ATTATCCACAAGACAATAAGGAAGAAGCTATG
NC_004116	15	85	ACAATAAGGAAGAAGCTATGACGGAAAACGAA
What I am trying to do is compare PROBE_SEQUENCE to fasta file of
Streptococcus agalactiae. I am trying to loop through the probes but not
sure how to proceed. My program is working fine for single sequence. One
more thing is I am not interested in matches, I want to display only
mismatches. I am new to Bioperl, some one please help me with this.
Thanks
Usha
From james.wasmuth at ed.ac.uk  Tue Aug 23 04:13:56 2005
From: james.wasmuth at ed.ac.uk (James Wasmuth)
Date: Tue Aug 23 04:22:26 2005
Subject: [Bioperl-l] bl2seq
In-Reply-To: <ed9d68ed3ff9.ed3ff9ed9d68@emich.edu>
References: <ed9d68ed3ff9.ed3ff9ed9d68@emich.edu>
Message-ID: <430ADAC4.6090601@ed.ac.uk>

Hi Usha,

How new are you to Perl?
I would turn these probe sequences into a fasta file using Bio::SeqIO.
Use this as the input file for a normal blast search.
Then search the blast output using Bio::SearchIO.
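
A minimal sketch of the first step, assuming the probe file is
tab-delimited with the four columns shown in your post.  It uses plain
print statements instead of Bio::SeqIO so that it runs without BioPerl;
probes_to_fasta and the ID layout it writes are made up for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read the tab-delimited probe table from $in and
# write one FASTA record per probe to $out. Returns the record count.
sub probes_to_fasta {
    my ($in, $out) = @_;
    my $count = 0;
    while (my $row = <$in>) {
        $row =~ s/\r?\n\z//;    # strip LF or CRLF
        my ($seq_id, $probe_id, $position, $sequence) = split /\t/, $row;
        # Skip the header row and anything that is not a plain DNA string.
        next unless defined $sequence && $sequence =~ /^[ACGTN]+$/i;
        print {$out} ">${seq_id}_probe${probe_id}_pos${position}\n$sequence\n";
        $count++;
    }
    return $count;
}

# Demo on two rows from the original post, using in-memory filehandles.
my $table = "SEQ_ID\tPROBE_ID\tPOSITION\tPROBE_SEQUENCE\n"
          . "NC_004116\t1\t1\tAATTAACATTGTTGATTTTATTCTTCAACATC\n"
          . "NC_004116\t3\t13\tTGATTTTATTCTTCAACATCTGTGGAAAACTT\n";
open my $in,  '<', \$table    or die "Can't read table: $!";
open my $out, '>', \my $fasta or die "Can't write FASTA: $!";
probes_to_fasta($in, $out);
close $out;
print $fasta;
```

For the real file you would pass ordinary filehandles opened on the
probe table and on the FASTA file to be written.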

The best way to learn is with the HOWTOs:
http://bioperl.org/HOWTOs/html/SeqIO.html
http://bioperl.org/HOWTOs/html/SearchIO.html

any problems? Post back to the list.

hope this helps

james


Usha Rani Reddi wrote:

>Hi,
>I am trying to compare two hundred thousand probes(each one of them) to
>another genome. Format of the file containing probes is like this
>SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
>NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
>NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
>NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG
>NC_004116	7	37	GAAAACTTTATTTTTTTATGGTACAATATAAC
>NC_004116	9	49	TTTTTATGGTACAATATAACAATAATTATCCA
>NC_004116	11	61	AATATAACAATAATTATCCACAAGACAATAAG
>NC_004116	13	73	ATTATCCACAAGACAATAAGGAAGAAGCTATG
>NC_004116	15	85	ACAATAAGGAAGAAGCTATGACGGAAAACGAA
>What I am trying to do is compare PROBE_SEQUENCE to fasta file of
>Streptococcus agalactiae. I am trying to loop through the probes but not
>sure how to proceed. My program is working fine for single sequence. One
>more thing is I am not interested in matches, I want to display only
>mismatches. I am new to Bioperl, some one please help me with this.
>Thanks
>Usha
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>  
>

-- 
"You have made your way from worm to man,
    and much in you is still worm."
	Friedrich Nietzsche, Thus Spoke Zarathustra


Blaxter Nematode Genomics Group   |
Institute of Evolutionary Biology |
Ashworth Laboratories, KB         | tel: +44 131 650 7403
University of Edinburgh           | web: www.nematodes.org/~james
Edinburgh                         |
EH9 3JT                           |
UK                                |	
 

From palmeida at igc.gulbenkian.pt  Tue Aug 23 04:41:09 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Aug 23 04:33:27 2005
Subject: [Bioperl-l] bl2seq
In-Reply-To: <ed9d68ed3ff9.ed3ff9ed9d68@emich.edu>
References: <ed9d68ed3ff9.ed3ff9ed9d68@emich.edu>
Message-ID: <200508230941.09888.palmeida@igc.gulbenkian.pt>

Hi Usha,

Perhaps something like this:

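my @seqs;
open(IN, "<", "yourfile") or die "Can't open yourfile: $!";
while (<IN>) {
    chomp;  #get rid of newline character
    my @line = split(/\s+/);
    #keep only real sequences; this also skips the header row,
    #because PROBE_SEQUENCE contains an underscore
    push @seqs, $line[3] if defined $line[3] && $line[3] =~ /^[A-Z]+$/;
}
close IN;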

foreach my $seq (@seqs) {
#Do whatever you are doing successfully for a single sequence
}

I'm not sure about the syntax, because I haven't been using Perl, but that's 
the general idea.

-Paulo

On Tuesday 23 August 2005 02:32, Usha Rani Reddi wrote:
> Hi,
> I am trying to compare two hundred thousand probes(each one of them) to
> another genome. Format of the file containing probes is like this
> SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE
> NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC
> NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT
> NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG
> NC_004116 7 37 GAAAACTTTATTTTTTTATGGTACAATATAAC
> NC_004116 9 49 TTTTTATGGTACAATATAACAATAATTATCCA
> NC_004116 11 61 AATATAACAATAATTATCCACAAGACAATAAG
> NC_004116 13 73 ATTATCCACAAGACAATAAGGAAGAAGCTATG
> NC_004116 15 85 ACAATAAGGAAGAAGCTATGACGGAAAACGAA
> What I am trying to do is compare PROBE_SEQUENCE to fasta file of
> Streptococcus agalactiae. I am trying to loop through the probes but not
> sure how to proceed. My program is working fine for single sequence. One
> more thing is I am not interested in matches, I want to display only
> mismatches. I am new to Bioperl, some one please help me with this.
> Thanks
> Usha
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
Paulo Almeida
Tel: +351 21 4464635, Fax: +351 21 4407970
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
P-2780-156 Oeiras
Portugal
http://www.igc.gulbenkian.pt

From mark.schreiber at novartis.com  Tue Aug 23 04:53:21 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Aug 23 04:42:59 2005
Subject: [Bioperl-l] Re: [BioSQL-l] loading fasta records with
 load_seqdatabase.pl -	correct fasta headers
Message-ID: <OF81BF5834.066FC52B-ON48257066.00308512-48257066.0030D49E@EU.novartis.net>

The NCBI 'standard' is to format the header like this:

>gi|{identifier}|{namespace}|{accession}.{version}|{accession} description

eg

>gi|123456|gb|AE657483.3|AE657483.3 Gene of interest from Flying Spaghetti 
Monster.

Biojava is going to be adopting this approach when the appropriate 
information is available.

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


Hilmar Lapp <hlapp@gnf.org>
Sent by: biosql-l-bounces@portal.open-bio.org
08/23/2005 02:18 AM

 
        To:     Amit Indap <indapa@gmail.com>
        cc:     Bioperl <bioperl-l@bioperl.org>, Biosql <biosql-l@open-bio.org>, (bcc: 
Mark Schreiber/GP/Novartis)
        Subject:        Re: [BioSQL-l] loading fasta records with load_seqdatabase.pl - correct 
fasta headers


Amit,

this is a problem inherent with the fasta format as there is no precise 
definition of what to put as identifier and/or accession. The Bioperl 
fasta parser doesn't set the accession and so it defaults to "unknown" 
(it cannot be undef). Since fasta format also doesn't have the version 
in a defined place, the version will be undef (i.e., zero for biosql) 
for every entry, so that all your sequences will have the same unique 
key of (accession,version,namespace) which violates the constraint 
after the first sequence was stored.

The easiest way to deal with this is to write your own 
SequenceProcessor (see Bio::Factory::SequenceProcessorI and 
Bio::Seq::BaseSeqProcessor) and then pipeline it using the --pipeline 
argument to load_seqdatabase.pl.

Simple examples for how to write your own SeqProcessor have been posted 
before, e.g., by Marc Logghe:

http://portal.open-bio.org/pipermail/bioperl-l/2005-February/018158.html

and by myself

http://portal.open-bio.org/pipermail/bioperl-l/2003-June/012369.html

                 -hilmar

On Aug 22, 2005, at 7:57 AM, Amit Indap wrote:

> Hi,
>
> I am new to using the biosql. I am trying to load fasta formatted
> RefSeq records into the biosql schema. When I try to use the
> load_seqdatabase.pl script I get the following error
>
> load_seqdatabase.pl --host 127.0.0.1 --port 2022 --dbname testbiosql
> --namespace refseq --format fasta refseq.fa
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values
> were 
> ("gi|51459331|ref|XM_498785.1|","gi|51459331|ref|XM_498785.1|","unknown 
> ","PREDICTED:
> Homo sapiens LOC440641 (LOC440641), mRNA","0","") FKs (1,<NULL>)
> Duplicate entry 'unknown-1-0' for key 2
> ---------------------------------------------------
> Could not store unknown:
> ------------- EXCEPTION  -------------
> MSG: You're trying to lie about the length: is 1316 but you say 6474
> STACK Bio::PrimarySeq::length
> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:418
> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: 
> 553
> STACK Bio::Seq::length /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:612
> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: 
> 553
> STACK Bio::DB::BioSQL::BiosequenceAdaptor::populate_from_row
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BiosequenceAdaptor.pm:236
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:1310
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:976
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:855
> STACK Bio::DB::BioSQL::PrimarySeqAdaptor::attach_children
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:284
> STACK Bio::DB::BioSQL::SeqAdaptor::attach_children
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/SeqAdaptor.pm:279
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:1341
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:976
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:855
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:205
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:254
> STACK Bio::DB::Persistent::PersistentObject::store
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: 
> 272
> STACK (eval) ./load_seqdatabase.pl:542
> STACK toplevel ./load_seqdatabase.pl:525
>
> --------------------------------------
>  at ./load_seqdatabase.pl line 555
>
> I think my fasta headers are incorrect since it says it cannot store
> unknown. The first fasta record in my refseq.fa is this:
>
>> gi|6912649|ref|NM_012431.1| Homo sapiens sema domain, immunoglobulin
> domain (Ig), short basic domain, secreted, (semaphorin) 3E (SEMA3E),
> mRNA
>
> Do I need to reformat that header? I downloaded the NM series of
> Refseqs in fasta form from NCBI's ftp site and wanted to load them
> into the biosql schema.
>
> Thanks,
>
> Amit Indap
> Dept. of Biological Statistics and Computational Biology
> Cornell University
>
>
> (error message)
> Loading refseq.fa ...
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


_______________________________________________
BioSQL-l mailing list
BioSQL-l@open-bio.org
http://open-bio.org/mailman/listinfo/biosql-l


From ureddi at emich.edu  Tue Aug 23 07:50:36 2005
From: ureddi at emich.edu (Usha Rani Reddi)
Date: Tue Aug 23 07:40:05 2005
Subject: [Bioperl-l] Local bl2seq
Message-ID: <f0f29af0bcdc.f0bcdcf0f29a@emich.edu>

Hi,
I am trying to use BLAST to compare the sequences. I did the program in
Bioperl. Below is my piece of code
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
$seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta",
                             -format => "fasta" );
$seq_obj = $seqio_obj->next_seq;
$input2 = Bio::Seq->new(-id=>"testquery2",
                                 
-seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca");
          
$factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
'blastn','outfile' => 'bl2seq1.out');
$blast_report = $factory->bl2seq($seq_obj, $input2);

I need help for looping input2. I want to extract this part of sequence
from a file containing 200000 records. Using perl I am extracting the
sequence part for file of format given below.
SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG

code for extracting PROBE_SEQUENCE looks like this

$NemSeq =<STDIN>;

chomp $NemSeq;

unless (open(seqfile, $NemSeq)) {
print "Cannot open file \n";
exit;
}
@NemSeq = <seqfile> ;

close seqfile;

for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) {
    #print $k, $NemSeq[$k];
    @Nem =split(/\t/,$NemSeq[$k]);
    $input= $Nem[3];

    #print scalar(@Nem);
    #print $Nem[3], "\n";
    
}


@Nem =split(/\t/,$NemSeq)

$input2 = substr(@NemSeq,4,32);

So far I could successfully use bioperl(bl2seq) to compare whole genome
with single probe. 
I want to compare all 200000 probes. I am interested only
in mismatches, for this particular scenario my assumption is that more
than 90% of them will match. I want to send only the mismatches to
output file and discard the matches. I would like to classify the
mismatches based on the percentage dissimilarity, is there a way in
Bioperl for this? Thanks a lot for the reply. Please help me with this.
Thanks
Usha


----- Original Message -----
From: Barry Moore <barry.moore@genetics.utah.edu>
Date: Monday, August 22, 2005 11:45 pm
Subject: Re: [Bioperl-l] bl2seq

> Usha,
> 
> The best advice I can give you is that you need to focus your 
> question a 
> bit more.  What method are you using to compare your probe to your 
> fasta?  Regex, BLAST, Needle, RNAHybrid...?  You say your sequence 
> is 
> working fine for single sequence.  Are you using Bioperl for that?  
> Can 
> you tell us exactly what isn't working for you or what questions 
> you 
> have about working with multiple sequences?  Are you already using 
> Bioperl with your single sequence comparison? Can you show us some 
> code?
> Barry
> 
> Usha Rani Reddi wrote:
> 
> >Hi,
> >I am trying to compare two hundred thousand probes(each one of 
> them) to
> >another genome. Format of the file containing probes is like this
> >SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
> >NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
> >NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
> >NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG
> >NC_004116	7	37	GAAAACTTTATTTTTTTATGGTACAATATAAC
> >NC_004116	9	49	TTTTTATGGTACAATATAACAATAATTATCCA
> >NC_004116	11	61	AATATAACAATAATTATCCACAAGACAATAAG
> >NC_004116	13	73	ATTATCCACAAGACAATAAGGAAGAAGCTATG
> >NC_004116	15	85	ACAATAAGGAAGAAGCTATGACGGAAAACGAA
> >What I am trying to do is compare PROBE_SEQUENCE to fasta file of
> >Streptococcus agalactiae. I am trying to loop through the probes 
> but not
> >sure how to proceed. My program is working fine for single 
> sequence. One
> >more thing is I am not interested in matches, I want to display only
> >mismatches. I am new to Bioperl, some one please help me with this.
> >Thanks
> >Usha
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l@portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >  
> >
> 
> -- 
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT
> 
> 
> 
> 
From bmoore at genetics.utah.edu  Tue Aug 23 14:40:51 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Aug 23 14:28:10 2005
Subject: [Bioperl-l] RE: Local bl2seq
Message-ID: <CFE1DF3BA20F424689DA0881A14055BE863F44@m.hg.genetics.utah.edu>

Usha-

I think the code below will wrap your existing code in the loop you
need.  You will want to get a copy of a good perl programming book like
Programming Perl from O'Reilly.  It will help you out with all those
little perl details like loop structures etc.

Barry

#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;

my $seqio_obj = Bio::SeqIO->new(-file => " sequences.fasta",
                                -format => "fasta" );

my $seq_obj = $seqio_obj->next_seq;

open (IN, " location/of/your/probe/file") or die "Can't open IN";

while (my $row = <IN>) {
    chomp $row;
    #Assuming your file is tab delimited...
    my ($seq_id, $probe_id, $position, $probe_sequence) = split /\t/, $row;

    my $input2 = Bio::Seq->new(-id=>"testquery2",
                               -seq=> $probe_sequence
                            );

    my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn',
                                                        'outfile' => 'bl2seq1.out');

    my $blast_report = $factory->bl2seq($seq_obj, $input2);

    #Here is where you want to throw out good matches.  You'll need to
    #determine what method you want to use for that.  Maybe since you want
    #there to be no good hits you would just call
    #$blast_report->max_significance and make sure its value is too high
    #to be significant.
    if ($blast_report->max_significance > 0.01) {
        print "$row\n";
    }
}

-----Original Message-----
From: Usha Rani Reddi [mailto:ureddi@emich.edu] 
Sent: Tuesday, August 23, 2005 5:51 AM
To: Barry Moore
Cc: bioperl-l@portal.open-bio.org
Subject: Local bl2seq

Hi,
I am trying to use BLAST to compare the sequences. I did the program in
Bioperl. Below is my piece of code
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
$seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta",
                             -format => "fasta" );
$seq_obj = $seqio_obj->next_seq;
$input2 = Bio::Seq->new(-id=>"testquery2",
                                 
-seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca");
          
$factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
'blastn','outfile' => 'bl2seq1.out');
$blast_report = $factory->bl2seq($seq_obj, $input2);

I need help for looping input2. I want to extract this part of sequence
from a file containing 200000 records. Using perl I am extracting the
sequence part for file of format given below.
SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG

code for extracting PROBE_SEQUENCE looks like this

$NemSeq =<STDIN>;

chomp $NemSeq;

unless (open(seqfile, $NemSeq)) {
print "Cannot open file \n";
exit;
}
@NemSeq = <seqfile> ;

close seqfile;

for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) {
    #print $k, $NemSeq[$k];
    @Nem =split(/\t/,$NemSeq[$k]);
    $input= $Nem[3];

    #print scalar(@Nem);
    #print $Nem[3], "\n";
    
}


@Nem =split(/\t/,$NemSeq)

$input2 = substr(@NemSeq,4,32);

So far I could successfully use bioperl(bl2seq) to compare whole genome
with single probe. 
I want to compare all the 200000 thousand probes. I am interested only
in mismatches, for this particular scenario my assumption is that more
than 90% of them will match. I want to send only the mismatches to
output file and discard the matches. I would like to classify the
mismatches based on the percentage dissimilarity, is there a way in
Bioperl for this? Thanks a lot for the reply. Please help me with this.
Thanks
Usha



From agathman at semo.edu  Tue Aug 23 14:38:42 2005
From: agathman at semo.edu (Gathman, Allen)
Date: Tue Aug 23 14:29:16 2005
Subject: [Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq
Message-ID: <33580922CBEEC846B473BAE124985DE00514DB69@xchgnt.semo.edu>

Hi, BioPerl gurus: 

 
Although this question involves a Gbrowse database, I think it's actually a
BioPerl question at heart - and in any case it appears that there's a lot of
overlap between the people who answer questions in both lists, so I'm
guessing this is a good place for this question.  

 
I've written a script that finds particular pfam hits in a GBROWSE database,
then uses "overlapping_features" to find predicted gene features of  type
"transcript:GLEAN" that overlap those pfams. I've set the aggregator
"transcript" as {CDS/mRNA} already.  I select features using a regular
expression to choose particular names, then I use spliced_seq to return the
spliced CDS of each feature - but I'm only getting back the CDS that
actually overlap the pfam hit, not the full predicted gene.  So my question
is, what do I need to do in order to get ALL the CDS of each predicted gene
feature spliced together, instead of only the ones that actually overlap the
pfam hit I used to select that predicted gene?  

 
Thanks in advance for any help you can give...

 
Here's the code:

 
#!/usr/bin/perl

use strict;
use Bio::DB::GFF;
use Bio::Seq;
use Bio::SeqIO;
use Getopt::Long;

my $outfile;
GetOptions(
           'o|outfile=s' => \$outfile,
           );

my $outfa = Bio::SeqIO->new(-file   => ">$outfile",
                            -format => 'Fasta'
                           );

my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
                           -dsn     => 'dbi:mysql:database=cc;host=localhost',
                           -fasta   => '/gbrowse/databases/cc'
                          );

$db->add_aggregator('transcript{CDS/mRNA}');

for (my $i = 1; $i <= 20; $i++) {
    my $pfamname = "Peptidase_C$i";
    my @pfams = $db->get_feature_by_name(Domain => $pfamname);
    foreach my $pfamhit (@pfams) {
        my $desc  = $pfamname;
        my $score = $pfamhit->score;
        my $name  = $pfamhit->name;
        $desc .= " $score ";
        $desc .= $pfamhit->location->seq_id();
        $desc .= ": ";
        #
        # Here's where I'm selecting predicted genes that overlap the Pfam hit
        #
        my @genes = $db->overlapping_features(
                        -refseq => $pfamhit->location->seq_id,
                        -start  => $pfamhit->start,
                        -stop   => $pfamhit->stop,
                        -types  => 'transcript:GLEAN'
                    );
        #
        # Now I'm choosing the ones with names I want out of the selected genes
        #
        foreach my $gene (@genes) {
            my $gid = $gene->display_id();
            if ($gid =~ /aug_GLEAN/) {
                $desc .= $gene->start;
                $desc .= " - ";
                $desc .= $gene->stop;
                #
                # Here I'm splicing the gene, tacking on a description, and outputting it.
                #
                my $splseq = $gene->spliced_seq();
                $splseq->desc($desc);
                $splseq->display_id($gid);
                $outfa->write_seq($splseq);
            } # end if aug_GLEAN
        } # end foreach gene
    } # end foreach pfamhit
} # end for numbers

close OUT;

 
Allen Gathman

http://cstl-csm.semo.edu/gathman

 
From hlapp at gnf.org  Tue Aug 23 15:43:56 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Aug 23 15:32:35 2005
Subject: [Bioperl-l] Re: [BioSQL-l] loading fasta records with
	load_seqdatabase.pl -	correct fasta headers
In-Reply-To: <OF81BF5834.066FC52B-ON48257066.00308512-48257066.0030D49E@EU.novartis.net>
References: <OF81BF5834.066FC52B-ON48257066.00308512-48257066.0030D49E@EU.novartis.net>
Message-ID: <965b72e4c118ef3b937c7d3464b2ecd1@gnf.org>

I guess it may be worth depositing a suitable SeqProcessor for this
type of ID in the repository, as many people would probably find it
useful.
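
As a rough sketch of the parsing step such a processor would need for
the NCBI-style IDs discussed in this thread (gi|number|namespace|
accession.version|), here is a plain-Perl helper. The name
parse_ncbi_id and the keys of the hash it returns are made up for
illustration; they are not part of Bioperl or of any existing
SeqProcessor.

```perl
use strict;
use warnings;

# Hypothetical helper: split an NCBI-style FASTA identifier into parts.
# Returns a hash ref, or undef if the ID does not follow the
# gi|number|namespace|accession.version| layout.
sub parse_ncbi_id {
    my ($id) = @_;
    return unless $id =~ /^gi\|(\d+)\|(\w+)\|([^|.\s]+)(?:\.(\d+))?\|/;
    return {
        gi        => $1,
        namespace => $2,
        accession => $3,
        version   => defined $4 ? $4 : 0,   # no version given: use 0
    };
}

my $parts = parse_ncbi_id('gi|6912649|ref|NM_012431.1|');
printf "%s version %d (gi %s, %s)\n",
    @{$parts}{qw(accession version gi namespace)};
```

Inside a custom processor, the accession and version recovered this
way would presumably be set on each sequence object before it is
stored, so that every entry gets a distinct
(accession,version,namespace) key.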

On Aug 23, 2005, at 1:53 AM, mark.schreiber@novartis.com wrote:

> The NCBI 'standard' is to format the header like this:
>
>> gi|{identifier}|{namespace}|{accession}.{version}|{accession}  
>> description
>
> eg
>
>> gi|123456|gb|AE657483.3|AE657483.3 Gene of interest from Flying  
>> Spaghetti
> Monster.
>
> Biojava is going to be adopting this approach when the appropriate
> information is available.
>
> - Mark
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
>
>
>
>
> Hilmar Lapp <hlapp@gnf.org>
> Sent by: biosql-l-bounces@portal.open-bio.org
> 08/23/2005 02:18 AM
>
>
>         To:     Amit Indap <indapa@gmail.com>
>         cc:     Bioperl <bioperl-l@bioperl.org>, Biosql  
> <biosql-l@open-bio.org>, (bcc:
> Mark Schreiber/GP/Novartis)
>         Subject:        Re: [BioSQL-l] loading fasta records with  
> load_seqdatabase.pl - correct
> fasta headers
>
>
> Amit,
>
> this is a problem inherent with the fasta format as there is no precise
> definition of what to put as identifier and/or accession. The Bioperl
> fasta parser doesn't set the accession and so it defaults to "unknown"
> (it cannot be undef). Since fasta format also doesn't have the version
> in a defined place, the version will be undef (i.e., zero for biosql)
> for every entry, so that all your sequences will have the same unique
> key of (accession,version,namespace) which violates the constraint
> after the first sequence was stored.
>
> The easiest way to deal with this is to write your own
> SequenceProcessor (see Bio::Factory::SequenceProcessorI and
> Bio::Seq::BaseSeqProcessor) and then pipeline it using the --pipeline
> argument to load_seqdatabase.pl.
>
> Simple examples for how to write your own SeqProcessor have been posted
> before, e.g., by Marc Logghe:
>
> http://portal.open-bio.org/pipermail/bioperl-l/2005-February/018158.html
>
> and by myself
>
> http://portal.open-bio.org/pipermail/bioperl-l/2003-June/012369.html
>
>                  -hilmar
>
> On Aug 22, 2005, at 7:57 AM, Amit Indap wrote:
>
>> Hi,
>>
>> I am new to using the biosql. I am trying to load fasta formatted
>> RefSeq records into the biosql schema. When I try to use the
>> load_seqdatabase.pl script I get the following error
>>
>> load_seqdatabase.pl --host 127.0.0.1 --port 2022 --dbname testbiosql
>> --namespace refseq --format fasta refseq.fa
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values
>> were
>> ("gi|51459331|ref|XM_498785.1|","gi|51459331|ref|XM_498785.1|","unknown","PREDICTED:
>> Homo sapiens LOC440641 (LOC440641), mRNA","0","") FKs (1,<NULL>)
>> Duplicate entry 'unknown-1-0' for key 2
>> ---------------------------------------------------
>> Could not store unknown:
>> ------------- EXCEPTION  -------------
>> MSG: You're trying to lie about the length: is 1316 but you say 6474
>> STACK Bio::PrimarySeq::length
>> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:418
>> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:
>> 553
>> STACK Bio::Seq::length /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:612
>> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:
>> 553
>> STACK Bio::DB::BioSQL::BiosequenceAdaptor::populate_from_row
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BiosequenceAdaptor.pm:236
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/
>> BasePersistenceAdaptor.pm:1310
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/
>> BasePersistenceAdaptor.pm:976
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/
>> BasePersistenceAdaptor.pm:855
>> STACK Bio::DB::BioSQL::PrimarySeqAdaptor::attach_children
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:284
>> STACK Bio::DB::BioSQL::SeqAdaptor::attach_children
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/SeqAdaptor.pm:279
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/
>> BasePersistenceAdaptor.pm:1341
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/
>> BasePersistenceAdaptor.pm:976
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/
>> BasePersistenceAdaptor.pm:855
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/
>> BasePersistenceAdaptor.pm:205
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/
>> BasePersistenceAdaptor.pm:254
>> STACK Bio::DB::Persistent::PersistentObject::store
>> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:
>> 272
>> STACK (eval) ./load_seqdatabase.pl:542
>> STACK toplevel ./load_seqdatabase.pl:525
>>
>> --------------------------------------
>>  at ./load_seqdatabase.pl line 555
>>
>> I think my fasta headers are incorrect since it says it cannot store
>> unknown. The first fasta record in my refseq.fa is this:
>>
>>> gi|6912649|ref|NM_012431.1| Homo sapiens sema domain, immunoglobulin
>> domain (Ig), short basic domain, secreted, (semaphorin) 3E (SEMA3E),
>> mRNA
>>
>> Do I need to reformat that header? I downloaded the NM series of
>> Refseqs in fasta form from NCBI's ftp site and wanted to load them
>> into the biosql schema.
>>
>> Thanks,
>>
>> Amit Indap
>> Dept. of Biological Statistics and Computational Biology
>> Cornell University
>>
>>
>> (error message)
>> Loading refseq.fa ...
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From cjfields at uiuc.edu  Tue Aug 23 16:03:56 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue Aug 23 15:53:23 2005
Subject: [Bioperl-l] Windows bug in Bio::DB::Fasta?
In-Reply-To: <200508221818.08032.lstein@cshl.edu>
References: <1124116511.2891.9.camel@localhost.localdomain>
	<1124126549.2868.2.camel@localhost.localdomain>
	<200508221818.08032.lstein@cshl.edu>
Message-ID: <6.2.1.2.2.20050823150328.04bba998@express.cites.uiuc.edu>

That did the trick!  Everything looks fine now.  Thanks Lincoln!

Chris

At 05:18 PM 8/22/2005, Lincoln Stein wrote:
>I've just looked into this. The bug occurs when Windows opens the FASTA file
>in text mode rather than binary mode; when in text mode the "\r\n" sequence
>is invisibly mapped to "\n" during readline operations, so Bio::DB::Fasta
>thinks that it is dealing with a Unix-format file; then when the module tries
>to seek() to the proper line number, Windows doesn't do the line end mapping,
>so it seeks to the wrong offset.  (sound of hairs being pulled)
>
>I've fixed the problem by explicitly calling binmode() on all filehandles that
>Bio::DB::Fasta calls. The new version of Fasta.pm is in both bioperl CVS and
>the gbrowse 1.63 CVS version. It ought to fix Chris' GC content weirdness.
>
>Lincoln
>
>On Monday 15 August 2005 01:22 pm, Scott Cain wrote:
> > Just to follow up on my own email with a little more information: in
> > Fasta.pm, line 697:
> >
> >   $termination_length ||= /\r\n$/ ? 2 : 1;  # account for crlf-terminated Windows files
> >
> > The pattern match is failing on DOS formatted files; I don't know why.
> > Does anyone else?
> >
> > On Mon, 2005-08-15 at 10:35 -0400, Scott Cain wrote:
> > > Hello all,
> > >
> > > I am investigating a bug in GBrowse that seems to only surface when
> > > people are using the memory (ie, file) adaptor on Windows systems.
> > > Here's the bug report:
> > >
> > > https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169&group_id=27707
> > >
> > > I've tracked the problem down to Bio::DB::Fasta when the file is dos
> > > formatted (that is, it has both line feeds and carriage returns), BDF
> > > returns the wrong string when a subsequence is requested, but when the
> > > file is unix formatted (ie only CR (or is it only LF?)), it returns the
> > > right string.  I wrote the very simple test script below and stepped it
> > > through the perl debugger.  It looks like the bug is in the caloffset
> > > method, as it returns the same offsets regardless of the file type,
> > > which then makes the subsequent seek into the file go to the wrong
> > > coordinates of dos formatted files.
> > >
> > > Unfortunately, I don't really know what is going on caloffset, so I
> > > don't know how to fix it, but it presumably has to check the format of
> > > the file somewhere and take that into account.
> > >
> > > Thanks,
> > > Scott
>
>--
>Lincoln D. Stein
>Cold Spring Harbor Laboratory
>1 Bungtown Road
>Cold Spring Harbor, NY 11724
>FOR URGENT MESSAGES & SCHEDULING,
>PLEASE CONTACT MY ASSISTANT,
>SANDRA MICHELSEN, AT michelse@cshl.edu

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign
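
A minimal sketch of the idea behind the fix quoted above, in plain
Perl rather than the actual Bio::DB::Fasta code: once the filehandle
is in binary mode, the "\r\n" line endings stay visible, so the
detected terminator length and the byte offsets passed to seek() agree
with what is on disk. The file contents and the fetch_subseq helper
are invented for illustration, and the sketch assumes a single record
whose lines all have the same length.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny CRLF-terminated FASTA file (made-up data).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} ">seq1\r\nACGTACGTAC\r\nGGGGCCCCTT\r\n";
close $fh or die "close failed: $!";

# Hypothetical helper: return $length bases starting at 1-based $start
# from the first record of $file, using seek() instead of a full read.
sub fetch_subseq {
    my ($file, $start, $length) = @_;
    open my $in, '<', $file or die "Can't open $file: $!";
    binmode $in;                    # keep "\r\n" intact on every platform
    my $header   = <$in>;
    my $data_at  = tell $in;        # byte offset of the first sequence line
    my $first    = <$in>;
    my $term     = $first =~ /\r\n$/ ? 2 : 1;   # terminator length in bytes
    my $per_line = length($first) - $term;      # bases per line
    my $seq = '';
    my $pos = $start - 1;           # 0-based index of the next base wanted
    while (length($seq) < $length) {
        my $line = int($pos / $per_line);
        my $col  = $pos % $per_line;
        seek $in, $data_at + $line * ($per_line + $term) + $col, 0
            or die "seek failed: $!";
        my $want  = $length - length($seq);
        my $avail = $per_line - $col;           # bases left on this line
        $want = $avail if $want > $avail;
        my $got = read $in, my $chunk, $want;
        last unless $got;
        $seq .= $chunk;
        $pos += $got;
    }
    close $in;
    return $seq;
}

print fetch_subseq($path, 9, 4), "\n";   # spans the line break
```

Without the binmode() call, a Windows perl would strip the "\r" during
readline, the terminator length would be detected as 1, and every
offset past the first line would be off by one byte per line.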

From thechans at citiz.net  Wed Aug 24 05:48:19 2005
From: thechans at citiz.net (thechans@citiz.net)
Date: Wed Aug 24 06:06:50 2005
Subject: [Bioperl-l] Strange result
Message-ID: <1124876899286.1765.app2.Naesasoft.Q4QUAW7S>

Hello,
I am new to Bioperl. I just want to copy some sequences out of an existing GenBank file (which contains multiple sequences).
The selected sequences should stay in GenBank format and be written to a new file.
I received no warnings and the script appeared to work, but the new file contains something strange,
e.g.,
                     /country="Bio::Annotation::SimpleValue=HASH(0x1bdc7a4)"
                     /db_xref="Bio::Annotation::SimpleValue=HASH(0x1bdc9cc)"
                     /mol_type="Bio::Annotation::SimpleValue=HASH(0x1bdca68)"
                     /isolate="Bio::Annotation::SimpleValue=HASH(0x1bdc8ac)"
                     /organism="Bio::Annotation::SimpleValue=HASH(0x1bdc954)"
Why did this happen?


From barry.moore at genetics.utah.edu  Wed Aug 24 08:30:00 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed Aug 24 08:21:48 2005
Subject: [Bioperl-l] Re: Thanks
In-Reply-To: <1072d891078847.10788471072d89@emich.edu>
References: <1072d891078847.10788471072d89@emich.edu>
Message-ID: <430C6848.4010905@genetics.utah.edu>

Usha-

It is important that you keep Bioperl-related discussions on the bioperl
list, so that others can benefit from the discussion in the future by
searching the archives.  Having said that, I am constantly accidentally
replying directly to people and not to the list because I hit reply
instead of reply all, so I'm not really a good one to talk.

It seems from the nature of your questions to this list that you might
be quite new to perl programming.  Your participation on the list is
still welcome, but this is the kind of problem that you want to learn
how to solve yourself.  You want to avoid at all costs giving the
impression that you are asking the list to do your debugging for you -
otherwise people will just stop replying to your messages.  A very
valuable article about asking questions on forums like this one is
found at http://www.catb.org/~esr/faqs/smart-questions.html#answers.
Read it.  Live it.  O.K.  enough preaching...

If you have Programming Perl
<http://www.amazon.com/exec/obidos/tg/detail/-/0596000278/qid=1124883861/sr=8-1/ref=pd_bbs_1/102-6339742-6061723?v=glance&s=books&n=507846>
read the first half of chapter 20.  If you don't have the Perl book, you
can get the same info from
http://www.perldoc.com/perl5.8.4/pod/perldebug.html.  After you
understand the perl debugger (or if you already do) look at the error
message that you got.  In the first line it reports "seq doesn't
validate".  So something is wrong with some sequence that you are trying
to use in your script.  Since perl itself isn't aware of sequences, and
since the stack trace below shows that the exception occurred while in
Bio::PrimarySeq, you can conclude that some sequence that you are
sending to bioperl is bad.  Now fire up perl running your script with
the debugger like this: perl -d yourscript.pl.  Use 'n' to step through
your code to the open command on line 14.  No errors yet?  You've just
loaded your sequences.fasta sequence, so that sequence must be OK.
Continue stepping through your code until you get the error again.
Where exactly did the error occur?  When you try to set $input2 as a new
Bio::Seq object?  Run the debugger again, and step through to the line
just before where the error occurred.  Use the debugger's x command to
see what the values of $seq_id, $probe_id, $position, $probe_sequence
are.  This should give you a clue as to what your problem is.  One more
clue comes from the error message.  It says "Attempting to set the
sequence to [PROBE_SEQUENCE] which does not look healthy".  The error
message says that you are trying to set the sequence to PROBE_SEQUENCE.
Try to figure out why this error is occurring and how to solve it.  If
you're still stuck let us know what you've tried and ask us again.
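
For readers of the archive, one likely reading of that clue, assuming
the probe file starts with a column-header row as in the sample posted
earlier in the thread: the first row is being split like any other
row, so the text PROBE_SEQUENCE is handed to Bio::Seq as if it were
DNA. The sketch below uses plain Perl and an in-memory copy of the
sample rows; the variable names and the [ACGTN] check are illustrative
choices, not part of the original script.

```perl
use strict;
use warnings;

# In-memory stand-in for the tab-delimited probe file (sample rows only).
my $probe_text = join "\n",
    "SEQ_ID\tPROBE_ID\tPOSITION\tPROBE_SEQUENCE",
    "NC_004116\t1\t1\tAATTAACATTGTTGATTTTATTCTTCAACATC",
    "NC_004116\t3\t13\tTGATTTTATTCTTCAACATCTGTGGAAAACTT",
    "";

open my $in, '<', \$probe_text or die "Can't read probe text: $!";

my @probes;
while (my $row = <$in>) {
    chomp $row;
    next if $. == 1;   # the first row holds column names, not a probe
    my ($seq_id, $probe_id, $position, $probe_sequence) = split /\t/, $row;
    # Skip anything that is not plain nucleotide text.
    next unless defined $probe_sequence && $probe_sequence =~ /^[ACGTN]+$/i;
    push @probes, { id => $probe_id, seq => $probe_sequence };
}
close $in;

print scalar(@probes), " probes parsed\n";
```

In the real script the same two "next" lines would sit at the top of
the while loop, before the Bio::Seq object is created.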

Barry


Usha Rani Reddi wrote:

>Hi,
>Thanks a lot for your help. When I tried to run the given code I got the
>following message.
>
>MSG: seq doesn't validate, mismatch is 1
>---------------------------------------------------
>
>------------- EXCEPTION  -------------
>MSG: Attempting to set the sequence to [PROBE_SEQUENCE] which does not
>look healthy
>STACK Bio::PrimarySeq::seq
>/usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:268
>STACK Bio::PrimarySeq::new
>/usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:217
>STACK Bio::Seq::new /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:498
>STACK toplevel barr:23
>
>What should I do next? Please help me.
>Thanks
>Usha.
>

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From barry.moore at genetics.utah.edu  Wed Aug 24 08:47:10 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed Aug 24 08:36:38 2005
Subject: [Bioperl-l] Strange result
In-Reply-To: <1124876899286.1765.app2.Naesasoft.Q4QUAW7S>
References: <1124876899286.1765.app2.Naesasoft.Q4QUAW7S>
Message-ID: <430C6C4E.2090504@genetics.utah.edu>

Show us the code that produced this result.

Barry

thechans@citiz.net wrote:

>Hello,
>I am new to Bioperl. I want to just copy some sequences out of an existing genbank file(multiple seq included).
>The selected sequences is still gb format and put to a new file.
>I received no warning and it worked well but the new file showed something strange.
>e.g,
>                     /country="Bio::Annotation::SimpleValue=HASH(0x1bdc7a4)"
>                     /db_xref="Bio::Annotation::SimpleValue=HASH(0x1bdc9cc)"
>                     /mol_type="Bio::Annotation::SimpleValue=HASH(0x1bdca68)"
>                     /isolate="Bio::Annotation::SimpleValue=HASH(0x1bdc8ac)"
>                     /organism="Bio::Annotation::SimpleValue=HASH(0x1bdc954)"
>why it happened?

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT


From birney at ebi.ac.uk  Wed Aug 24 08:55:11 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Wed Aug 24 08:45:02 2005
Subject: [Bioperl-l] Strange result
In-Reply-To: <430C6C4E.2090504@genetics.utah.edu>
References: <1124876899286.1765.app2.Naesasoft.Q4QUAW7S>
	<430C6C4E.2090504@genetics.utah.edu>
Message-ID: <430C6E2F.7000302@ebi.ac.uk>


Barry Moore wrote:
> Show us the code that produced this result.
> 
> Barry
> 
> thechans@citiz.net wrote:
> 
>> Hello,
>> I am new to Bioperl. I want to just copy some sequences out of an 
>> existing genbank file(multiple seq included).
>> The selected sequences is still gb format and put to a new file.
>> I received no warning and it worked well but the new file showed 
>> something strange.
>> e.g,
>>                     /country="Bio::Annotation::SimpleValue=HASH(0x1bdc7a4)"
>>                     /db_xref="Bio::Annotation::SimpleValue=HASH(0x1bdc9cc)"
>>                     /mol_type="Bio::Annotation::SimpleValue=HASH(0x1bdca68)"
>>                     /isolate="Bio::Annotation::SimpleValue=HASH(0x1bdc8ac)"
>>                     /organism="Bio::Annotation::SimpleValue=HASH(0x1bdc954)"
>> why it happened?
>>

And the version. This is a 1.5x bug due to the skew that
happened in the ontology/embl/genbank thingy. I think if you
went back to an earlier version of bioperl you'd be fine.
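
For anyone wondering why the values have that shape: a string of the form
Class=HASH(0x...) is what Perl prints when an object reference is
interpolated into a string instead of the value it holds.  A minimal sketch
of the effect, using a hypothetical stand-in class (My::SimpleValue and its
value() accessor are illustrative assumptions, not the real
Bio::Annotation::SimpleValue code and not the actual 1.5 bug):

```perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object; this is not BioPerl code.
package My::SimpleValue;
sub new   { my ($class, %args) = @_; return bless { value => $args{-value} }, $class }
sub value { return $_[0]->{value} }

package main;

my $country = My::SimpleValue->new(-value => 'Germany');

# Interpolating the object itself prints its class name and memory address ...
my $wrong = qq{/country="$country"};
# ... whereas asking the object for its value prints the text it holds.
my $right = sprintf '/country="%s"', $country->value;

print "$wrong\n";   # something like /country="My::SimpleValue=HASH(0x...)"
print "$right\n";
```

The sketch only shows why the output looks the way it does; it does not
establish where in the GenBank writer the object ended up being printed.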


From agathman at semo.edu  Wed Aug 24 09:44:47 2005
From: agathman at semo.edu (Allen Gathman)
Date: Wed Aug 24 09:34:50 2005
Subject: [Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq
In-Reply-To: <33580922CBEEC846B473BAE124985DE0051ADE4B@xchgnt.semo.edu>
Message-ID: <33580922CBEEC846B473BAE124985DE0030BD156@xchgnt.semo.edu>

Well, I appear to have fixed it myself, although in a kind of inelegant way
-- I pulled the display_id of the predicted gene I wanted, then used it in a
get_feature_by_name call to pull the feature out again.  

my @new_genes = $db->get_feature_by_name( Sequence => $gid );
my $ngene     = shift @new_genes;
etc. 

That "re-captured" feature ("$ngene" above) splices correctly when I use
spliced_seq on it.  I'm still a bit puzzled why the original code doesn't
get me the whole gene. 

Allen Gathman
http://cstl-csm.semo.edu/gathman


> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-
> bounces@portal.open-bio.org] On Behalf Of Gathman, Allen
> Sent: Tuesday, August 23, 2005 1:39 PM
> To: 'bioperl-l@portal.open-bio.org'
> Subject: [Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq
> 
> Hi, BioPerl gurus:
> 
> Although this question involves a Gbrowse database, I think it's actually a
> BioPerl question at heart - and in any case it appears that there's a lot of
> overlap between the people who answer questions in both lists, so I'm
> guessing this is a good place for this question.
> 
> I've written a script that finds particular pfam hits in a GBROWSE database,
> then uses "overlapping_features" to find predicted gene features of type
> "transcript:GLEAN" that overlap those pfams. I've set the aggregator
> "transcript" as {CDS/mRNA} already.  I select features using a regular
> expression to choose particular names, then I use spliced_seq to return the
> spliced CDS of each feature - but I'm only getting back the CDS that
> actually overlap the pfam hit, not the full predicted gene.  So my question
> is, what do I need to do in order to get ALL the CDS of each predicted gene
> feature spliced together, instead of only the ones that actually overlap the
> pfam hit I used to select that predicted gene?
> 
> Thanks in advance for any help you can give...
> 
> Here's the code:
> 
> #!/usr/bin/perl
> 
> use strict;
> use Bio::DB::GFF;
> use Bio::Seq;
> use Bio::SeqIO;
> use Getopt::Long;
> 
> my $outfile;
> GetOptions(
>            'o|outfile=s' => \$outfile,
>            );
> 
> my $outfa = Bio::SeqIO->new(-file   => ">$outfile",
>                             -format => 'Fasta'
>                            );
> 
> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>                            -dsn     => 'dbi:mysql:database=cc;host=localhost',
>                            -fasta   => '/gbrowse/databases/cc'
>                           );
> 
> $db->add_aggregator('transcript{CDS/mRNA}');
> 
> for (my $i = 1; $i <= 20; $i++) {
>     my $pfamname = "Peptidase_C$i";
>     my @pfams = $db->get_feature_by_name( Domain => $pfamname );
>     foreach my $pfamhit (@pfams) {
>         my $desc  = $pfamname;
>         my $score = $pfamhit->score;
>         my $name  = $pfamhit->name;
>         $desc .= " $score ";
>         $desc .= $pfamhit->location->seq_id();
>         $desc .= ": ";
> #
> # Here's where I'm selecting predicted genes that overlap the Pfam hit
> #
>         my @genes = $db->overlapping_features(
>                         -refseq => $pfamhit->location->seq_id,
>                         -start  => $pfamhit->start,
>                         -stop   => $pfamhit->stop,
>                         -types  => 'transcript:GLEAN'
>                     );
> #
> # Now I'm choosing the ones with names I want out of the selected genes
> #
>         foreach my $gene (@genes) {
>             my $gid = $gene->display_id();
>             if ($gid =~ /aug_GLEAN/) {
>                 $desc .= $gene->start;
>                 $desc .= " - ";
>                 $desc .= $gene->stop;
> #
> # Here I'm splicing the gene, tacking on a description, and outputting it.
> #
>                 my $splseq = $gene->spliced_seq();
>                 $splseq->desc($desc);
>                 $splseq->display_id($gid);
>                 $outfa->write_seq($splseq);
>             }# end if aug_GLEAN
>         }# end foreach gene
>     }# end foreach pfamhit
> } # end for numbers
> $outfa->close();
> 
> Allen Gathman
> 
> http://cstl-csm.semo.edu/gathman


From hlapp at gmx.net  Wed Aug 24 12:16:02 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Aug 24 12:05:55 2005
Subject: [Bioperl-l] Strange result
In-Reply-To: <430C6E2F.7000302@ebi.ac.uk>
References: <1124876899286.1765.app2.Naesasoft.Q4QUAW7S>
	<430C6C4E.2090504@genetics.utah.edu> <430C6E2F.7000302@ebi.ac.uk>
Message-ID: <2d060be7d860518ef3b7aca0dffebfb8@gmx.net>

Right. Either downgrade to 1.4 or upgrade to a CVS snapshot of the main 
trunk.

	-hilmar

On Aug 24, 2005, at 5:55 AM, Ewan Birney wrote:

>
>
> Barry Moore wrote:
>> Show us the code that produced this result.
>> Barry
>> thechans@citiz.net wrote:
>>> Hello,
>>> I am new to Bioperl. I want to just copy some sequences out of an 
>>> existing genbank file(multiple seq included).
>>> The selected sequences is still gb format and put to a new file.
>>> I received no warning and it worked well but the new file showed 
>>> something strange.
>>> e.g,
>>>                     /country="Bio::Annotation::SimpleValue=HASH(0x1bdc7a4)"
>>>                     /db_xref="Bio::Annotation::SimpleValue=HASH(0x1bdc9cc)"
>>>                     /mol_type="Bio::Annotation::SimpleValue=HASH(0x1bdca68)"
>>>                     /isolate="Bio::Annotation::SimpleValue=HASH(0x1bdc8ac)"
>>>                     /organism="Bio::Annotation::SimpleValue=HASH(0x1bdc954)"
>>> why it happened?
>>>
>
> And the version. This is a 1.5x bug due to the skew that
> happened in the ontology/embl/genbank thingy. I think if you
> went back to an earlier version of bioperl you'd be fine.
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Wed Aug 24 12:39:21 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Aug 24 12:29:49 2005
Subject: [Bioperl-l] Re: Thanks
In-Reply-To: <430C6848.4010905@genetics.utah.edu>
References: <1072d891078847.10788471072d89@emich.edu>
	<430C6848.4010905@genetics.utah.edu>
Message-ID: <013d943156f0fc995505c405201b11fd@gmx.net>

Thanks Barry for this excellent answer. It couldn't have been written  
better. -hilmar

On Aug 24, 2005, at 5:30 AM, Barry Moore wrote:

> Usha-
>
> It is important that you keep Bioperl related discussions on the  
> bioperl list, that way others can benefit from the discussion in the  
> future by searching the archives.  Having said that, I am constantly  
> accidentally replying directly to people and not to the list because I  
> hit reply instead of reply all, so I'm not really a good one to talk.
>
> It seems from the nature of your questions to this list that you might be
> quite new to perl programming.  Your participation on the list is still
> welcome, but this is the kind of problem that you want to learn how to
> solve yourself.  You want to avoid at all costs giving the impression
> that you are asking the list to do your debugging for you - otherwise
> people will just stop replying to your messages.  A very valuable
> article about asking questions to forums like this one is found at
> http://www.catb.org/~esr/faqs/smart-questions.html#answers.  Read it.
> Live it.  O.K.  enough preaching...
>
> If you have Programming Perl
> <http://www.amazon.com/exec/obidos/tg/detail/-/0596000278/qid=1124883861/sr=8-1/ref=pd_bbs_1/102-6339742-6061723?v=glance&s=books&n=507846>
> read the first half of chapter 20.  If you don't have the Perl book, you
> can get the same info from
> http://www.perldoc.com/perl5.8.4/pod/perldebug.html.
>
> After you understand the perl debugger (or if you already do) look at the
> error message that you got.  In the first line it reports "seq doesn't
> validate".  So something is wrong with some sequence that you are trying
> to use in your script.  Since perl itself isn't aware of sequences, and
> since the stack trace below shows that the exception occurred while in
> Bio::PrimarySeq, you can conclude that some sequence that you are sending
> to bioperl is bad.
>
> Now fire up perl running your script with the debugger like this:
> perl -d yourscript.pl.  Use 'n' to step through your code to the open
> command on line 14.  No errors yet?  You've just loaded your
> sequences.fasta sequence, so that sequence must be OK.  Continue stepping
> through your code until you get the error again.  Where exactly did the
> error occur?  When you try to set $input2 as a new Bio::Seq object?  Run
> the debugger again, and step through to the line just before where the
> error occurred.  Use the debugger's x command to see what the values of
> $seq_id, $probe_id, $position, $probe_sequence are.  This should give you
> a clue as to what your problem is.
>
> One more clue comes from the error message.  It says "Attempting to set
> the sequence to [PROBE_SEQUENCE] which does not look healthy".  The error
> message says that you are trying to set the sequence to PROBE_SEQUENCE.
> Try to figure out why this error is occurring and how to solve it.  If
> you're still stuck let us know what you've tried and ask us again.
> Barry
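
One plausible reading of the hint above, offered as a guess rather than a
diagnosis: the probe file begins with a header row (SEQ_ID, PROBE_ID,
POSITION, PROBE_SEQUENCE), so the first pass through the loop hands the
literal column name to Bio::Seq.  A minimal sketch in plain Perl that skips
that row and any row whose last column is not plain DNA.  The in-memory
sample data and the ACGTN character check are assumptions made for
illustration:

```perl
use strict;
use warnings;

# Sample rows copied from the thread; a real script would open the probe file.
my $data = join "\n",
    "SEQ_ID\tPROBE_ID\tPOSITION\tPROBE_SEQUENCE",
    "NC_004116\t1\t1\tAATTAACATTGTTGATTTTATTCTTCAACATC",
    "NC_004116\t3\t13\tTGATTTTATTCTTCAACATCTGTGGAAAACTT",
    "";
open my $in, '<', \$data or die "Can't open in-memory file: $!";

my @probes;
while (my $row = <$in>) {
    chomp $row;
    next if $. == 1;    # skip the header row
    my ($seq_id, $probe_id, $position, $probe_sequence) = split /\t/, $row;
    # Keep only rows whose sequence column looks like DNA.
    next unless defined $probe_sequence && $probe_sequence =~ /^[ACGTN]+$/i;
    push @probes, [ $probe_id, $probe_sequence ];
}
close $in;

printf "%d probes kept, first is probe %s\n", scalar @probes, $probes[0][0];
```

Stepping through with the debugger as described is still the better way to
confirm what $probe_sequence actually holds on the failing iteration.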
>
>
> Usha Rani Reddi wrote:
>
>> Hi,
>> Thanks a lot for your help. When I tried to run the given code I got  
>> the
>> following message.
>>
>> MSG: seq doesn't validate, mismatch is 1
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION  -------------
>> MSG: Attempting to set the sequence to [PROBE_SEQUENCE] which does not
>> look healthy
>> STACK Bio::PrimarySeq::seq
>> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:268
>> STACK Bio::PrimarySeq::new
>> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:217
>> STACK Bio::Seq::new /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:498
>> STACK toplevel barr:23
>>
>> What should I do next? Please help me.
>> Thanks
>> Usha.
>>
>> ----- Original Message -----
>> From: Barry Moore <bmoore@genetics.utah.edu>
>> Date: Tuesday, August 23, 2005 2:40 pm
>> Subject: [Bioperl-l] RE: Local bl2seq
>>
>>
>>> Usha-
>>>
>>> I think the code below will wrap your existing code in the loop you
>>> need.  You will want to get a copy of a good perl programming book like
>>> Programming Perl from O'Reilly.  It will help you out with all those
>>> little perl details like loop structures etc.
>>>
>>> Barry
>>>
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use warnings;
>>> use Bio::SeqIO;
>>> use Bio::Tools::Run::StandAloneBlast;
>>> use Bio::Seq;
>>>
>>> my $seqio_obj = Bio::SeqIO->new(-file => " sequences.fasta",
>>>                               -format => "fasta" );
>>>
>>> my $seq_obj = $seqio_obj->next_seq;
>>>
>>> open (IN, " location/of/your/probe/file") or die "Can't open IN";
>>>
>>> while (my $row = <IN>) {
>>>   chomp $row;
>>>   #Assuming your file is tab delimited...
>>>   my ($seq_id, $probe_id, $position, $probe_sequence) = split /\t/,
>>> $row;
>>>
>>>   my $input2 = Bio::Seq->new(-id=>"testquery2",
>>>                              -seq=> $probe_sequence
>>>                           );
>>>
>>>   my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn',
>>>                                                       'outfile' => 'bl2seq1.out');
>>>
>>>   my $blast_report = $factory->bl2seq($seq_obj, $input2);
>>>
>>>   #Here is where you want to throw out good matches.  You'll need to determine
>>>   #what method you want to do that.  Maybe since you want there to be no good
>>>   #hits you would just call $blast_report->max_significance and make sure its
>>>   #value is too high to be significant.
>>>   if ($blast_report->max_significance > 0.01) {
>>>       print "$row\n";
>>>   }
>>> }
>>>
>>> -----Original Message-----
>>> From: Usha Rani Reddi [mailto:ureddi@emich.edu]
>>> Sent: Tuesday, August 23, 2005 5:51 AM
>>> To: Barry Moore
>>> Cc: bioperl-l@portal.open-bio.org
>>> Subject: Local bl2seq
>>>
>>> Hi,
>>> I am trying to use BLAST to compare the sequences. I did the program  
>>> in
>>> Bioperl. Below is my piece of code
>>> use Bio::SeqIO;
>>> use Bio::Tools::Run::StandAloneBlast;
>>> use Bio::Seq;
>>> $seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta",
>>>                            -format => "fasta" );
>>> $seq_obj = $seqio_obj->next_seq;
>>> $input2 = Bio::Seq->new(-id=>"testquery2",
>>>                                 
>>> -seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca");
>>>         $factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
>>> 'blastn','outfile' => 'bl2seq1.out');
>>> $blast_report = $factory->bl2seq($seq_obj, $input2);
>>>
>>> I need help for looping input2. I want to extract this part of sequence
>>> from a file containing 200000 records. Using perl I am extracting the
>>> sequence part for file of format given below.
>>> SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
>>> NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
>>> NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
>>> NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG
>>>
>>> code for extracting PROBE_SEQUENCE looks like this
>>>
>>> $NemSeq =<STDIN>;
>>>
>>> chomp $NemSeq;
>>>
>>> unless (open(seqfile, $NemSeq)) {
>>> print "Cannot open file \n";
>>> exit;
>>> }
>>> @NemSeq = <seqfile> ;
>>>
>>> close seqfile;
>>>
>>> for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) {
>>>   #print $k, $NemSeq[$k];
>>>   @Nem =split(/\t/,$NemSeq[$k]);
>>>   $input= $Nem[3];
>>>
>>>   #print scalar(@Nem);
>>>   #print $Nem[3], "\n";
>>>   }
>>>
>>>
>>> @Nem =split(/\t/,$NemSeq)
>>>
>>> $input2 = substr(@NemSeq,4,32);
>>>
>>> So far I could successfully use bioperl(bl2seq) to compare whole genome
>>> with single probe. I want to compare all the 200000 probes. I am
>>> interested only in mismatches, for this particular scenario my assumption
>>> is that more than 90% of them will match. I want to send only the
>>> mismatches to output file and discard the matches. I would like to
>>> classify the mismatches based on the percentage dissimilarity, is there a
>>> way in Bioperl for this? Thanks a lot for the reply. Please help me with
>>> this.
>>> Thanks
>>> Usha
>>>
>>>
>>> ----- Original Message -----
>>> From: Barry Moore <barry.moore@genetics.utah.edu>
>>> Date: Monday, August 22, 2005 11:45 pm
>>> Subject: Re: [Bioperl-l] bl2seq
>>>
>>>
>>>> Usha,
>>>>
>>>> The best advice I can give you is that you need to focus your question
>>>> a bit more.  What method are you using to compare your probe to your
>>>> fasta?  Regex, BLAST, Needle, RNAHybrid...?  You say your sequence is
>>>> working fine for single sequence.  Are you using Bioperl for that?
>>>> Can you tell us exactly what isn't working for you or what questions
>>>> you have about working with multiple sequences?  Are you already using
>>>> Bioperl with your single sequence comparison? Can you show us some
>>>> code?
>>>> Barry
>>>>
>>>> Usha Rani Reddi wrote:
>>>>
>>>>
>>>>> Hi,
>>>>> I am trying to compare two hundred thousand probes(each one of them) to
>>>>> another genome. Format of the file containing probes is like this
>>>>> SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
>>>>> NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
>>>>> NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
>>>>> NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG
>>>>> NC_004116	7	37	GAAAACTTTATTTTTTTATGGTACAATATAAC
>>>>> NC_004116	9	49	TTTTTATGGTACAATATAACAATAATTATCCA
>>>>> NC_004116	11	61	AATATAACAATAATTATCCACAAGACAATAAG
>>>>> NC_004116	13	73	ATTATCCACAAGACAATAAGGAAGAAGCTATG
>>>>> NC_004116	15	85	ACAATAAGGAAGAAGCTATGACGGAAAACGAA
>>>>> What I am trying to do is compare PROBE_SEQUENCE to fasta file of
>>>>> Streptococcus agalactiae. I am trying to loop through the probes but not
>>>>> sure how to proceed. My program is working fine for single sequence. One
>>>>> more thing is I am not interested in matches, I want to display only
>>>>> mismatches. I am new to Bioperl, some one please help me with this.
>>>>>
>>>>> Thanks
>>>>> Usha
>>>> -- 
>>>> Barry Moore
>>>> Dept. of Human Genetics
>>>> University of Utah
>>>> Salt Lake City, UT
>
> -- 
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jason.stajich at duke.edu  Wed Aug 24 13:11:29 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Aug 24 13:01:14 2005
Subject: [Bioperl-l] 1.5.1 todo list
Message-ID: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu>

So after those exchanges on what version of Bioperl needs to be run,  
and the various floating bugs in the release soup....

If we had to make a list of showstoppers for 1.5.1 release what would  
they be?

To explicitly state, I think the purpose of the 1.5.1 release should be to
a) clean up known showstopper bugs in 1.5.0
b) allow new modules/functionality to be introduced since 1.4 and 1.5.0
c) prepare the way for the 1.6 release by putting code out in the
wilds.

I am fairly adamant that API changes that are not backwards compatible
need to be CAREFULLY thought out before being allowed in.  Since the
code base is so big at this point, there need to be good tests in
place to confirm this, and a responsibility from the developers to
make sure this is the case.


My hope is that Gbrowse (live) could be successfully run on a 1.5.1
as I feel that is the largest 'external' consumer of Bioperl, along with
BioSQL and of course everyone's scripts which use a handful of modules.

What is the status of bioperl code for:
  Ontology work
  BioSQL support (from the Core code at least, how much in sync would  
1.5.1 be with biosql-perl release?)
  Bio::FeatureIO stuff + Bio::SeqFeature changes?
  Bio::DB::GFF work?  the GFF3 schema would be way past 1.5.1, but is  
that something we'd want to shoot for in 1.6?
  Other things?

Please report in.  Times like this sort of make me want a Wiki so we  
can keep track but I'll at least volunteer to collate the results  
into a summary email.


-jason

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From arne.nolte at uni-koeln.de  Wed Aug 24 09:32:16 2005
From: arne.nolte at uni-koeln.de (Arne Nolte)
Date: Wed Aug 24 16:21:13 2005
Subject: [Bioperl-l] maximum likelihood estimation
Message-ID: <000501c5a8b0$3eac31f0$8db35f86@tautzarne>

Dear all,

I would like to calculate maximum likelihood estimators given a
likelihood function and some parameters.

Are there tools available to do this using perl?

thanks,

Arne


Arne Nolte
Institute for Genetics
Evolutionary Genetics
Zülpicher Str. 47
50674 Cologne
Germany

Tel.: 0221/470-4034
Fax.: 0221/470-5975

From birney at ebi.ac.uk  Thu Aug 25 04:59:23 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Thu Aug 25 04:48:55 2005
Subject: [Bioperl-l] 1.5.1 todo list
In-Reply-To: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu>
References: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu>
Message-ID: <430D886B.2050603@ebi.ac.uk>


Jason Stajich wrote:
> So after those exchanges on what version of Bioperl needs to be run,  
> and the various floating bugs in the release soup....
> 
> If we had to make a list of showstoppers for 1.5.1 release what would  
> they be?
> 
> To explicitly state, I think the purpose of the 1.5.1 release should be to
> a) clean up known showstopper bugs in 1.5.0
> b) allow new modules/functionality to be introduced since 1.4 and 1.5.0
> c) prepare the way for the 1.6 release by putting code out in the wilds.
> 
> I am fairly adamant that API changes that are not backwards compatible
> need to be CAREFULLY thought out before being allowed in.  Since the code
> base is so big at this point, there need to be good tests in place to
> confirm this, and a responsibility from the developers to make sure
> this is the case.
> 
> 
> My hope is that Gbrowse (live) could be successfully run on a 1.5.1 as
> I feel that is the largest 'external' consumer of Bioperl, along with
> BioSQL and of course everyone's scripts which use a handful of modules.
> 
> What is the status of bioperl code for:
>  Ontology work
>  BioSQL support (from the Core code at least, how much in sync would  
> 1.5.1 be with biosql-perl release?)
>  Bio::FeatureIO stuff + Bio::SeqFeature changes?

I've written my interface class for TypedSeqFeature but not done
an implementation yet. I'll commit the interface and work on
an implementation.


>  Bio::DB::GFF work?  the GFF3 schema would be way past 1.5.1, but is  
> that something we'd want to shoot for in 1.6?
>  Other things?
> 
> Please report in.  Times like this sort of make me want a Wiki so we  
> can keep track but I'll at least volunteer to collate the results  into 
> a summary email.
> 
> 
> -jason
> 
> -- 
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> 
From ed at compbio.berkeley.edu  Thu Aug 25 05:41:37 2005
From: ed at compbio.berkeley.edu (Ed Green)
Date: Thu Aug 25 05:31:12 2005
Subject: [Bioperl-l] maximum likelihood estimation
In-Reply-To: <000501c5a8b0$3eac31f0$8db35f86@tautzarne>
References: <000501c5a8b0$3eac31f0$8db35f86@tautzarne>
Message-ID: <430D9251.1050303@compbio.berkeley.edu>

Arne-
I quickly searched CPAN for "maximum likelihood" and "MLE" and found 
nothing relevant.

If you are also a C programmer, you may be interested in the GNU 
Scientific Library (GSL)
http://www.gnu.org/software/gsl/

GSL has nicely written and documented code for multidimensional 
minimization that may be useful for you:
http://www.gnu.org/software/gsl/manual/gsl-ref_35.html#SEC460

I have used these functions to solve a problem like you've described, so 
I could provide more information, (off this list since it wouldn't 
really be bioperl-y or even perl-y) if you're interested.
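
For a single parameter, the same idea (minimise the negative log-likelihood
numerically) also fits in a few lines of plain Perl.  A rough sketch only,
not a substitute for GSL: the binomial likelihood, the counts and the
golden-section search are all assumptions chosen for illustration, and the
search is only valid when the function has a single minimum on the bracket:

```perl
use strict;
use warnings;

# Negative log-likelihood of a binomial success probability $p, given
# $k successes out of $n trials (the constant binomial term is dropped).
sub neg_log_lik {
    my ($p, $k, $n) = @_;
    return -( $k * log($p) + ($n - $k) * log(1 - $p) );
}

# Golden-section search for the minimum of a unimodal function on [$lo, $hi].
sub golden_min {
    my ($f, $lo, $hi, $tol) = @_;
    my $r = (sqrt(5) - 1) / 2;              # golden ratio conjugate, ~0.618
    while (abs($hi - $lo) > $tol) {
        my $c = $hi - $r * ($hi - $lo);     # left interior point
        my $d = $lo + $r * ($hi - $lo);     # right interior point
        if ($f->($c) < $f->($d)) {
            $hi = $d;                       # minimum lies in [$lo, $d]
        } else {
            $lo = $c;                       # minimum lies in [$c, $hi]
        }
    }
    return ($lo + $hi) / 2;
}

# Illustrative data: 30 successes in 100 trials.
my ($k, $n) = (30, 100);
my $p_hat = golden_min(sub { neg_log_lik($_[0], $k, $n) }, 1e-6, 1 - 1e-6, 1e-7);

printf "numerical MLE of p: %.4f, closed form k/n: %.4f\n", $p_hat, $k / $n;
```

The binomial case has a closed-form answer (k/n), which makes it a handy
check on the search; with several parameters a proper multidimensional
minimiser such as the GSL routines is the more sensible route.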

Regards,

Ed Green
Max Planck Institute for
Evolutionary Anthropology
Deutscher Platz 6
04103 Leipzig
Germany

Arne Nolte wrote:

>Dear all,
>
>I would like to calculate maximum likelihood estimators given a
>likelihood function and some parameters.
>
>Are there tools available to do this using perl?
>
>thanks,
>
>Arne
>
>
>Arne Nolte
>Institute for Genetics
>Evolutionary Genetics
>Zülpicher Str. 47
>50674 Cologne
>Germany
>
>Tel.: 0221/470-4034
>Fax.: 0221/470-5975

From lstein at cshl.edu  Wed Aug 24 17:30:16 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu Aug 25 10:40:08 2005
Subject: [Gmod-gbrowse] Re: [Bioperl-l] Windows bug in Bio::DB::Fasta?
In-Reply-To: <6.2.1.2.2.20050823150328.04bba998@express.cites.uiuc.edu>
References: <1124116511.2891.9.camel@localhost.localdomain>
	<200508221818.08032.lstein@cshl.edu>
	<6.2.1.2.2.20050823150328.04bba998@express.cites.uiuc.edu>
Message-ID: <200508241730.18892.lstein@cshl.edu>

Glad it fixed the problem. Much thanks to Scott who correctly diagnosed the 
problem.

Lincoln

On Tuesday 23 August 2005 04:03 pm, Chris Fields wrote:
> That did the trick!  Everything looks fine now.  Thanks Lincoln!
>
> Chris
>
> At 05:18 PM 8/22/2005, Lincoln Stein wrote:
> >I've just looked into this. The bug occurs when Windows opens the FASTA
> > file in text mode rather than binary mode; when in text mode the "\r\n"
> > sequence is invisibly mapped to "\n" during readline operations, so
> > Bio::DB::Fasta thinks that it is dealing with a Unix-format file; then
> > when the module tries to seek() to the proper line number, Windows
> > doesn't do the line end mapping, so it seeks to the wrong offset.  (sound
> > of hairs being pulled)
> >
> >I've fixed the problem by explicitly calling binmode() on all filehandles
> >that
> >Bio::DB::Fasta calls. The new version of Fasta.pm is in both bioperl CVS
> > and the gbrowse 1.63 CVS version. It ought to fix Chris' GC content
> > weirdness.
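
A small sketch of the idea behind the fix described above, in plain Perl:
open the file in binary mode, measure the line terminator from the data, and
include it when turning a residue index into a byte offset.  The file
contents and the residue_offset helper are hypothetical, and this is not the
code in Fasta.pm:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small CRLF-terminated FASTA file (hypothetical example data).
my ($fh, $path) = tempfile();
binmode $fh;
print {$fh} ">seq1\r\n", "AAAACCCC\r\n", "GGGGTTTT\r\n";
close $fh;

# Reopen in binary mode so "\r\n" is never translated, on any platform.
open my $in, '<', $path or die "Can't open $path: $!";
binmode $in;

my $header         = <$in>;
my $seq_start      = tell $in;                  # byte offset of first residue
my $first_line     = <$in>;
my $term_len       = $first_line =~ /\r\n$/ ? 2 : 1;
my $bases_per_line = length($first_line) - $term_len;

# Byte offset of the 0-based residue index $i, counting line terminators.
sub residue_offset {
    my ($i) = @_;
    return $seq_start + $i + $term_len * int($i / $bases_per_line);
}

seek $in, residue_offset(8), 0;                 # first residue of second line
read $in, my $buf, 4;
close $in;
unlink $path;

print "terminator length $term_len, residues 9-12: $buf\n";
```

Without binmode on Windows, the readline calls would see "\n" only, so the
terminator length would be measured as 1 while seek still counts the raw
bytes, which is the mismatch described above.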
> >
> >Lincoln
> >
> >On Monday 15 August 2005 01:22 pm, Scott Cain wrote:
> > > Just to follow up on my own email with a little more information: in
> > > Fasta.pm, line 697:
> > >
> > >   $termination_length ||= /\r\n$/ ? 2 : 1;  # account for
> > > crlf-terminated Windows files
> > >
> > > The pattern match is failing on DOS formatted files; I don't know why.
> > > Does anyone else?
> > >
> > > On Mon, 2005-08-15 at 10:35 -0400, Scott Cain wrote:
> > > > Hello all,
> > > >
> > > > I am investigating a bug in GBrowse that seems to only surface when
> > > > people are using the memory (ie, file) adaptor on Windows systems.
> > > > Here's the bug report:
> > > >
> > > > https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169&group_id=27707
> > > >
> > > > I've tracked the problem down to Bio::DB::Fasta when the file is dos
> > > > formatted (that is, it has both line feeds and carriage returns), BDF
> > > > returns the wrong string when a subsequence is requested, but when
> > > > the file is unix formatted (ie only CR (or is it only LF?)), it
> > > > returns the right string.  I wrote the very simple test script below
> > > > and stepped it through the perl debugger.  It looks like the bug is
> > > > in the caloffset method, as it returns the same offsets regardless of
> > > > the file type, which then makes the subsequent seek into the file go
> > > > to the wrong coordinates of dos formatted files.
> > > >
> > > > Unfortunately, I don't really know what is going on caloffset, so I
> > > > don't know how to fix it, but it presumably has to check the format
> > > > of the file somewhere and take that into account.
> > > >
> > > > Thanks,
> > > > Scott
> >
> >--
> >Lincoln D. Stein
> >Cold Spring Harbor Laboratory
> >1 Bungtown Road
> >Cold Spring Harbor, NY 11724
> >FOR URGENT MESSAGES & SCHEDULING,
> >PLEASE CONTACT MY ASSISTANT,
> >SANDRA MICHELSEN, AT michelse@cshl.edu
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu
From lstein at cshl.edu  Thu Aug 25 11:24:12 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu Aug 25 11:13:51 2005
Subject: [Bioperl-l] 1.5.1 todo list
In-Reply-To: <430D886B.2050603@ebi.ac.uk>
References: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu>
	<430D886B.2050603@ebi.ac.uk>
Message-ID: <200508251124.13683.lstein@cshl.edu>

> >  Bio::DB::GFF work?  the GFF3 schema would be way past 1.5.1, but is
> > that something we'd want to shoot for in 1.6?
> >  Other things?

I think Bio::DB::GFF3 will be going in in the November/December time frame, 
probably December.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu
From birney at ebi.ac.uk  Thu Aug 25 11:56:53 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Thu Aug 25 11:46:15 2005
Subject: [Bioperl-l] TypedSeqFeatureI is in; tests which fail
Message-ID: <430DEA45.1060308@ebi.ac.uk>


TypedSeqFeatureI is in. Implementation coming.


Some tests are failing for me (not related to sequence
features)


Failed Test       Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Index.t           79 20224    47    0   0.00%  ??
t/PAML.t                       166   14   8.43%  153-166
t/RestrictionIO.t               14    1   7.14%  10
t/SearchIO.t                  1227    4   0.33%  1224-1227
145 subtests skipped.
Failed 4/201 test scripts, 98.01% okay. 19/9625 subtests failed, 99.80% okay.


I'll dig around on these, but can't promise to sort them out.

From jason.stajich at duke.edu  Thu Aug 25 12:16:34 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Aug 25 12:06:10 2005
Subject: [Bioperl-l] TypedSeqFeatureI is in; tests which fail
In-Reply-To: <430DEA45.1060308@ebi.ac.uk>
References: <430DEA45.1060308@ebi.ac.uk>
Message-ID: <372DA67D-B453-429E-B200-16E33805A141@duke.edu>

They all pass for me on OSX, and linux.  What version of perl?

Do you have IO::String installed?  I believe the last tests in PAML  
are parsing trees and assigning rates and parameters to branches.
more details there if you can't track it down.

I found that the SearchIO failure was just the test count being set wrong  
when the necessary XML modules are not installed.  Fixed that.

-jason
On Aug 25, 2005, at 11:56 AM, Ewan Birney wrote:

>
> TypedSeqFeatureI is in. Implementation coming.
>
>
> Some tests are failing for me (not related to sequence
> features)
>
>
> Failed Test       Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> ---------
> t/Index.t           79 20224    47    0   0.00%  ??
> t/PAML.t                       166   14   8.43%  153-166
> t/RestrictionIO.t               14    1   7.14%  10
> t/SearchIO.t                  1227    4   0.33%  1224-1227
> 145 subtests skipped.
> Failed 4/201 test scripts, 98.01% okay. 19/9625 subtests failed,  
> 99.80% okay.
>
>
> I'll dig around on these, but can't promise to sort them out.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From kynn at panix.com  Thu Aug 25 12:18:59 2005
From: kynn at panix.com (kynn@panix.com)
Date: Thu Aug 25 12:08:23 2005
Subject: [Bioperl-l] [OT] General bioinformatics forums/lists?
Message-ID: <200508251618.j7PGIxZ09385@panix3.panix.com>


I have many questions that are about bioinformatics in general, not
BioPerl.  Is there a good bioinformatics list where I could post them?
I've Googled for one, but the lists I've found are specialized
(e.g. focusing on specific software), or don't seem to get much
traffic at all (or both).

Thanks!

kj

From bmoore at genetics.utah.edu  Thu Aug 25 12:35:14 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Thu Aug 25 12:22:49 2005
Subject: [Bioperl-l] DocBook Question
Message-ID: <CFE1DF3BA20F424689DA0881A14055BE863F4A@m.hg.genetics.utah.edu>

Brian-

 
I've been working on setting up a DocBook working environment so I can
take a stab at writing a bioperl HOWTO.  I'm using the e-novative
environment (http://www.e-novative.info/index.php) for transforming the
xml to pdf|html that the bioperl docs link to as the stylesheet source
for those documents.  My pdf output is all aligned against the left edge
of the document with no left margin.  Did you modify the e-novative
stylesheet to correct this?  I can't seem to fix that.  Also, what do
you use to create the plain old text files?  I see that you are using
RenderX for transformation. Is that better than the e-novative tools?
I'm new to all this xml/xsl/css etc., so any suggestions for how to
work best with the processing chain would be greatly appreciated.

 
Barry

 
Barry Moore

Department of Human Genetics

University of Utah

Salt Lake City, UT 84112


From birney at ebi.ac.uk  Thu Aug 25 12:34:45 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Thu Aug 25 12:24:27 2005
Subject: [Bioperl-l] TypedSeqFeatureI is in; tests which fail
In-Reply-To: <372DA67D-B453-429E-B200-16E33805A141@duke.edu>
References: <430DEA45.1060308@ebi.ac.uk>
	<372DA67D-B453-429E-B200-16E33805A141@duke.edu>
Message-ID: <430DF325.3020306@ebi.ac.uk>


Jason Stajich wrote:
> They all pass for me on OSX, and linux.  What version of perl?
> 

[Ewan-Birneys-Computer:wise2/src/network] birney% perl -v

This is perl, v5.6.0 built for darwin

Copyright 1987-2000, Larry Wall


> Do you have IO::String installed?  I believe the last tests in PAML are 
> parsing trees and assigning rates and parameters to branches.
> more details there if you can't track it down.
> 


I have got IO::String installed. I'll dig.


> I found that SearchIO was just a count getting set wrong for test count 
> in when necessary XML modules are not installed.  fixed that.
> 

Great.


> -jason
> On Aug 25, 2005, at 11:56 AM, Ewan Birney wrote:
> 
>>
>> TypedSeqFeatureI is in. Implementation coming.
>>
>>
>> Some tests are failing for me (not related to sequence
>> features)
>>
>>
>> Failed Test       Stat Wstat Total Fail  Failed  List of Failed
>> -------------------------------------------------------------------------------
>> t/Index.t           79 20224    47    0   0.00%  ??
>> t/PAML.t                       166   14   8.43%  153-166
>> t/RestrictionIO.t               14    1   7.14%  10
>> t/SearchIO.t                  1227    4   0.33%  1224-1227
>> 145 subtests skipped.
>> Failed 4/201 test scripts, 98.01% okay. 19/9625 subtests failed, 
>> 99.80% okay.
>>
>>
>> I'll dig around on these, but can't promise to sort them out.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org <mailto:Bioperl-l@portal.open-bio.org>
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
> 
> --
> 
> Jason Stajich
> 
> jason.stajich at duke.edu
> 
> http://www.duke.edu/~jes12/
> 
> 
> 
From hlapp at gmx.net  Thu Aug 25 12:49:37 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu Aug 25 12:39:19 2005
Subject: [Bioperl-l] TypedSeqFeatureI is in; tests which fail
In-Reply-To: <430DF325.3020306@ebi.ac.uk>
References: <430DEA45.1060308@ebi.ac.uk>
	<372DA67D-B453-429E-B200-16E33805A141@duke.edu>
	<430DF325.3020306@ebi.ac.uk>
Message-ID: <590c379d723a9438a5c760e0048e85a1@gmx.net>


On Aug 25, 2005, at 9:34 AM, Ewan Birney wrote:

> Jason Stajich wrote:
>> They all pass for me on OSX, and linux.  What version of perl?
>
> [Ewan-Birneys-Computer:wise2/src/network] birney% perl -v
>
> This is perl, v5.6.0 built for darwin
>
> Copyright 1987-2000, Larry Wall
>

I do suggest you upgrade perl. I know 5.6.0 is the one that comes with 
Jaguar, but it has bugs in some features bioperl is taking advantage of 
(nested regex in FTLocationFactory being just one example). I've had so 
much trouble getting Bioperl to pass all tests, with errors nobody else was 
getting, that I finally gave up and upgraded (to Panther actually, but 
upgrading perl supposedly suffices). Once I did that most failures went 
away (and some new ones came up but that's another story and they are 
fixed meanwhile).

I brought this up a while ago in spring and I can dig up the list 
thread if you're interested. The conclusion was that essentially we'll 
have to require perl 5.6.1 with the next release.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jason.stajich at duke.edu  Thu Aug 25 13:00:34 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Aug 25 12:50:04 2005
Subject: [Bioperl-l] Bio::DB::GFF start/end coordinates
Message-ID: <D479BA98-53E4-4024-A53F-441A09F65F89@duke.edu>

Lincoln -

One bug I'm still seeing in Bio::DB::GFF::Feature objects is that start/ 
end still return start > end when strand < 0.   I know this is a  
different expectation for Bioperl / Gbrowse but this causes a few  
problems, especially when you get an aggregated feature out from  
Bio::DB::GFF and then write it to a genbank file.  The location looks  
like this:
complement(join(1031..975,676..501))

My workaround is just to create new Location objects and features  
from the Bio::DB::GFF obtained objects  (some of these aren't  
allowing write-back to overwrite the values).
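
The workaround above can be sketched in plain Perl, without any BioPerl
objects.  normalize_coords() is a hypothetical helper, not a BioPerl
method; it only illustrates swapping the coordinates so that
start <= end while keeping the strand, which are the values a new
Location object would then be built from.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return (start, end, strand)
# with start <= end, keeping the strand or inferring minus if none given.
sub normalize_coords {
    my ($start, $end, $strand) = @_;
    if ($start > $end) {
        ($start, $end) = ($end, $start);
        $strand = -1 unless $strand;   # infer minus strand when none was given
    }
    return ($start, $end, $strand);
}

# The two sub-locations from the example above, as returned with start > end.
my @exons = ([1031, 975, -1], [676, 501, -1]);
my @fixed = map { [ normalize_coords(@$_) ] } @exons;

# Order sub-locations by ascending start, as in a feature-table join().
my $join = join ',',
    map  { "$_->[0]..$_->[1]" }
    sort { $a->[0] <=> $b->[0] } @fixed;
print "complement(join($join))\n";
```

With the two sub-locations from the example this prints
complement(join(501..676,975..1031)).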

Note on a slightly separate topic:
  I have patched my Bio::Location::Split to_FTstring to simplify the  
string, current behavior would be to output the location like this:
join(complement(1031..975),complement(676..501),))

I'm looking into applying the patch; I'm not sure whether or not  
it works perfectly.


-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From sm_middha at yahoo.com  Thu Aug 25 13:27:05 2005
From: sm_middha at yahoo.com (sumit middha)
Date: Thu Aug 25 13:17:45 2005
Subject: [Bioperl-l] [OT] General bioinformatics forums/lists?
In-Reply-To: <200508251618.j7PGIxZ09385@panix3.panix.com>
Message-ID: <20050825172706.46755.qmail@web30709.mail.mud.yahoo.com>


I would also like info on a good bioinfo discussion forum,
where people discuss their doubts about some software or
tool, or their research questions, and figure out the
best way to plan things, etc.

Thanks.

--- kynn@panix.com wrote:

> 
> 
> 
> I have many questions that are about bioinformatics
> in general, not
> BioPerl.  Is there a good bioinformatics list where
> I could post them?
> I've Googled for one, but the lists I've found are
> specialized
> (e.g. focusing on specific software), or don't seem
> to get much
> traffic at all (or both).
> 
> Thanks!
> 
> kj
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
>
http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 


From MAG at Stowers-Institute.org  Thu Aug 25 13:45:34 2005
From: MAG at Stowers-Institute.org (Goel, Manisha)
Date: Thu Aug 25 13:35:06 2005
Subject: [Bioperl-l] [OT] General bioinformatics forums/lists?
Message-ID: <200508251734.j7PHYvTv002473@portal.open-bio.org>

How about https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
?
Or
https://bioinformatics.org/mailman/listinfo/ssml-general .. 

Maybe you could look at their archives to see if the topics discussed
here suit your purpose.


-Manisha
Post-doc Associate,
Stowers Institute for Medical Research
Kansas city, MO


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of sumit middha
Sent: Thursday, August 25, 2005 12:27 PM
To: kynn@panix.com; bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] [OT] General bioinformatics forums/lists?


Even I want info. on a good bioinfo discussion forum,
where people discuss their doubts about some software,
tool .. or their research question and figuring out
best way to plan things, etc ...

Thanks.

--- kynn@panix.com wrote:

> 
> 
> 
> I have many questions that are about bioinformatics
> in general, not
> BioPerl.  Is there a good bioinformatics list where
> I could post them?
> I've Googled for one, but the lists I've found are specialized
> (e.g. focusing on specific software), or don't seem
> to get much
> traffic at all (or both).
> 
> Thanks!
> 
> kj
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
>
http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 


_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From astew at wam.umd.edu  Thu Aug 25 17:08:32 2005
From: astew at wam.umd.edu (Andrew Stewart)
Date: Thu Aug 25 16:58:00 2005
Subject: [Bioperl-l] ->add_tag_value()
Message-ID: <430E3350.50604@wam.umd.edu>

I'm trying to create genbank files from a sequence file and features 
retrieved from glimmer output.  When creating the new features and 
writing them to the genbank (richseq) object, though, I get the 
following output (for example)...

     CDS             21..596
                     
/translation="Bio::Annotation::SimpleValue=HASH(0x987b78)"
     CDS             complement(1713..2903)
                     
/translation="Bio::Annotation::SimpleValue=HASH(0x987944)"
     CDS             complement(3236..4258)
                     
/translation="Bio::Annotation::SimpleValue=HASH(0x9be8e4)"
     CDS             4350..5936
                     
/translation="Bio::Annotation::SimpleValue=HASH(0x9bead0)"
     CDS             6181..6819
                     
/translation="Bio::Annotation::SimpleValue=HASH(0x9bebd8)"

The translation tag I added is for some reason being shown as a hash.  
The code in question is here...

    my $translation = $seqo->subseq($start, $stop);
    $feat->add_tag_value("translation",$translation);

Everything there is ok, as far as I can tell.  I was able to spit the 
'translation' tag back out with some test code just fine.  Near as I can 
tell, either $feat->add_tag_value() is setting the tag value as a 
reference, or the tag value is being retrieved as such when the feature 
is written to the seq object (or somewhere else in the process).

Anyone have any idea what might be going on here?
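
The symptom can be reproduced in plain Perl: an object interpolated
into a string prints as Class=HASH(0x...) unless its class overloads
stringification.  In the sketch below, My::SimpleValue and its value()
accessor are hypothetical stand-ins that only mimic the behaviour
described above; they are not the BioPerl class, and
tag_value_as_string() is an illustrative helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Stand-in for an annotation object; the package name and its value()
# accessor are hypothetical and only mimic the behaviour described above.
package My::SimpleValue;
sub new   { my ($class, %args) = @_; return bless { value => $args{-value} }, $class }
sub value { return $_[0]->{value} }

package main;

# Return a printable string whether the tag value is a plain scalar
# or an object that wraps one.
sub tag_value_as_string {
    my ($v) = @_;
    return $v unless ref $v;                        # already a plain string
    return $v->value if eval { $v->can('value') };  # unwrap the object
    return "$v";                                    # fall back to default stringification
}

my $wrapped = My::SimpleValue->new(-value => 'MKTAYIAKQR');
print "interpolated: $wrapped\n";   # prints the class name and a HASH(0x...) address
print "unwrapped:    ", tag_value_as_string($wrapped), "\n";
print "plain:        ", tag_value_as_string('MKTAYIAKQR'), "\n";
```

If the value comes back as an object rather than a string, unwrapping
it before printing is one way to tell whether the problem is on the
writing side or the storing side.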


-Andrew Stewart
BDRD
From jason.stajich at duke.edu  Thu Aug 25 17:32:17 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Aug 25 17:23:33 2005
Subject: [Bioperl-l] ->add_tag_value()
In-Reply-To: <430E3350.50604@wam.umd.edu>
References: <430E3350.50604@wam.umd.edu>
Message-ID: <A362C63D-80BB-4EAE-AD09-7D5BAD2464E7@duke.edu>

It has been discussed several times on the mailing list.  A  
deficiency in the code released as 1.5.0 but should be fixed in CVS  
now.  If it isn't can you please yell loudly so it gets fixed by the  
people who broke it... =)

  http://portal.open-bio.org/pipermail/bioperl-l/2005-April/018749.html

-jason
On Aug 25, 2005, at 5:08 PM, Andrew Stewart wrote:

> I'm trying to create genbank files from a sequence file and  
> features retrieved from glimmer output.  When creating the new  
> features and writing them to the genbank (richseq) object, though,  
> I get the following output (for example)...
>
>     CDS             21..596
>                     /translation="Bio::Annotation::SimpleValue=HASH 
> (0x987b78)"
>     CDS             complement(1713..2903)
>                     /translation="Bio::Annotation::SimpleValue=HASH 
> (0x987944)"
>     CDS             complement(3236..4258)
>                     /translation="Bio::Annotation::SimpleValue=HASH 
> (0x9be8e4)"
>     CDS             4350..5936
>                     /translation="Bio::Annotation::SimpleValue=HASH 
> (0x9bead0)"
>     CDS             6181..6819
>                     /translation="Bio::Annotation::SimpleValue=HASH 
> (0x9bebd8)"
>
> The translation tag I added is for some reason being shown as a  
> hash.  The code in question is here...
>
>    my $translation = $seqo->subseq($start, $stop);
>    $feat->add_tag_value("translation",$translation);
>
> Everything there is ok, as far as I can tell.  I was able to spit  
> the 'translation' tag back out with some test code just fine.  Near  
> as I can tell, either $feat->add_tag_value() is setting the tag  
> value as a reference, or the tag value is being retrieved as such  
> when the feature is written to the seq object (or somewhere else in  
> the process).
>
> Anyone have any idea what might be going on here?
>
>
> -Andrew Stewart
> BDRD
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From indapa at gmail.com  Thu Aug 25 17:36:10 2005
From: indapa at gmail.com (Amit Indap)
Date: Thu Aug 25 17:26:02 2005
Subject: [Bioperl-l] DBI connection parameters and BasePersistenceAdaptor.pm
Message-ID: <3cfaa40405082514364c4b5835@mail.gmail.com>

Hi,

Thanks for the response on the bioperl-db API, Hilmar. I have a
much better understanding now.

I have a script that adds features to bioentries in a biosql database
(namely where bioentries align to the human genome via parsing a blat
file).  But it's having trouble connecting to my mysql db when I call
my $dbseq= $adp->find_by_unique_key($seq);
(where $seq holds my Bio::Seq object to which I want to add features).
The stack is listed at the end of the msg.

I would like to add features to this sequence and then store them in
my biosql database while encapsulating this process using the bioperl-db
API. Clearly, it can't connect to my mysql server.  The particular line
in BasePersistenceAdaptor.pm it flames out on is:
$dbh=$dbc->dbi()->get_connection($dbc,$dbc->dbi()->conn_params($self))


Elsewhere in my code I have a low-level query for my biosql db using
DBI in which I connect to mysql reading a .my.cnf file:

my $conn = DBI->connect("DBI:mysql:amit" .
";mysql_read_default_file=/home/amit/.my.cnf", $user, $passwd);

Is there a way to tell bioperl to read this .my.cnf file when it
makes its database connection? For some reason to open a mysql
connection on my machine I need to open up a ssh -L connection to the
machine where the mysql server lives with some funky parameters. (If
this is more appropriate for the biosql mailing list, apologies but I
didn't want to cross post :)
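
For reference, the DSN used in the DBI->connect call above can be
assembled from its parts as sketched below.  mysql_dsn() is a
hypothetical helper, not part of DBI or bioperl-db, and the sketch only
builds the string; whether a given bioperl-db version accepts a verbatim
DSN for its connection is an assumption that would need checking.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble a DBD::mysql DSN string from named parts.
sub mysql_dsn {
    my (%opt) = @_;
    my @parts = ("database=$opt{database}");
    push @parts, "host=$opt{host}" if defined $opt{host};
    push @parts, "port=$opt{port}" if defined $opt{port};
    push @parts, "mysql_read_default_file=$opt{defaults_file}"
        if defined $opt{defaults_file};
    return 'DBI:mysql:' . join(';', @parts);
}

# Same database name and option file as in the message above.
my $dsn = mysql_dsn(
    database      => 'amit',
    defaults_file => '/home/amit/.my.cnf',
);
print "$dsn\n";
```

The resulting string could be passed to DBI->connect in place of the
hand-concatenated one; host and port are only added when supplied,
which may help with a forwarded ssh -L port.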

Amit Indap
Cornell University

------------- EXCEPTION  -------------
MSG: failed to open connection: Access denied for user
'amit'@'132.236.170.104' (using password: NO)
STACK Bio::DB::DBI::base::new_connection
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/DBI/base.pm:253
STACK Bio::DB::DBI::base::get_connection
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/DBI/base.pm:213
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1477
STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BaseDriver.pm:515
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
STACK Bio::DB::BioSQL::PrimarySeqAdaptor::get_unique_key_query
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:395
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:845
STACK toplevel /home/amit/bin/HCG-blatparser.pl:181

From allenday at ucla.edu  Thu Aug 25 17:43:46 2005
From: allenday at ucla.edu (Allen Day)
Date: Thu Aug 25 17:33:08 2005
Subject: [Bioperl-l] ->add_tag_value()
In-Reply-To: <430E3350.50604@wam.umd.edu>
References: <430E3350.50604@wam.umd.edu>
Message-ID: <Pine.LNX.4.58.0508251443380.23819@sumo.ctrl.ucla.edu>

This is fixed in CVS.

-Allen

On Thu, 25 Aug 2005, Andrew Stewart wrote:

> I'm trying to create genbank files from a sequence file and features 
> retrieved from glimmer output.  When creating the new features and 
> writing them to the genbank (richseq) object, though, I get the 
> following output (for example)...
> 
>      CDS             21..596
>                      
> /translation="Bio::Annotation::SimpleValue=HASH(0x987b78)"
>      CDS             complement(1713..2903)
>                      
> /translation="Bio::Annotation::SimpleValue=HASH(0x987944)"
>      CDS             complement(3236..4258)
>                      
> /translation="Bio::Annotation::SimpleValue=HASH(0x9be8e4)"
>      CDS             4350..5936
>                      
> /translation="Bio::Annotation::SimpleValue=HASH(0x9bead0)"
>      CDS             6181..6819
>                      
> /translation="Bio::Annotation::SimpleValue=HASH(0x9bebd8)"
> 
> The translation tag I added is for some reason being shown as a hash.  
> The code in question is here...
> 
>     my $translation = $seqo->subseq($start, $stop);
>     $feat->add_tag_value("translation",$translation);
> 
> Everything there is ok, as far as I can tell.  I was able to spit the 
> 'translation' tag back out with some test code just fine.  Near as I can 
> tell, either $feat->add_tag_value() is setting the tag value as a 
> reference, or the tag value is being retrieved as such when the feature 
> is written to the seq object (or somewhere else in the process).
> 
> Anyone have any idea what might be going on here?
> 
> 
> -Andrew Stewart
> BDRD
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
From mayagao1999 at yahoo.com  Thu Aug 25 22:53:05 2005
From: mayagao1999 at yahoo.com (Alex Zhang)
Date: Thu Aug 25 22:42:28 2005
Subject: [Bioperl-l] How to generate negative pair for motif binding sites
	in Perl?
Message-ID: <20050826025305.44078.qmail@web53504.mail.yahoo.com>

Hi, all! 

I have a problem in using Perl to
make 100 negative pair for motif binding 
sites. Would anybody give me some suggestions?
Thank you very much ahead of time.

Alex

The description of the problem:

To generate negative dependent pair binding sites for
motif

1.	We can get 16 combinations of any 2 nucleotides.
They are:
AA, AT, AC, AG,
TT, TC, TG, TA,
CC, CT, CG, CA,
GG, GC, GT and GA

For example, if we say pair "AA" is a positive
dependent pair, which means that "A" always comes
with another "A" across many sequences with probability
x%.
In other words, it looks like:
..............
....AA.......
....AA.......
....AA.......
....AA.......
....AA.......
....AA.......
..............

In contrast to positive pair, the negative pair "AG"
looks like in some sequences:
...............
....A.........
....A........
....A.........
......G.......
......G.......
......G.......
...............

Which means that "A" is less likely to be
with "G" across these sequences than other
nucleotides
G, T, C. But if we count the frequency of each
nucleotide along the column, we can find that the
"A"
and "G" have the highest frequencies in its columns.
By generating 4 negative pairs, we can end up with
motif binding sites of length 8. Finally we are going
to make 100 binding sites.

2. (1) Randomly pick 4 pairs from the 16 combinations
which will be used as "negative pairs" in the
sequences. For example, we get pairs AG, CT, CT, GG.
    (2) Suppose the probability for each negative pair
is 70%. In the 100 binding sites, we let the all the
1st nucleotides be A with probability 70%. In other
words, there are 70 As in the 100 binding sites on the
1st positions. 
If 1st position is A, then 2nd position will be G with
probability 57% and A or C or T with probability
(1-0.57)/3;
If 1st position is not A, then let 2nd position be G
automatically;
(3) Repeat this for other three negative pairs. 

3.	Generally speaking, we have negative pair XY.
(a)	let 1st nucleotides in 100 sites be X with
probability 70% and other with probability 10%
(b)	if 1st nucleotide = X, then let 2nd nucleotide in
100 sites be Y with probability 57% and other with
probability (1-57%)/3;
(c)	Else, let 2nd nucleotide in 100 sites be Y
automatically;
(d)	Repeat (a) (b) (c) for other three pairs.
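
Steps (a) to (d) above can be sketched in Perl as follows.  This is
only one reading of the description: the sub names are made up for
illustration, each site is drawn independently at random (so roughly,
not exactly, 70 of 100 sites start with X), and the 0.70 and 0.57
probabilities and the example pairs are taken from the text.

```perl
use strict;
use warnings;

my @NUC = qw(A C G T);

# Pick one item from a list uniformly at random.
sub pick { my @items = @_; return $items[ int rand @items ] }

# Generate one dinucleotide for the negative pair XY, following (a)-(c).
sub negative_pair {
    my ($x, $y, $p_x, $p_y) = @_;
    # (a) first nucleotide: X with probability $p_x, else one of the other three
    my $first = rand() < $p_x ? $x : pick(grep { $_ ne $x } @NUC);
    # (c) first is not X: second is always Y
    return $first . $y if $first ne $x;
    # (b) first is X: second is Y with probability $p_y, else one of the other three
    my $second = rand() < $p_y ? $y : pick(grep { $_ ne $y } @NUC);
    return $first . $second;
}

# (d) concatenate one dinucleotide per pair to build a site.
sub make_site {
    my ($pairs, $p_x, $p_y) = @_;
    return join '', map { negative_pair(split(//, $_), $p_x, $p_y) } @$pairs;
}

srand(42);                       # fixed seed so a run can be repeated
my @pairs = qw(AG CT CT GG);     # the example pairs from step 2(1)
my @sites = map { make_site(\@pairs, 0.70, 0.57) } 1 .. 100;
print "$_\n" for @sites[0 .. 4]; # show the first five sites
```

If exactly 70 sites must start with X, the first column could instead
be built as a list of 70 X's and 30 others and then shuffled.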

From bmoore at genetics.utah.edu  Fri Aug 26 01:15:04 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Fri Aug 26 01:03:09 2005
Subject: [Bioperl-l] RE: Bioperl-l Digest, Vol 28, Issue 10
Message-ID: <CFE1DF3BA20F424689DA0881A14055BE863F55@m.hg.genetics.utah.edu>

Ping,

 
I am sorry I don't understand your question.  It looks like your e-mail
might have contained a graphic or an attachment that I couldn't view.
Do you want to try your question again, and send it to the bioperl list
in case someone else understands it better than I do?

 
Barry

 
-----Original Message-----
From: Ping Yao [mailto:sdshlxh@gmail.com] 
Sent: Wednesday, August 24, 2005 4:59 PM
To: Barry Moore
Subject: Re: Bioperl-l Digest, Vol 28, Issue 10

 
Hi,Barry:
             You gave a very good suggestion.
             I am also a new user in perl.
             I try your code by myself and met  the following problem.
 <file:///C:\DOCUME~1\LIU-YAO\LOCALS~1\TEMP\moz-screenshot.jpg>
<file:///C:\DOCUME~1\LIU-YAO\LOCALS~1\TEMP\moz-screenshot-1.jpg> Stack
toplevel  in following
                                                                      
               my $blast_report = $factory->bl2seq($seq_obj, $input2);


           In fact I met the same Stack toplevel in other bioperl
program .

            Could you give me some explain about it.

         How to fix Stack toplevel  ?


Ping YAO 


          Univ. of Missouri-Columbia


From hlapp at gmx.net  Fri Aug 26 03:38:22 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Aug 26 03:35:40 2005
Subject: [Bioperl-l] 1.5.1 todo list
In-Reply-To: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu>
References: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu>
Message-ID: <fa6c4f3f2a2cb095c2874dd8bf919add@gmx.net>


On Aug 24, 2005, at 10:11 AM, Jason Stajich wrote:

> [...]
> What is the status of bioperl code for:
>  Ontology work

The issues with the goflat parser and Ontology.pm loading Graph.pm 
should be fixed. Due to lack of time I haven't been able yet to iron 
out the last wrinkles with the goperl-bridge I wrote, so .obo format is 
not yet supported. I'd really like this to be in 1.6.x though.

>  BioSQL support (from the Core code at least, how much in sync would 
> 1.5.1 be with biosql-perl release?)

I guess you mean bioperl-db? Bioperl-db works with the CVS main trunk 
of bioperl, all tests pass when run against bioperl-live.

>  Bio::FeatureIO stuff + Bio::SeqFeature changes?

The overloads seem to work currently but generally make me feel uneasy 
because they can lead to very subtle and hard to track down bugs and 
should be earmarked for roll back. One could choose to ignore this 
though for 1.5.1 (as opposed to 1.6.x).

>  Bio::DB::GFF work?  the GFF3 schema would be way past 1.5.1, but is 
> that something we'd want to shoot for in 1.6?
>  Other things?

A comprehensive and authoritative set of tests for the SeqFeatureI API 
still needs to be written so that any future f*ups in this area are 
readily and immediately detected. This would then also be the set of 
tests that blesses (or holds up) the 1.6.0 release code. Again, 
although I gave it priority one could choose to ignore it for 1.5.1.

	-hilmar

>
> Please report in.  Times like this sort of make me want a Wiki so we 
> can keep track but I'll at least volunteer to collate the results into 
> a summary email.
>
>
> -jason
>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From birney at ebi.ac.uk  Fri Aug 26 04:48:58 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Fri Aug 26 05:58:19 2005
Subject: [Bioperl-l] TypedSeqFeatureI is in; tests which fail
In-Reply-To: <590c379d723a9438a5c760e0048e85a1@gmx.net>
References: <430DEA45.1060308@ebi.ac.uk>
	<372DA67D-B453-429E-B200-16E33805A141@duke.edu>
	<430DF325.3020306@ebi.ac.uk>
	<590c379d723a9438a5c760e0048e85a1@gmx.net>
Message-ID: <430ED77A.4090403@ebi.ac.uk>


Hilmar Lapp wrote:
> 
> On Aug 25, 2005, at 9:34 AM, Ewan Birney wrote:
> 
>> Jason Stajich wrote:
>>
>>> They all pass for me on OSX, and linux.  What version of perl?
>>
>>
>> [Ewan-Birneys-Computer:wise2/src/network] birney% perl -v
>>
>> This is perl, v5.6.0 built for darwin
>>
>> Copyright 1987-2000, Larry Wall
>>
> 
> I do suggest you upgrade perl. I know 5.6.0 is the one that comes with 
> Jaguar, but it has bugs in some features bioperl is taking advantage of 
> (nested regex in FTLocationFactory being just one example). I've had so 
> much trouble to get Bioperl pass all tests with errors nobody else was 
> getting that I finally gave up and upgraded (to Panther actually, but 
> upgrading perl supposedly suffices). Once I did that most failures went 
> away (and some new ones came up but that's another story and they are 
> fixed meanwhile).
> 
> I brought this up a while ago in spring and I can dig up the list thread 
> if you're interested. The conclusion was that essentially we'll have to 
> require perl 5.6.1 with the next release.
> 

Ok. I will still dig to find out how many we can fix in 5.6.0 --- I
am sure I won't be the last person to use it with Bioperl.


>     -hilmar

From jason.stajich at duke.edu  Fri Aug 26 11:52:30 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Aug 26 11:42:12 2005
Subject: [Bioperl-l] ->add_tag_value()
In-Reply-To: <430F374B.7060408@wam.umd.edu>
References: <430E3350.50604@wam.umd.edu>
	<A362C63D-80BB-4EAE-AD09-7D5BAD2464E7@duke.edu>
	<430F1E81.6080400@wam.umd.edu>
	<44E35FE4-D25D-4D96-A195-12D1DCF85638@duke.edu>
	<430F374B.7060408@wam.umd.edu>
Message-ID: <6F0A7CDE-DFFA-499C-A3EE-CECE62FE9A2A@duke.edu>

It is several files if I remember correctly not just one.  I don't  
know exactly which ones.  Better ask on the list as the folks who  
made the changes can say better.

You CAN empirically figure out what has changed via some CVS trickery.

CHECK OUT BIOPERL-LIVE FROM CVS
$ cvs -d:pserver:cvs@cvs.open-bio.org:/home/repository/bioperl co  
bioperl-live
$ cd bioperl-live

SEE THE CHANGES THAT WERE MADE IN THE DIRECTORIES I THINK HAVE CHANGED
$ cvs diff -r bioperl-release-1-5-0 Bio/SeqFeature
$ cvs diff -r bioperl-release-1-5-0 Bio/Annotation
$ cvs diff -r bioperl-release-1-5-0 Bio/SeqFeatureI.pm
$ cvs diff -r bioperl-release-1-5-0 Bio/SeqIO

You can use these commands to generate a patch file you can then  
apply to bioperl-1.5.0


-jason

On Aug 26, 2005, at 11:37 AM, Andrew Stewart wrote:

> Would it be possible to simply update the module which contains the  
> error (or are there multiple files?) rather than downgrade to 1.4  
> or upgrade to the HEAD branch?
> -Andrew
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From hlapp at gmx.net  Fri Aug 26 12:24:37 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Aug 26 12:16:10 2005
Subject: [Bioperl-l] ->add_tag_value()
In-Reply-To: <6F0A7CDE-DFFA-499C-A3EE-CECE62FE9A2A@duke.edu>
References: <430E3350.50604@wam.umd.edu>
	<A362C63D-80BB-4EAE-AD09-7D5BAD2464E7@duke.edu>
	<430F1E81.6080400@wam.umd.edu>
	<44E35FE4-D25D-4D96-A195-12D1DCF85638@duke.edu>
	<430F374B.7060408@wam.umd.edu>
	<6F0A7CDE-DFFA-499C-A3EE-CECE62FE9A2A@duke.edu>
Message-ID: <8c31cb4c665092173967567f87b35a34@gmx.net>


> On Aug 26, 2005, at 11:37 AM, Andrew Stewart wrote:
>
>> Would it be possible to simply update the module which contains the 
>> error (or are there multiple files?) rather than downgrade to 1.4 or 
>> upgrade to the HEAD branch?
>> -Andrew

You could, e.g. using Jason's suggestion, but I don't know why you 
wouldn't just want to upgrade to the main trunk. Currently, this is as 
close as you can get to upgrading to 1.5.1, which is what you will 
want to do anyway immediately once it's out.

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From qfdong at iastate.edu  Fri Aug 26 15:26:30 2005
From: qfdong at iastate.edu (Qunfeng)
Date: Fri Aug 26 15:18:47 2005
Subject: [Bioperl-l] bug report - SeqIO::genbank.pm
In-Reply-To: <4171441D.6030502@utk.edu>
References: <GAEDKMGOKFBLJPKCLKCCOEIFECAA.brian_osborne@cognia.com>
	<4171441D.6030502@utk.edu>
Message-ID: <6.1.2.0.2.20050826140610.03f6f138@qfdong.mail.iastate.edu>

Hi there,

Sorry, I am not sure where to report a bioperl bug or whether this bug has
been reported before. So I am just going to send it to the bioperl list.

The "_read_GenBank_Species" function in SeqIO::genbank.pm generates an
exception when parsing GenBank record GI#66271013, which has an unusual
ORGANISM name "(Populus tomentosa x P. bolleana) x P. tomentosa var.
truncata". Notice there is a "(" at the beginning.  That "(" will be
treated as an opening "(" in the regular expression (see line 8 below);
this can be fixed by simple escaping (see line 7 below).


=================================================
sub _read_GenBank_Species{                              ...
   elsif (/^\s{2}ORGANISM/o) {
      my @spflds = split(' ', $_);
       ($ns_name) = $_ =~ /\w+\s+(.*)/o;
       shift(@spflds); # ORGANISM
       $spflds[0] =~ 
s/\(/\\\(/;                                          #(7)  escape the ( by \(
       if(grep { $_ =~ /^$spflds[0]/i; } @organell_names) {  #(8)  it causes exception with "(Populus"
           $organelle = shift(@spflds);
       }
       ...
}
=================================================
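
A more general way to avoid this class of error is to quote the whole
field with \Q...\E (quotemeta) instead of escaping only the "(". The
following is a minimal standalone sketch, not the actual genbank.pm
code: the sub name first_field_is_organelle and the organelle list are
made up for illustration.

```perl
use strict;
use warnings;

# Illustrative list only; the real module keeps its own list of names.
my @organelle_names = qw(chloroplast mitochondrion plastid);

# Count the organelle names that start with $field (case-insensitive).
# \Q...\E makes every regex metacharacter in $field literal, so a
# leading "(" can no longer produce an invalid pattern.
sub first_field_is_organelle {
    my ($field, @names) = @_;
    return scalar grep { /^\Q$field\E/i } @names;
}

for my $field ('(Populus', 'Chloroplast') {
    printf "%-12s => %s\n", $field,
        first_field_is_organelle($field, @organelle_names)
            ? 'organelle'
            : 'not an organelle';
}
```

Note that an empty $field would match every name, so a real fix would
probably want to guard against that case as well.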

Qunfeng

From brian_osborne at cognia.com  Fri Aug 26 15:55:34 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Aug 26 15:45:02 2005
Subject: [Bioperl-l] bug report - SeqIO::genbank.pm
In-Reply-To: <6.1.2.0.2.20050826140610.03f6f138@qfdong.mail.iastate.edu>
Message-ID: <BF34EBF6.3C1A%brian_osborne@cognia.com>

Qunfeng,

In the future please submit bugs at http://bugzilla.bioperl.org/.

Right now I'll just take a look at this without a formal bug report, thanks
for the submission.

Brian O.


On 8/26/05 3:26 PM, "Qunfeng" <qfdong@iastate.edu> wrote:

> Hi there,
> 
> Sorry, I am not sure where to report a bioperl bug or whether this bug has
> been reported before. So I am just going to send it to the bioperl list.
> 
> The "_read_GenBank_Species" function in SeqIO::genbank.pm generates an
> exception when parsing GenBank record GI#66271013, which has an unusual
> ORGANISM name "(Populus tomentosa x P. bolleana) x P. tomentosa var.
> truncata". Notice there is a "(" at the beginning.  That "(" will be
> treated as an opening "(" in the regular expression (see line 8 below);
> this can be fixed by simple escaping (see line 7 below).
> 
> 
> =================================================
> sub _read_GenBank_Species{                              ...
>    elsif (/^\s{2}ORGANISM/o) {
>       my @spflds = split(' ', $_);
>        ($ns_name) = $_ =~ /\w+\s+(.*)/o;
>        shift(@spflds); # ORGANISM
>        $spflds[0] =~
> s/\(/\\\(/;                                          #(7)  escape the ( by \(
>        if(grep { $_ =~ /^$spflds[0]/i; } @organell_names) {  #(8)  it causes exception with "(Populus"
>            $organelle = shift(@spflds);
>        }
>        ...
> }
> =================================================
> 
> Qunfeng
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l


From astew at wam.umd.edu  Fri Aug 26 16:08:23 2005
From: astew at wam.umd.edu (Andrew Stewart)
Date: Fri Aug 26 15:58:15 2005
Subject: [Bioperl-l] ->add_tag_value()
In-Reply-To: <8c31cb4c665092173967567f87b35a34@gmx.net>
References: <430E3350.50604@wam.umd.edu>
	<A362C63D-80BB-4EAE-AD09-7D5BAD2464E7@duke.edu>
	<430F1E81.6080400@wam.umd.edu>
	<44E35FE4-D25D-4D96-A195-12D1DCF85638@duke.edu>
	<430F374B.7060408@wam.umd.edu>
	<6F0A7CDE-DFFA-499C-A3EE-CECE62FE9A2A@duke.edu>
	<8c31cb4c665092173967567f87b35a34@gmx.net>
Message-ID: <430F76B7.7010803@wam.umd.edu>

Do many of you use bioperl-live as your primary (or exclusive) BioPerl 
distribution, or do you keep a stable version as well?

I'd like to use bioperl-live for instances such as this (see message 
history), but not necessarily for when I'm developing scripts that are 
going to be used by others in my lab who do not necessarily have 
bioperl-live installed.

What I'm thinking is that I should maybe install a copy of bioperl-live 
somewhere in my personal space, and then 'use' it in certain scripts 
when needed.  I just have a few questions (these are probably more 
'perl' questions than 'bio-perl' questions)...

1. Once I obtain bioperl-live via 'cvs -d :pserver etc...', do I need to 
actually go through the install routine or can I just access the modules 
from where they are downloaded?

and

2. Would I then place code at the header of my script such as...

use lib "/path/to/bioperl-live";
use MODULE;

and the updated module will (temporarily) override the other bioperl 
modules in my @INC?

I tried this, actually, without any noticeable change in my previous problem
(_print_GenBank... still prints the feature tabs as 
/tab="Bio::Annotation::SimpleValue=HASH(0x87f0914)" instead of 
/tab="value"), but I don't know for certain if perl was using the 
modules from my bioperl-live installation or the older ones.
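
One way to check which copy perl actually loaded is to look the module
up in %INC after it has been required. The sketch below is only an
illustration: the checkout path is a placeholder, loaded_from is a
made-up helper name, and File::Spec is included just so the script
prints something on a machine without BioPerl.

```perl
use strict;
use warnings;
use lib '/path/to/bioperl-live';   # placeholder for the CVS checkout

# Return the file a loaded module came from, or undef if it is not loaded.
sub loaded_from {
    my ($module) = @_;
    (my $key = "$module.pm") =~ s{::}{/}g;
    return $INC{$key};
}

for my $module (qw(Bio::SeqIO File::Spec)) {
    (my $file = "$module.pm") =~ s{::}{/}g;
    eval { require $file };        # ignore modules that are not installed
    my $path = loaded_from($module);
    print "$module: ", (defined $path ? $path : 'not found'), "\n";
}

# 'use lib' puts its directory at the front of the search path.
print "Search path starts with: $INC[0]\n";
```

If the path printed for Bio::SeqIO is under the checkout directory, the
bioperl-live copy is the one in use; otherwise the older installation
was found first.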


-Andrew Stewart


Hilmar Lapp wrote:

>
>> On Aug 26, 2005, at 11:37 AM, Andrew Stewart wrote:
>>
>>> Would it be possible to simply update the module which contains the 
>>> error (or are there multiple files?) rather than downgrade to 1.4 or 
>>> upgrade to the HEAD branch?
>>> -Andrew
>>
>
> You could, e.g. using Jason's suggestion, but I don't know why you 
> wouldn't just want to upgrade to the main trunk. Currently, this is as 
> close as you can get to upgrading to 1.5.1., which is what you will 
> want to do anyway immediately once it's out.
>
>     -hilmar
>

From fiedler at cshl.edu  Thu Aug 25 18:18:27 2005
From: fiedler at cshl.edu (Tristan Fiedler)
Date: Fri Aug 26 16:26:16 2005
Subject: [Bioperl-l] DocBook Question
In-Reply-To: <200508252133.j7PLXlTu005861@portal.open-bio.org>
References: <200508252133.j7PLXlTu005861@portal.open-bio.org>
Message-ID: <4176cd6f03d312ccfa4ba37508c71ee2@cshl.edu>

Hi Barry,

I am using DocBook for a project (http://www.WormBook.org ) based on 
the following software pipeline :

DocBook to HTML : Saxon 6.5.3
FO to PDF : FOP 0.20.5
XML : DocBook XML 4.4CR2
XSL : DocBook XSL 1.67.2

These all work quite well together, although FOP is getting outdated.  
I would recommend trying the new RenderX instead of FOP.

For page margins, check out :

http://www.sagehill.net/docbookxsl/PrintOutput.html#LeftRightMargins
http://www.sagehill.net/docbookxsl/PageDesign.html

Cheers,        Tristan
---
Tristan J. Fiedler
Postdoctoral Fellow - Stein Lab
Cold Spring Harbor Laboratory

From hlapp at gmx.net  Fri Aug 26 16:37:26 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Aug 26 16:27:32 2005
Subject: [Bioperl-l] DBI connection parameters
In-Reply-To: <3cfaa40405082514364c4b5835@mail.gmail.com>
References: <3cfaa40405082514364c4b5835@mail.gmail.com>
Message-ID: <e4eedafcfd4c9b27462a7560d3963a3f@gmx.net>


On Aug 25, 2005, at 2:36 PM, Amit Indap wrote:

> [...]  But it's having trouble connecting to my mysql db when I call
> my $dbseq= $adp->find_by_unique_key($seq);
> (where $seq holds my Bio::Seq object to which I want to add features
> to) The stack is listed at the end of the msg.
>

The stack says:

> MSG: failed to open connection: Access denied for user
> 'amit'@'132.236.170.104' (using password: NO)

Can you connect using the mysql shell as the above user from machine  
132.236.170.104 without using a password? You supply the password to  
BioDB->new() using the -pass option.

> [...]
> Elsewhere in my code I have a low-level query for my biosql db using
> DBI in which I connect to mysql reading a .my.cnf file:
>
> my $conn = DBI->connect("DBI:mysql:amit" .
> ";mysql_read_default_file=/home/amit/.my.cnf", $user, $passwd);
>
> Is there a way to tell bioperl to read this .my.cnf file when it
> makes its database connection?

No, not until now. I added an option (-dsn) that lets you specify the  
dsn to be used verbatim for connecting. It should propagate to the  
anonymous cvs server over the next 1-2 hours. You can now also specify  
this option (--dsn) to load_{seqdatabase,ontology}.pl.
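
As a rough sketch of how the DSN from the quoted DBI->connect() call
could be handed over verbatim: the helper name mysql_dsn_with_defaults
is made up, and the commented-out constructor call is an assumption
based on the -dsn option described above, so check the
Bio::DB::BioDB POD for the exact parameters.

```perl
use strict;
use warnings;

# Build a DSN that asks DBD::mysql to read its connection defaults
# from an option file, mirroring the DBI->connect() call quoted above.
sub mysql_dsn_with_defaults {
    my ($database, $cnf_file) = @_;
    return "DBI:mysql:$database;mysql_read_default_file=$cnf_file";
}

my $dsn = mysql_dsn_with_defaults('amit', '/home/amit/.my.cnf');
print "$dsn\n";

# Assumed usage with the new option (needs bioperl-db from CVS):
#   my $db = Bio::DB::BioDB->new(-database => 'biosql', -dsn => $dsn);
```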

There is also an option -initrc that lets you specify a file that  
evaluates to a hash ref with all the parameters as keys. Check out the  
POD for Bio::DB::BioDB->new(). I also exposed this option (--initrc)  
now in load_{seqdatabase,ontology}.pl, apparently I had forgotten to do  
this before.

	-hilmar

 From the respective POD section I wrote on --initrc:

        --initrc paramfile
          Instead of, or in addition to, specifying every individual
          database connection parameter you may put them into a file
          that when read by perl evaluates to an array or hash
          reference. This option specifies the file to read; the
          special value DEFAULT (or no value) will use a file
          ./.bioperldb or $HOME/.bioperldb, whichever is found first
          in that order.

          Constructing a file that evaluates to a hash reference is
          very simple. The first non-space character needs to be an
          open curly brace, and the last non-space character a closing
          curly brace. In between the curly braces, write option name,
          followed by => (equal to or greater than), followed by the
          value in single quotes. Separate each such option/value pair
          by comma. Here is an example:

          {
              -dbname => 'mybiosql', -host => 'foo.bar.edu', -user => 'cleo' }

          Line breaks and white space don't matter (except if in the
          value itself). Also note that options only have a single
          dash as prefix, and they need to be those accepted by
          Bio::DB::BioDB->new() (Bio::DB::BioDB) or
          Bio::DB::SimpleDBContext->new() (Bio::DB::SimpleDBContext).
          Those sometimes differ slightly from the option names used
          by this script, e.g., --dbuser corresponds to -user.

          Note also that using the above example, you can use it for
          --initrc and still connect as user caesar by also supplying
          --dbuser caesar on the command line. I.e., command line
          arguments override any parameters also found in the initrc
          file.

          Finally, note that if using this option with default file
          name and the default file is not found at any of the default
          locations, the option will be ignored; it is not considered
          an error.
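
The behaviour described in this POD section can be sketched in plain
Perl as follows. This is not the code used by the load scripts; the
sub names read_param_file and merge_params are invented for the
example, and evaluating the file contents with a leading "+" is my own
choice to make sure the braces are parsed as an anonymous hash.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a file whose contents evaluate to a hash reference.
sub read_param_file {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot read $file: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    # The leading '+' makes perl parse the braces as an anonymous hash
    # rather than as a code block.
    my $params = eval "+$text";
    die "cannot evaluate $file: $@" if $@;
    die "$file did not evaluate to a hash reference"
        unless ref $params eq 'HASH';
    return $params;
}

# Merge the two parameter sets; command-line values win.
sub merge_params {
    my ($from_file, $from_cli) = @_;
    return { %$from_file, %$from_cli };
}

# Demonstration with a temporary file holding the example from the POD.
my ($out, $file) = tempfile(UNLINK => 1);
print {$out} "{\n  -dbname => 'mybiosql', -host => 'foo.bar.edu', -user => 'cleo' }\n";
close $out;

my $params = merge_params(read_param_file($file), { -user => 'caesar' });
print "$_ => $params->{$_}\n" for sort keys %$params;
```

The merge step is what lets --dbuser caesar on the command line
override the -user value stored in the file.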


>  For some reason to open a mysql
> connection on my machine i need to open up a ssh -L connection to the
> machine where the mysql server lives  with some funky parameters. (If
> this is more appropriate for biosql mailing list, apologies but I
> didn't want to cross post :)
>
> Amit Indap
> Cornell University
>
> ------------- EXCEPTION  -------------
> MSG: failed to open connection: Access denied for user
> 'amit'@'132.236.170.104' (using password: NO)
> STACK Bio::DB::DBI::base::new_connection
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/DBI/base.pm:253
> STACK Bio::DB::DBI::base::get_connection
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/DBI/base.pm:213
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:1477
> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BaseDriver.pm:515
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:927
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:855
> STACK Bio::DB::BioSQL::PrimarySeqAdaptor::get_unique_key_query
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:395
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:845
> STACK toplevel /home/amit/bin/HCG-blatparser.pl:181
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From wes.barris at csiro.au  Fri Aug 26 00:37:42 2005
From: wes.barris at csiro.au (Wes Barris)
Date: Fri Aug 26 16:30:17 2005
Subject: [Bioperl-l] How do I tell what version of bioperl is installed?
Message-ID: <430E9C96.3080601@csiro.au>

Hi,

I am trying to install gbrowse which requires bioperl-1.5.  I am getting
a warning from the gbrowse installation that says:

Warning: prerequisite Bio::Perl 1.5 not found. We have unknown version.

The thing is that I have bioperl-1.5 installed.  How do I verify this?
Normally, I use this script to list installed modules and their versions
but it does not report a version for bioperl:

#!/usr/bin/perl
use ExtUtils::Installed;
my $instmod = ExtUtils::Installed->new();
foreach my $module ($instmod->modules()) {
    my $version = $instmod->version($module) || "???";
    print "$module -- $version\n";
    }

wes@bioweb> ~/proj/perl/installed.pl
Authen::Krb5::Simple -- 0.31
Bio -- ???
GD -- 2.19
GD::SVG -- 0.25
Generic-Genome-Browser -- ???
HTTPD-User-Manage -- ???
IO::String -- 1.06
MD5 -- 2.03
Perl -- 5.8.5
SVG -- 2.32
SynBrowse -- ???
Text::Shellwords -- 1.07
mod_perl -- 1.29

-- 
Wes Barris
E-Mail: Wes.Barris@csiro.au
From jason.stajich at duke.edu  Fri Aug 26 17:02:12 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Aug 26 16:51:39 2005
Subject: [Bioperl-l] How do I tell what version of bioperl is installed?
In-Reply-To: <430E9C96.3080601@csiro.au>
References: <430E9C96.3080601@csiro.au>
Message-ID: <6D9F04B3-1D30-4BFA-93FF-2122DAD08D70@duke.edu>

I think the warning is extraneous and is something Lincoln later
fixed in what gbrowse is parsing.  It wasn't properly detecting the
version.
You can tell by doing this for any module (Bio::SeqIO, for example):
$ perl -MBio::SeqIO -e 'print "$Bio::SeqIO::VERSION\n";'

But this is a runtime thing while MakeMaker is actually parsing the  
file to try and figure out the version which doesn't quite work.
I believe he posted about a workaround and was updating the gbrowse  
code to be able to handle it.
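
The difference between the two ways of finding a version can be seen
with a short script. This is only a sketch: runtime_version and
parsed_version are made-up helper names, and File::Spec is included so
that it prints something even where BioPerl is not installed.
MM->parse_version is the MakeMaker routine that reads the version from
the source file without loading the module.

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;   # makes MM->parse_version available

# Version as seen at run time, after the module has been loaded.
sub runtime_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return $module->VERSION;
}

# Version as MakeMaker sees it: parsed from the first matching file
# in @INC without loading the module.
sub parsed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    for my $dir (grep { !ref } @INC) {
        my $path = "$dir/$file";
        return MM->parse_version($path) if -f $path;
    }
    return;
}

for my $module (qw(File::Spec Bio::SeqIO)) {
    my $runtime = runtime_version($module);
    my $parsed  = parsed_version($module);
    printf "%-12s runtime=%s parsed=%s\n", $module,
        defined $runtime ? $runtime : 'n/a',
        defined $parsed  ? $parsed  : 'n/a';
}
```

When MakeMaker cannot find a version assignment it can parse, it
reports the string "undef", which would be consistent with the
"unknown version" warning above.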

The thread starts here:
http://portal.open-bio.org/pipermail/bioperl-l/2005-August/019495.html

-jason

On Aug 26, 2005, at 12:37 AM, Wes Barris wrote:

> Hi,
>
> I am trying to install gbrowse which requires bioperl-1.5.  I am  
> getting
> a warning from the gbrowse installation that says:
>
> Warning: prerequisite Bio::Perl 1.5 not found. We have unknown  
> version.
>
> The thing is that I have bioperl-1.5 installed.  How do I verify this?
> Normally, I use this script to list installed modules and their  
> versions
> but it does not report a version for bioperl:
>
> #!/usr/bin/perl
> use ExtUtils::Installed;
> my $instmod = ExtUtils::Installed->new();
> foreach my $module ($instmod->modules()) {
>    my $version = $instmod->version($module) || "???";
>    print "$module -- $version\n";
>    }
>
> wes@bioweb> ~/proj/perl/installed.pl
> Authen::Krb5::Simple -- 0.31
> Bio -- ???
> GD -- 2.19
> GD::SVG -- 0.25
> Generic-Genome-Browser -- ???
> HTTPD-User-Manage -- ???
> IO::String -- 1.06
> MD5 -- 2.03
> Perl -- 5.8.5
> SVG -- 2.32
> SynBrowse -- ???
> Text::Shellwords -- 1.07
> mod_perl -- 1.29
>
> -- 
> Wes Barris
> E-Mail: Wes.Barris@csiro.au
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From hlapp at gnf.org  Fri Aug 26 17:21:46 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Aug 26 17:13:22 2005
Subject: [Bioperl-l] ontology paths in Bioperl-DB / Biosql
Message-ID: <7fdc960cb1f74a59d2014129a38bceb6@gnf.org>

One thing I forgot to report to the list is that last Friday I fixed 
the Bioperl-db adaptor and driver module for ontology paths in Biosql 
to include the distance zero paths when computing the transitive 
closure over an ontology.

There are now also tests in t/12ontology.t that check for those 
distance zero paths. They pass on all three supported platforms (mysql, 
Pg, Oracle).

(load_ontology.pl in bioperl-db/scripts/biosql has an option 
--computetc that if supplied  will automatically recompute the 
transitive closure over the just loaded ontology)
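
For readers unfamiliar with the term, here is a small standalone
illustration of a transitive closure that includes the distance zero
paths. It is not the bioperl-db implementation: the data structure, the
sub name transitive_closure and the toy ontology are all invented, and
it records only the shortest distance per pair of terms, which is a
simplification.

```perl
use strict;
use warnings;

# Given child => [parents] edges, return a hash mapping
# "subject\tancestor" to the shortest distance between the two terms,
# including the distance-zero path from every term to itself.
sub transitive_closure {
    my ($parents_of) = @_;

    # Collect every term that appears as a child or as a parent.
    my %terms;
    for my $child (keys %$parents_of) {
        $terms{$_} = 1 for $child, @{ $parents_of->{$child} };
    }

    my %paths;
    for my $term (keys %terms) {
        my @queue = ([$term, 0]);          # distance zero: term to itself
        while (my $item = shift @queue) {
            my ($node, $dist) = @$item;
            my $key = "$term\t$node";
            next if exists $paths{$key};   # breadth-first: first visit is shortest
            $paths{$key} = $dist;
            push @queue, map { [$_, $dist + 1] } @{ $parents_of->{$node} || [] };
        }
    }
    return \%paths;
}

my %is_a = (
    'enzyme'  => ['protein'],
    'protein' => ['molecule'],
);
my $paths = transitive_closure(\%is_a);
for my $key (sort keys %$paths) {
    my ($subject, $ancestor) = split /\t/, $key;
    print "$subject -> $ancestor (distance $paths->{$key})\n";
}
```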

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From limericksean at gmail.com  Mon Aug 29 11:03:01 2005
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Mon Aug 29 10:52:17 2005
Subject: [Bioperl-l] Bio::Search results
Message-ID: <46278464050829080368d707be@mail.gmail.com>

Hi,
The following code snippet is something I use to extract information
from hmmer result files:

use Bio::SearchIO;

my $in = new Bio::SearchIO( -format => 'hmmer',  -file => $ARGV[0] );
while(my $result = $in->next_result) {
  print $result->query_name(), "\n",$result->query_description(),"\n";
  while (my $hit = $result->next_hit) {
    while(my $hsp = $hit->next_domain) {
      next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
      print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
    }
  }
}

The input file is generated by hmmpfam and is given at the command
line. I use it to scan for specific domain names e.g ig, fn3 lrr etc.
This code works for the first loop and then ends so I get the name and
description (no hsp values as there are none for this result):

ENSMUSP00000065602 
pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059
transcript:ENSMUST00000066517

My question is why does the loop end after one instance. Incidentally
the outputted  name and description above are the last ones in the
hmmer file (maybe the file is read from the back? - don't know if this
means anything).
Any thoughts would be appreciated. Thanks,
Sean.

From jason.stajich at duke.edu  Mon Aug 29 12:18:03 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Aug 29 12:07:45 2005
Subject: [Bioperl-l] If you use RemoteBlast
References: <69BA0F938FAC6A4CBEF49461720696F20A6B6200@nihexchange16.nih.gov>
Message-ID: <78ADB5F4-31DB-4FEA-8515-CBC403B6A1FC@duke.edu>

So those of you who use the Tools::RemoteBlast module, please read  
the following email.    We need some people to test out how much the  
parser breaks with the "new" formatter. Tests seem to pass right now,  
but I don't know if that is because the 'old' format is being  
requested still.  Could someone please take a little time to see  
what's going on and report back.

Thanks,
-jason

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI)" <mcginnis@ncbi.nlm.nih.gov>
> Date: August 29, 2005 12:06:52 PM EDT
> To: "'jason@bioperl.org'" <jason@bioperl.org>
> Subject: New BLAST Formatter.
>
>
> Hello.
>
> The new BLAST formatter has been the default for months now. But
> we'd like to shut off the old one.
>
> Will this pose a problem?
>
> Thanks,
>
> Sincerely,
> Scott D. McGinnis, M.S.
> National Center for Biotechnology Information
> <http://www.ncbi.nlm.nih.gov>
>
>
> Blast-announce: New BLAST formatter at the NCBI
>
> A new version of the BLAST formatter has been the default on the  
> NCBI BLAST
> web pages for the past XX months.  On September 6, 2005 we will  
> remove the
> checkbox allowing users to select the old formatter and support for  
> the old
> formatter will be discontinued.
>
> This formatter has been rewritten from scratch using the NCBI C++  
> toolkit
> and includes many new features (see list below) as well as the  
> ability to
> fetch parts of genomic sequences when needed, making it much faster  
> than the
> old formatter for many queries.
>
> Please send questions or comments to blast-help@ncbi.nlm.nih.gov
>
>
> New features:
> --------------
>
> 1.) The new formatter will present the masked residues or bases as
> lower-case letters.  Additionally the masked letters can be shown  
> in color.
> To use this feature change the "Masking Character" to "Lower case"  
> on the
> formatting page and select a "Masking Color".
> Example:
>
> http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Get&RID=1098448824-15725-37006897750.BLASTQ4&NEW_FORMATTER=on&MASK_CHAR=2&MASK_COLOR=2&DESCRIPTIONS=0
>
>
> 2.) The "pairwise with identities" option allows easy  
> identification of a
> few mismatches among highly similar sequences. In this (pair-wise)  
> view
> mismatches, as well as "Sbjct" (on the line containing the  
> mismatch) are
> shown in red.
> To use this feature change the "Alignment view" to "Pairwise with
> identities" on the formatting page.
> Example:
>
> http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&NCBI_GI=yes&SHOW_OVERVIEW=on&ALIGNMENT_VIEW=PairwiseWithIdentities&NEW_FORMATTER=on&RID=1110892196-16209-7903412953.BLASTQ4#28302128
>
>
> 3.) For database sequences longer than 200,000 bases each alignment  
> has a
> header entitled "Features in this part of the subject sequence"
> listing CDS features on the database sequence within the alignment  
> range or
> at the 5' or 3' end if no features are within the range itself.
> This gives a quick description of what you are looking at as many long
> sequences have a standard defline such as "chromosome 16".
> Example:
>
> http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&NCBI_GI=yes&SHOW_OVERVIEW=on&NEW_FORMATTER=on&RID=1098455471-18771-167762343145.BLASTQ4#51465696
>
>
> Rewrites/bug fixes:
> -------------------
>
> 1.) The graphic overview has been rewritten; it now uses an HTML
> implementation.
>
> 2.) Query-anchored views now work with blastx/tblastn/tblastx, they  
> didn't
> before.
>
> 3.) phi-BLAST patterns are now also shown in the query-anchored view.
>


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cain at cshl.edu  Mon Aug 29 12:34:58 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug 29 12:24:39 2005
Subject: [Bioperl-l] Re: [Gmod-gbrowse] GTF-->GFF3 converter
In-Reply-To: <3a8da45805082614141e8fc27@mail.gmail.com>
References: <3a8da45805082614141e8fc27@mail.gmail.com>
Message-ID: <1125333298.2882.33.camel@localhost.localdomain>

Hi Etienne,

Probably the best mailing list to ask this question on is the bioperl
mailing list (cc'ed here). 

As far as I know, there is no script specifically to do that.  Because
GFF3 is more strict than GTF (aka GFF 2.5), it can be difficult to move
from GTF to GFF3.  If the Bio::FeatureIO::gff module were a little more
fleshed out, it would probably be able to do it, but currently, while it
will write GTF, it doesn't yet read it.  If you wanted to contribute
code to do that, that would be great.

The other possibility in the absence of Bio::FeatureIO::gff is
Bio::Tools::GFF, which should be able to parse GTF and then write
something resembling GFF3.  I wrote 'resembling' because you may need to
massage the output to actually get something that is GFF3.
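
To give an idea of the kind of massaging involved, here is a minimal
hand-rolled sketch that rewrites the attribute column of lines like the
ones quoted further down in this message. It does not use
Bio::Tools::GFF or Bio::FeatureIO, the sub name gtf_line_to_gff3 is made
up, and treating the "name" value as the Parent ID is purely an
assumption about how the features should be grouped. A real converter
would also have to create the parent mRNA and gene features and escape
reserved characters in the values.

```perl
use strict;
use warnings;

# Convert one tab-separated GTF-style line, with attributes such as
#   name "x"; transcriptId 42
# into a GFF3-style line, with attributes such as
#   Parent=x;transcriptId=42
sub gtf_line_to_gff3 {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\t/, $line, 9;
    return unless @cols == 9;              # not a feature line

    my @attrs;
    for my $pair (split /\s*;\s*/, $cols[8]) {
        my ($key, $value) = $pair =~ /^(\S+)\s+(.*)$/ or next;
        $value =~ s/^"(.*)"$/$1/;          # strip surrounding quotes
        $key = 'Parent' if $key eq 'name'; # assumption, see above
        push @attrs, "$key=$value";
    }
    $cols[8] = join ';', @attrs;
    return join "\t", @cols;
}

my $gtf = join "\t", 'scaffold_10034', 'src', 'CDS', 7360, 8352, '.', '-', 0,
    'name "fgenesh1_pg.C_scaffold_10034000001"; proteinId 58482; exonNumber 1';
print gtf_line_to_gff3($gtf), "\n";
```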

Scott
 

On Fri, 2005-08-26 at 15:14 -0600, Etienne Noumen wrote:
> Hi,
> In our projects, our data are in GTF format. I wrote a script to
> convert it to GFF3 but there are tags like Feature ID, ProteinID that
> i don't know how to deal with. I am also concerned about grouping
> exons and CDS into mRNA and Genes. Is there any converter that does it
> well?
> 
> This is how my files look like:
> ............
> scaffold_10034	src	exon	7360	8354	.	-	.	name "fgenesh1_pg.C_scaffold_10034000001"; transcriptId 58482
> scaffold_10034	src	CDS	7360	8352	.	-	0	name "fgenesh1_pg.C_scaffold_10034000001"; proteinId 58482; exonNumber 1
> scaffold_10034	src	stop_codon	7360	7362	.	-	0	name "fgenesh1_pg.C_scaffold_10034000001"
> scaffold_10309	src	exon	5822	6042	.	+	.	name "fgenesh1_pg.C_scaffold_10309000001"; transcriptId 58526
> scaffold_10309	src	CDS	5822	6042	.	+	0	name "fgenesh1_pg.C_scaffold_10309000001"; proteinId 58526; exonNumber 1
> scaffold_10309	src	exon	7270	7612	.	+	.	name "fgenesh1_pg.C_scaffold_10309000001"; transcriptId 58526
> scaffold_10309	src	CDS	7270	7612	.	+	2	name "fgenesh1_pg.C_scaffold_10309000001"; proteinId 58526; exonNumber 2
> scaffold_10309	src	stop_codon	7610	7612	.	+	0	name "fgenesh1_pg.C_scaffold_10309000001"
> ...........
> Thank you.
> noumen
> 
> 
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
> Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From aleunpkc at gmail.com  Sun Aug 28 20:58:27 2005
From: aleunpkc at gmail.com (hong kong pm)
Date: Mon Aug 29 14:07:24 2005
Subject: [Bioperl-l] [OT] General bioinformatics forums/lists?
Message-ID: <7e4bd6e6050828175846474aa3@mail.gmail.com>

Do we need a bioinformatics forum, even though the bioperl mailing list is
already a good one? If we want to establish one, we need volunteers to work
as forum moderators. I am willing to sponsor h/w, software license, and
hosting if there is enough interest.
Andy

From palmeida at igc.gulbenkian.pt  Mon Aug 29 14:31:53 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Mon Aug 29 14:22:34 2005
Subject: [Bioperl-l] If you use RemoteBlast
In-Reply-To: <78ADB5F4-31DB-4FEA-8515-CBC403B6A1FC@duke.edu>
References: <69BA0F938FAC6A4CBEF49461720696F20A6B6200@nihexchange16.nih.gov>
	<78ADB5F4-31DB-4FEA-8515-CBC403B6A1FC@duke.edu>
Message-ID: <200508291931.54147.palmeida@igc.gulbenkian.pt>

I think it is indeed the new formatter that is being requested by 
Tools::RemoteBlast, since it is the default and Tools::RemoteBlast doesn't 
seem to change it (I'm using BioPerl 1.4).

There is this checkbox in Blast.cgi that controls this:

<input checked name="NEW_FORMATTER" type="checkbox">

I don't know if there are more complex ways in which the new formatter may 
break the parser, but I've been using Tools::RemoteBlast and didn't notice 
anything weird (my code is pretty much the same as the Synopsis).

-- Paulo

On Monday 29 August 2005 17:18, Jason Stajich wrote:
> So those of you who use the Tools::RemoteBlast module, please read
> the following email.    We need some people to test out how much the
> parser breaks with the "new" formatter. Tests seem to pass right now,
> but I don't know if that is because the 'old' format is being
> requested still.  Could someone please take a little time to see
> what's going on and report back.
>
> Thanks,
> -jason
>
> Begin forwarded message:
> > From: "Mcginnis, Scott (NIH/NLM/NCBI)" <mcginnis@ncbi.nlm.nih.gov>
> > Date: August 29, 2005 12:06:52 PM EDT
> > To: "'jason@bioperl.org'" <jason@bioperl.org>
> > Subject: New BLAST Formatter.
> >
> >
> > Hello.
> >
> > The new BLAST formatter has been the default for months now. But
> > we'd like to shut off the old one.
> >
> > Will this pose a problem?
> >
> > Thanks,
> >
> > Sincerely,
> > Scott D. McGinnis, M.S.
> > National Center for Biotechnology Information
> > <http://www.ncbi.nlm.nih.gov>
> >
> >
> > Blast-announce: New BLAST formatter at the NCBI
> >
> > A new version of the BLAST formatter has been the default on the
> > NCBI BLAST
> > web pages for the past XX months.  On September 6, 2005 we will
> > remove the
> > checkbox allowing users to select the old formatter and support for
> > the old
> > formatter will be discontinued.
> >
> > This formatter has been rewritten from scratch using the NCBI C++
> > toolkit
> > and includes many new features (see list below) as well as the
> > ability to
> > fetch parts of genomic sequences when needed, making it much faster
> > than the
> > old formatter for many queries.
> >
> > Please send questions or comments to blast-help@ncbi.nlm.nih.gov
> >
> >
> > New features:
> > --------------
> >
> > 1.) The new formatter will present the masked residues or bases as
> > lower-case letters.  Additionally the masked letters can be shown
> > in color.
> > To use this feature change the "Masking Character" to "Lower case"
> > on the
> > formatting page and select a "Masking Color".
> > Example:
> >
> > http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Get&RID=1098448824-15725-37006897750.BLASTQ4&NEW_FORMATTER=on&MASK_CHAR=2&MASK_COLOR=2&DESCRIPTIONS=0
> >
> >
> > 2.) The "pairwise with identities" option allows easy
> > identification of a
> > few mismatches among highly similar sequences. In this (pair-wise)
> > view
> > mismatches, as well as "Sbjct" (on the line containing the
> > mismatch) are
> > shown in red.
> > To use this feature change the "Alignment view" to "Pairwise with
> > identities" on the formatting page.
> > Example:
> >
> > http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&NCBI_GI=yes&SHOW_OVERVIEW=on&ALIGNMENT_VIEW=PairwiseWithIdentities&NEW_FORMATTER=on&RID=1110892196-16209-7903412953.BLASTQ4#28302128
> >
> >
> > 3.) For database sequences longer than 200,000 bases each alignment
> > has a
> > header entitled "Features in this part of the subject sequence"
> > listing CDS features on the database sequence within the alignment
> > range or
> > at the 5' or 3' end if no features are within the range itself.
> > This gives a quick description of what you are looking at as many long
> > sequences have a standard defline such as "chromosome 16".
> > Example:
> >
> > http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&NCBI_GI=yes&SHOW_OVERVIEW=on&NEW_FORMATTER=on&RID=1098455471-18771-167762343145.BLASTQ4#51465696
> >
> >
> > Rewrites/bug fixes:
> > -------------------
> >
> > 1.) The graphic overview has been rewritten; it now uses an HTML
> > implementation.
> >
> > 2.) Query-anchored views now work with blastx/tblastn/tblastx, they
> > didn't
> > before.
> >
> > 3.) phi-BLAST patterns are now also shown in the query-anchored view.
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
Paulo Almeida
Tel: +351 21 4464635, Fax: +351 21 4407970
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
P-2780-156 Oeiras
Portugal
http://www.igc.gulbenkian.pt

From boris.steipe at utoronto.ca  Mon Aug 29 14:35:56 2005
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon Aug 29 14:27:46 2005
Subject: [Bioperl-l] [OT] General bioinformatics forums/lists?
In-Reply-To: <7e4bd6e6050828175846474aa3@mail.gmail.com>
References: <7e4bd6e6050828175846474aa3@mail.gmail.com>
Message-ID: <FACDCD3C-051C-4382-A928-7E04F59123BE@utoronto.ca>

Would the Bio_Bulletin_Board not work for you?
see:

http://bioinformatics.org/mailman/listinfo/

B.

On 28 Aug 2005, at 20:58, hong kong pm wrote:

> Do we need a bioinformatics forum though bioperl mailing list is  
> already a
> good one? If we want to establish one, we need volunteers to work  
> as forum
> moderators. I am willing to sponsor h/w, software license, and  
> hosting if
> there is enough interest.
> Andy
>
>

From lstein at cshl.edu  Mon Aug 29 15:20:50 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Aug 29 15:12:45 2005
Subject: [Bioperl-l] ->add_tag_value()
In-Reply-To: <430F76B7.7010803@wam.umd.edu>
References: <430E3350.50604@wam.umd.edu>
	<8c31cb4c665092173967567f87b35a34@gmx.net>
	<430F76B7.7010803@wam.umd.edu>
Message-ID: <200508291520.50822.lstein@cshl.edu>

Hi,

I have an environment variable in my .cshrc as follows:

	setenv PERL5LIB $HOME/projects/bioperl-live

Lincoln

On Friday 26 August 2005 04:08 pm, Andrew Stewart wrote:
> Do many of you use bioperl-live as your primary (or exclusive) BioPerl
> distribution, or do you keep a stable version as well?
>
> I'd like to use bioperl-live for instances such as this (see message
> history), but not necessarily for when I'm developing scripts that are
> going to be used by others in my lab who do not necessarily have
> bioperl-live installed.
>
> What I'm thinking is that I should maybe install a copy of bioperl-live
> somewhere in my personal space, and then 'use' it in certain scripts
> when needed.  I just have a few questions (these are probably more
> 'perl' questions than 'bio-perl' questions)...
>
> 1. Once I obtain bioperl-live via 'cvs -d :pserver etc...', do I need to
> actually go through the install routine or can I just access the modules
> from where they are downloaded?
>
> and
>
> 2. Would I then place code at the header of my script such as...
>
> use lib "/path/to/bioperl-live";
> use MODULE;
>
> and the updated module will (temporarily) override the other bioperl
> modules in my @INC?
>
> I tried this, actually, without any noticeable change in my previous problem
> (_print_GenBank... still prints the feature tabs as
> /tab="Bio::Annotation::SimpleValue=HASH(0x87f0914)" instead of
> /tab="value"), but I don't know for certain if perl was using the
> modules from my bioperl-live installation or the older ones.
>
>
> -Andrew Stewart
>
> Hilmar Lapp wrote:
> >> On Aug 26, 2005, at 11:37 AM, Andrew Stewart wrote:
> >>> Would it be possible to simply update the module which contains the
> >>> error (or are there multiple files?) rather than downgrade to 1.4 or
> >>> upgrade to the HEAD branch?
> >>> -Andrew
> >
> > You could, e.g. using Jason's suggestion, but I don't know why you
> > wouldn't just want to upgrade to the main trunk. Currently, this is as
> > close as you can get to upgrading to 1.5.1., which is what you will
> > want to do anyway immediately once it's out.
> >
> >     -hilmar
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu
From hlapp at gmx.net  Mon Aug 29 15:48:30 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Aug 29 15:37:46 2005
Subject: [Bioperl-l] ->add_tag_value()
In-Reply-To: <430F76B7.7010803@wam.umd.edu>
References: <430E3350.50604@wam.umd.edu>
	<A362C63D-80BB-4EAE-AD09-7D5BAD2464E7@duke.edu>
	<430F1E81.6080400@wam.umd.edu>
	<44E35FE4-D25D-4D96-A195-12D1DCF85638@duke.edu>
	<430F374B.7060408@wam.umd.edu>
	<6F0A7CDE-DFFA-499C-A3EE-CECE62FE9A2A@duke.edu>
	<8c31cb4c665092173967567f87b35a34@gmx.net>
	<430F76B7.7010803@wam.umd.edu>
Message-ID: <9185c308932848f92b4a2f905222b1cb@gmx.net>


On Aug 26, 2005, at 1:08 PM, Andrew Stewart wrote:

> Do many of you use bioperl-live as your primary (or exclusive) BioPerl 
> distribution, or do you keep a stable version as well?
>

I used to live off of bioperl-live and update regularly but I stopped 
updating about a year ago, so technically for production I'm on 
something close to 1.4.1.

I do keep a cvs head version too, for developing / fixing.

> I'd like to use bioperl-live for instances such as this (see message 
> history), but not necessarily for when I'm developing scripts that are 
> going to be used by others in my lab who do not necessarily have 
> bioperl-live installed.
>
> What I'm thinking is that I should maybe install a copy of 
> bioperl-live somewhere in my personal space, and then 'use' it in 
> certain scripts when needed.

What I do is I don't install Bioperl in order to avoid any precedence 
order mistakes for library paths. Since there is no compiled code 
(unless you also use bioperl-ext) you can just point PERL5LIB at the 
root of the Bioperl installation you want to work with, and if there is 
none in the standard @INC you can be sure which modules will be loaded.

>  I just have a few questions (these are probably more 'perl' questions 
> than 'bio-perl' questions)...
>
> 1. Once I obtain bioperl-live via 'cvs -d :pserver etc...', do I need 
> to actually go through the install routine or can I just access the 
> modules from where they are downloaded?

No and yes, respectively. See above.

>
> and
>
> 2. Would I then place code at the header of my script such as...
>
> use lib "/path/to/bioperl-live";
> use MODULE;
>
> and the updated module will (temporarily) override the other bioperl 
> modules in my @INC?
>
> I tried this, actually, without any noticeable change in my previous 
> problem
> (_print_GenBank... still prints the feature tabs as 
> /tab="Bio::Annotation::SimpleValue=HASH(0x87f0914)" instead of 
> /tab="value"), but I don't know for certain if perl was using the 
> modules from my bioperl-live installation or the older ones.
>

I'm not sure about the search order. The POD for lib says:

        It is typically used to add extra directories to perl's search
        path so that later "use" or "require" statements will find
        modules which are not located on perl's default search path.

but also:

        The parameters to "use lib" are added to the start of the perl
        search path. Saying

            use lib LIST;

        is almost the same as saying

            BEGIN { unshift(@INC, LIST) }

Note you can easily test which version is loaded, either by using the 
debugger or by adding some garbage to the module.
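
A third option is to ask perl itself, since %INC records the file each loaded module came from. The following is only a minimal sketch: loaded_from() is a hypothetical helper, the use lib path is a placeholder, and the core module File::Spec stands in for whichever Bioperl module you want to check.

```perl
use strict;
use warnings;
use lib '/path/to/bioperl-live';   # placeholder path; prepended to @INC
use File::Spec;                    # core module, standing in for a Bioperl module

# Hypothetical helper: return the file a loaded module came from,
# or undef if the module has not been loaded.
sub loaded_from {
    my ($module) = @_;
    (my $key = "$module.pm") =~ s{::}{/}g;   # Bio::SeqIO -> Bio/SeqIO.pm
    return $INC{$key};
}

print "File::Spec loaded from: ", loaded_from('File::Spec'), "\n";
print "first search directory: $INC[0]\n";
```

If the printed file does not live under the directory you passed to use lib, perl picked up the module from somewhere else.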

	-hilmar


>
> -Andrew Stewart
>
>
> Hilmar Lapp wrote:
>
>>
>>> On Aug 26, 2005, at 11:37 AM, Andrew Stewart wrote:
>>>
>>>> Would it be possible to simply update the module which contains the 
>>>> error (or are there multiple files?) rather than downgrade to 1.4 
>>>> or upgrade to the HEAD branch?
>>>> -Andrew
>>>
>>
>> You could, e.g. using Jason's suggestion, but I don't know why you 
>> wouldn't just want to upgrade to the main trunk. Currently, this is 
>> as close as you can get to upgrading to 1.5.1., which is what you 
>> will want to do anyway immediately once it's out.
>>
>>     -hilmar
>>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From lstein at cshl.edu  Mon Aug 29 17:43:30 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Aug 29 17:33:34 2005
Subject: [Bioperl-l] Re: Bio::DB::GFF start/end coordinates
In-Reply-To: <D479BA98-53E4-4024-A53F-441A09F65F89@duke.edu>
References: <D479BA98-53E4-4024-A53F-441A09F65F89@duke.edu>
Message-ID: <200508291743.30620.lstein@cshl.edu>

Hi Jason,

You've got to set $db->absolute(1) to get true Bioperl-compliant coordinates. 
The reason is Bio::DB::GFF's (perhaps regrettable) use of relative coordinate 
addressing by default. This is explicitly mentioned in the documentation under 
a section named (something like) "BioPerl compliance."
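
For features that were already fetched with relative addressing, the reordering can also be done by hand. This is only a sketch in plain Perl: ordered_coords() is a hypothetical helper, not part of Bio::DB::GFF, and the numbers are the ones from the location string in the report quoted below.

```perl
use strict;
use warnings;

# Hypothetical helper: return (start, end) with start <= end,
# regardless of the order in which the two values come in.
sub ordered_coords {
    my ($start, $end) = @_;
    return $start <= $end ? ($start, $end) : ($end, $start);
}

my @exons = ([1031, 975], [676, 501]);   # minus-strand pairs with start > end
for my $exon (@exons) {
    my ($s, $e) = ordered_coords(@$exon);
    print "$s..$e\n";
}
```

The strand has to be tracked separately, since it can no longer be inferred from the order of the pair after the swap.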

Lincoln

On Thursday 25 August 2005 01:00 pm, Jason Stajich wrote:
> Lincoln -
>
> One bug I'm still seeing in Bio::DB::GFF::Feature objects is that start/
> end still return start > end when strand < 0.   I know this is a
> different expectation for Bioperl / Gbrowse, but it causes a few
> problems, especially when you get an aggregated feature out of
> Bio::DB::GFF and then write it to a genbank file.  The locations look
> like this:
> complement(join(1031..975,676..501))
>
> My workaround is just to create new Location objects and features
> from the Bio::DB::GFF obtained objects  (some of these aren't
> allowing write-back to overwrite the values).
>
> Note on a slightly separate topic:
>   I have patched my Bio::Location::Split to_FTstring to simplify the
> string, current behavior would be to output the location like this:
> join(complement(1031..975),complement(676..501),))
>
> I'm looking into applying the patch; I'm not sure whether or not
> it works perfectly.
>
>
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu
From bmoore at genetics.utah.edu  Mon Aug 29 18:13:56 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon Aug 29 18:00:54 2005
Subject: [Bioperl-l] If you use RemoteBlast
Message-ID: <CFE1DF3BA20F424689DA0881A14055BE863F5D@m.hg.genetics.utah.edu>

Jason-

I stepped through the code in
Bio::Tools::Run::RemoteBlast::submit_blast, and bioperl is using the
default new formatter. For the dozen or so nucleotide sequences that
I ran, there were no problems parsing.

Barry 

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason
Stajich
Sent: Monday, August 29, 2005 10:18 AM
To: Bioperl List
Subject: [Bioperl-l] If you use RemoteBlast

So those of you who use the Tools::RemoteBlast module, please read  
the following email.    We need some people to test out how much the  
parser breaks with the "new" formatter. Tests seem to pass right now,  
but I don't know if that is because the 'old' format is being  
requested still.  Could someone please take a little time to see  
what's going on and report back.

Thanks,
-jason

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI)" <mcginnis@ncbi.nlm.nih.gov>
> Date: August 29, 2005 12:06:52 PM EDT
> To: "'jason@bioperl.org'" <jason@bioperl.org>
> Subject: New BLAST Formatter.
>
>
> Hello.
>
> The new BLAST formatter has been the default for months now. But  
> we'd like
> to shut off the old one.
>
> Will this pose a problem?
>
> Thanks,
>
> Sincerely,
> Scott D. McGinnis, M.S.
> National Center for Biotechnology Information
> <http://www.ncbi.nlm.nih.gov>
>
>
> Blast-announce: New BLAST formatter at the NCBI
>
> A new version of the BLAST formatter has been the default on the  
> NCBI BLAST
> web pages for the past XX months.  On September 6, 2005 we will  
> remove the
> checkbox allowing users to select the old formatter and support for  
> the old
> formatter will be discontinued.
>
> This formatter has been rewritten from scratch using the NCBI C++  
> toolkit
> and includes many new features (see list below) as well as the  
> ability to
> fetch parts of genomic sequences when needed, making it much faster  
> than the
> old formatter for many queries.
>
> Please send questions or comments to blast-help@ncbi.nlm.nih.gov
>
>
> New features:
> --------------
>
> 1.) The new formatter will present the masked residues or bases as
> lower-case letters.  Additionally the masked letters can be shown  
> in color.
> To use this feature change the "Masking Character" to "Lower case"  
> on the
> formatting page and select a "Masking Color".
> Example:
>
> http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Get&RID=1098448824-15725-37006897750.BLASTQ4&NEW_FORMATTER=on&MASK_CHAR=2&MASK_COLOR=2&DESCRIPTIONS=0
>
>
> 2.) The "pairwise with identities" option allows easy  
> identification of a
> few mismatches among highly similar sequences. In this (pair-wise)  
> view
> mismatches, as well as "Sbjct" (on the line containing the  
> mismatch) are
> shown in red.
> To use this feature change the "Alignment view" to "Pairwise with
> identities" on the formatting page.
> Example:
>
> http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&NCBI_GI=yes&SHOW_OVERVIEW=on&ALIGNMENT_VIEW=PairwiseWithIdentities&NEW_FORMATTER=on&RID=1110892196-16209-7903412953.BLASTQ4#28302128
>
>
> 3.) For database sequences longer than 200,000 bases each alignment  
> has a
> header entitled "Features in this part of the subject sequence"
> listing CDS features on the database sequence within the alignment  
> range or
> at the 5' or 3' end if no features are within the range itself.
> This gives a quick description of what you are looking at as many long
> sequences have a standard defline such as "chromosome 16".
> Example:
>
> http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&NCBI_GI=yes&SHOW_OVERVIEW=on&NEW_FORMATTER=on&RID=1098455471-18771-167762343145.BLASTQ4#51465696
>
>
> Rewrites/bug fixes:
> -------------------
>
> 1.) The graphic overview has been rewritten; it now uses an HTML
> implementation.
>
> 2.) Query-anchored views now work with blastx/tblastn/tblastx, they  
> didn't
> before.
>
> 3.) phi-BLAST patterns are now also shown in the query-anchored view.
>


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12



From limericksean at gmail.com  Tue Aug 30 04:55:27 2005
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Tue Aug 30 04:44:59 2005
Subject: [Bioperl-l] Bio::Search results
Message-ID: <4627846405083001556a5d030e@mail.gmail.com>

Hi,
The following code snippet is something I use to extract information
from hmmer result files:

use Bio::SearchIO;

my $in = new Bio::SearchIO( -format => 'hmmer',  -file => $ARGV[0] );
while(my $result = $in->next_result) {
  print $result->query_name(), "\n",$result->query_description(),"\n";
  while (my $hit = $result->next_hit) {
    while(my $hsp = $hit->next_domain) {
      next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
      print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
    }
  }
}

The input file is generated by hmmpfam and is given at the command
line. I use it to scan for specific domain names, e.g. ig, fn3, lrr, etc.
This code works for the first loop and then ends, so I get the name and
description (no hsp values, as there are none for this result):

ENSMUSP00000065602
pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059
transcript:ENSMUST00000066517

My question is why does the loop end after one instance. Incidentally
the outputted  name and description above are the last ones in the
hmmer file (maybe the file is read from the back??? - don't know if this
means anything).
Any thoughts would be appreciated. Thanks,
Sean.

From bmoore at genetics.utah.edu  Tue Aug 30 10:40:52 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Aug 30 10:28:02 2005
Subject: [Bioperl-l] Bio::Search results
Message-ID: <CFE1DF3BA20F424689DA0881A14055BE863F60@m.hg.genetics.utah.edu>

Sean,

Don't see anything obviously wrong.  If you want to send your input
file, I'll try to recreate the problem.

Barry

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Sean
O'Keeffe
Sent: Tuesday, August 30, 2005 2:55 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Bio::Search results

Hi,
The following code snippet is something I use to extract information
from hmmer result files:

use Bio::SearchIO;

my $in = new Bio::SearchIO( -format => 'hmmer',  -file => $ARGV[0] );
while(my $result = $in->next_result) {
  print $result->query_name(), "\n",$result->query_description(),"\n";
  while (my $hit = $result->next_hit) {
    while(my $hsp = $hit->next_domain) {
      next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
      print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
    }
  }
}

The input file is generated by hmmpfam and is given at the command
line. I use it to scan for specific domain names, e.g. ig, fn3, lrr, etc.
This code works for the first loop and then ends, so I get the name and
description (no hsp values, as there are none for this result):

ENSMUSP00000065602
pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059
transcript:ENSMUST00000066517

My question is why does the loop end after one instance. Incidentally
the outputted  name and description above are the last ones in the
hmmer file (maybe the file is read from the back??? - don't know if this
means anything).
Any thoughts would be appreciated. Thanks,
Sean.


From cjfields at uiuc.edu  Tue Aug 30 10:40:48 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue Aug 30 10:30:03 2005
Subject: [Bioperl-l] SO for RNA-Binding Protein and RNA motifs
Message-ID: <6.2.1.2.2.20050830093049.01ecb4a8@express.cites.uiuc.edu>

Just had a few simple questions about sequence ontology.  What ontology 
terms are being used for RNA-binding proteins (like IRE or TRAP) or 
conserved regulatory RNA motifs such as riboswitches?  I was thinking about 
using TF_binding_site for the former, but is this term mainly for 
DNA-binding proteins?  I found a few terms for conserved elements in SO and 
SOFA (like attenuators), but other conserved motifs (IRE, so on) seem to be 
missing.

I am scanning bacterial genomes for conserved RNA motifs for a few 
different RNA binding proteins and riboswitches and I would like to convert 
this data over to GFF3 to map the positions of the hits to the genomes in 
question.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From limericksean at gmail.com  Tue Aug 30 11:28:11 2005
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Tue Aug 30 11:19:57 2005
Subject: [Bioperl-l] Bio::Search results
In-Reply-To: <CFE1DF3BA20F424689DA0881A14055BE863F60@m.hg.genetics.utah.edu>
References: <CFE1DF3BA20F424689DA0881A14055BE863F60@m.hg.genetics.utah.edu>
Message-ID: <46278464050830082816e3df97@mail.gmail.com>

Hi Barry, thanks for the reply. Below is a snippet of the file (I
generated it with hmmpfam using the alignment flag set to -A 0, to
remove alignments - this shouldn't affect the parsing of the file) :

hmmpfam - search one or more sequences against HMM database
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /usr/local/lib/pfam-tm
Sequence file:            Mus_musculus.NCBIM34.jul.pep.fa-short
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query sequence: ENSMUSP00000089702
Accession:      [none]
Description:    [none]

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
	[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
	[no hits above thresholds]
//

Query sequence: ENSMUSP00000089701
Accession:      [none]
Description:    [none]

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
	[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
	[no hits above thresholds]
//

Query sequence: ENSMUSP00000020094
Accession:      [none]
Description:    [none]

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
LRR_1    Leucine Rich Repeat                             55.0    3.7e-15   6
LRRNT    Leucine rich repeat N-terminal domain           30.1    1.1e-07   1

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
LRRNT      1/1     117   142 ..     1    34 []    30.1  1.1e-07
LRR_1      1/6     168   191 ..     1    25 []    14.7   0.0049
LRR_1      2/6     192   210 ..     1    25 []     9.1      0.2
LRR_1      3/6     212   237 ..     1    25 []     8.4     0.26
LRR_1      4/6     238   257 ..     1    25 []    10.3      0.1
LRR_1      5/6     259   282 ..     1    25 []    10.1     0.12
LRR_1      6/6     290   314 ..     1    25 []     2.4        2
//

Query sequence: ENSMUSP00000074175
Accession:      [none]
Description:    [none]

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
CUB      CUB domain                                     232.5    1.3e-68   2
Trypsin  Trypsin                                        206.0    1.2e-60   1
Sushi    Sushi domain (SCR repeat)                       88.7    2.6e-25   2
EGF_CA   Calcium binding EGF domain                      29.4    1.8e-07   1

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
CUB        1/2      16   137 ..     1   116 []    70.0  1.1e-19
EGF_CA     1/1     141   188 ..     1    55 []    29.4  1.8e-07
CUB        2/2     192   301 ..     1   116 []   162.5  1.5e-47
Sushi      1/2     308   370 ..     1    62 []    47.5  6.5e-13
Sushi      2/2     375   446 ..     1    62 []    41.2  5.1e-11
Trypsin    1/1     463   698 ..     1   259 []   206.0  1.2e-60
//


Cheers,
Sean.


On 8/30/05, Barry Moore <bmoore@genetics.utah.edu> wrote:
> Sean,
> 
> Don't see anything obviously wrong.  If you want to send your input
> file, I'll try to recreate the problem.
> 
> Barry
> 
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Sean
> O'Keeffe
> Sent: Tuesday, August 30, 2005 2:55 AM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] Bio::Search results
> 
> Hi,
> The following code snippet is something I use to extract information
> from hmmer result files:
> 
> use Bio::SearchIO;
> 
> my $in = new Bio::SearchIO( -format => 'hmmer',  -file => $ARGV[0] );
> while(my $result = $in->next_result) {
>   print $result->query_name(), "\n",$result->query_description(),"\n";
>   while (my $hit = $result->next_hit) {
>     while(my $hsp = $hit->next_domain) {
>       next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
>       print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
>     }
>   }
> }
> 
> The input file is generated by hmmpfam and is given at the command
> line. I use it to scan for specific domain names, e.g. ig, fn3, lrr, etc.
> This code works for the first loop and then ends, so I get the name and
> description (no hsp values, as there are none for this result):
> 
> ENSMUSP00000065602
> pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059
> transcript:ENSMUST00000066517
> 
> My question is why does the loop end after one instance. Incidentally
> the outputted  name and description above are the last ones in the
> hmmer file (maybe the file is read from the back??? - don't know if this
> means anything).
> Any thoughts would be appreciated. Thanks,
> Sean.
> 
>

From golharam at umdnj.edu  Wed Aug 31 00:32:13 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed Aug 31 01:18:47 2005
Subject: [Bioperl-l] Make test fails
Message-ID: <00d601c5ade4$f5e37340$d33d140a@GOLHARMOBILE1>

I just updated my copy of bioperl live from cvs and when I do a 'make
test', it fails miserably.  Here's the relevant output:

t/DB.........................FAILED test 24
        Failed 1/84 tests, 98.81% okay
t/DBCUTG.....................ok
        22/24 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/DBFasta....................ok
t/DNAMutation................ok
t/Domcut.....................ok
        22/25 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/ECnumber...................ok
t/ELM........................
-------------------- WARNING ---------------------
MSG: Bio::Tools::Analysis::Protein::ELM Request Error:
400 (Bad Request) URL must be absolute
Client-Date: Wed, 31 Aug 2005 04:23:10 GMT


---------------------------------------------------
ok
t/embl.......................
-------------------- WARNING ---------------------
MSG: Bio::PrimarySeq=HASH(0x9f74ff8) is not a SeqI compliant sequence object!
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: test is not a SeqI compliant sequence object!
---------------------------------------------------
ok
t/EMBL_DB....................ok
t/ESEfinder..................error is 0
ok
        10/12 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/FeatureIO..................
-------------------- WARNING ---------------------
MSG: '##feature-ontology' directive handling not yet implemented
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: '##attribute-ontology' directive handling not yet implemented
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: '##source-ontology' directive handling not yet implemented
---------------------------------------------------
ok
t/Index......................
-------------------- WARNING ---------------------
MSG: overwriting a current value stored for AJ288898

---------------------------------------------------

-------------------- WARNING ---------------------
MSG: overwriting a current value stored for AI129902

---------------------------------------------------

-------------------- WARNING ---------------------
MSG: overwriting a current value stored for BAB68554

---------------------------------------------------
ok
t/protgraph..................doing subgraphs
|||||||in subgraph - size30
in subgraph - size33
in subgraph - size3
in subgraph - size3
in subgraph - size5
Can't call method "object_id" on unblessed reference at t/protgraph.t line 81, <GEN1> line 82.
dubious
        Test returned status 25 (wstat 6400, 0x1900)
        after all the subtests completed successfully
t/Spidey.....................Global symbol "$exon_num" requires explicit package name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line 309.
Global symbol "$gen_start" requires explicit package name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line 309.
Global symbol "$gen_stop" requires explicit package name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line 309.
Global symbol "$cdna_start" requires explicit package name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line 309.
Global symbol "$cdna_stop" requires explicit package name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line 309.


Any idea why I'm getting these errors?  Should I blow away my
bioperl-live directory and checkout a whole new version?

My spidey modules shouldn't be failing...my last update to it was
working fine...

Ryan

From lupey+ at pitt.edu  Tue Aug 30 20:34:39 2005
From: lupey+ at pitt.edu (Paul G Cantalupo)
Date: Wed Aug 31 08:20:46 2005
Subject: [Bioperl-l] get_sequence - acc does not exist
Message-ID: <Pine.SOC.4.63.0508302018160.28152@unixs1.cis.pitt.edu>

Hello,

I discovered that Bio::Perl get_sequence does not handle Genbank GI 
numbers properly due to the following code in get_sequence:

    if( $identifier =~ /^\w+\d+$/ ) {
        $seq = $db->get_Seq_by_acc($identifier);
    } else {
        $seq = $db->get_Seq_by_id($identifier);
    }

Genbank GI numbers (e.g. 51527264) match the regular expression /^\w+\d+$/ 
because \w also matches digits, so unsurprisingly the method get_Seq_by_acc 
fails (with a warning like: MSG: acc (gb|51527264) does not exist). Instead, 
the method get_Seq_by_id works when called with GI numbers:


   use Bio::DB::GenBank;
   my $genbank_db = Bio::DB::GenBank->new();
   $seq = $genbank_db->get_Seq_by_id(51527264);
   print $seq->desc;

Shouldn't the regular expression in get_sequence be changed to look for 
identifiers that are all digits and then call get_Seq_by_id? Or am I not 
understanding something?
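
To make the proposal concrete, here is a minimal sketch of the check, assuming all-digit identifiers are always GI numbers. lookup_method_for() is a hypothetical helper, not the actual Bio::Perl code; it only returns the name of the method that would be called, so it runs without a database connection. The identifiers other than the GI number above are illustrative.

```perl
use strict;
use warnings;

# Hypothetical dispatcher: returns the name of the lookup method to call.
sub lookup_method_for {
    my ($identifier) = @_;
    return 'get_Seq_by_id'  if $identifier =~ /^\d+$/;     # all digits: treat as GI number
    return 'get_Seq_by_acc' if $identifier =~ /^\w+\d+$/;  # ends in digits: treat as accession
    return 'get_Seq_by_id';                                # anything else: treat as an ID
}

for my $id (qw(51527264 AF303112 ROA1_HUMAN)) {
    printf "%-12s %s\n", $id, lookup_method_for($id);
}
```

The all-digit test has to come first, because the existing pattern would otherwise claim the GI number as an accession.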

Thank you,

Paul

Paul Cantalupo
Research Specialist/Systems Programmer
559 Crawford Hall
Department of Biological Sciences
University of Pittsburgh
Pittsburgh, PA 15260
Work: 412-624-4687
Fax: 412-624-4759

Ask me about Toastmasters: www.toastmasters.org
Midday Club Treasurer
From rvosa at sfu.ca  Wed Aug 31 08:10:36 2005
From: rvosa at sfu.ca (Rutger Vos)
Date: Wed Aug 31 08:40:29 2005
Subject: [Bioperl-l] interoperability with bioperl
Message-ID: <43159E3C.9080907@sfu.ca>

Dear BioPerlers,

I am the author of a phylogenetics oriented package on CPAN called 
Bio::Phylo (link in my sig). The tree object is superficially similar to 
something that implements Bio::Tree::TreeI, and so I'm looking for a way 
to implement interoperability - at least for that object - with BioPerl. 
Bio::Phylo is more aimed at phylogeneticists, who might not be as 
interested in installing the BioPerl core (and sort out the dependencies 
and so on), so I am not looking to integrate in BioPerl.

I am now leaning towards implementing interoperability in the following way:

i) I create a separate CPAN package, with BioPerl & Bio::Phylo 
dependencies (so that my own "core" doesn't require bioperl).

ii) this package inherits (using "use base" & "use fields" from 
Bio::Phylo::Trees::Tree, see 
http://search.cpan.org/~rvosa/Bio-Phylo-0.04/lib/Bio/Phylo/Trees/Tree.pm).

iii) this package @ISA Bio::Tree::TreeI (see 
http://search.cpan.org/~birney/bioperl-1.4/Bio/Tree/TreeI.pm).

iv) hence, through multiple inheritance, a "hybrid" tree object is 
created, which implements both APIs (there's a fair amount of overlap, I 
might get away with just symbol table manipulation (globs) to implement 
Bio::Tree::TreeI).
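
Steps i) to iv) can be sketched roughly as follows. All package and method names here (My::PhyloTree, My::TreeI, My::HybridTree, get_entities, get_nodes, number_nodes) are illustrative stand-ins rather than the real Bio::Phylo or BioPerl classes, and plain @ISA is used instead of "use base"/"use fields" to keep the sketch short:

```perl
use strict;
use warnings;

# Stand-in for the native tree class (hypothetical, not Bio::Phylo).
package My::PhyloTree;
sub new          { my ($class) = @_; my $self = { nodes => [] }; return bless $self, $class }
sub insert       { my ($self, @nodes) = @_; push @{ $self->{nodes} }, @nodes; return $self }
sub get_entities { my ($self) = @_; return @{ $self->{nodes} } }

# Stand-in for the interface class (hypothetical, not Bio::Tree::TreeI).
package My::TreeI;
sub get_nodes    { die "get_nodes must be implemented by a subclass\n" }
sub number_nodes { my ($self) = @_; my @nodes = $self->get_nodes; return scalar @nodes }

# The "hybrid": inherits from both and aliases the overlapping method.
package My::HybridTree;
our @ISA = ('My::PhyloTree', 'My::TreeI');
{
    no warnings 'once';
    # Symbol table (glob) assignment: get_nodes becomes another name
    # for the native get_entities method.
    *get_nodes = \&My::PhyloTree::get_entities;
}

package main;

my $tree = My::HybridTree->new->insert(qw(A B C));
printf "hybrid tree has %d nodes\n", $tree->number_nodes;
```

The glob alias only works where the two APIs agree on arguments and return values; methods that differ would need small adapter subs instead.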

I solicit your thoughts on whether you think this is the right way to go 
about things. My main worry is that there'll be problems if people have 
taken to sticking their fingers inside Bio::Tree::TreeI-like objects to 
fondle their attributes directly. Then again, that "voids their 
warranty", perhaps.

Best wishes,

Rutger

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From birney at ebi.ac.uk  Wed Aug 31 08:50:59 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Wed Aug 31 08:48:35 2005
Subject: [Bioperl-l] get_sequence - acc does not exist
In-Reply-To: <Pine.SOC.4.63.0508302018160.28152@unixs1.cis.pitt.edu>
References: <Pine.SOC.4.63.0508302018160.28152@unixs1.cis.pitt.edu>
Message-ID: <4315A7B3.5080504@ebi.ac.uk>


Paul G Cantalupo wrote:
> Hello,
> 
> I discovered that Bio::Perl get_sequence does not handle Genbank GI 
> numbers properly due to the following code in get_sequence:
> 
>    if( $identifier =~ /^\w+\d+$/ ) {
>        $seq = $db->get_Seq_by_acc($identifier);
>    } else {
>        $seq = $db->get_Seq_by_id($identifier);
>    }
> 
> Genbank GI numbers (e.g. 51527264) match the regular expression 
> /^\w+\d+$/, so unsurprisingly the method get_Seq_by_acc fails (with 
> a warning like: MSG: acc (gb|51527264) does not exist). The method 
> get_Seq_by_id, however, works when called with GI numbers:
> 
> 
>   use Bio::DB::GenBank;
>   my $genbank_db = Bio::DB::GenBank->new();
>   $seq = $genbank_db->get_Seq_by_id(51527264);
>   print $seq->desc;
> 
> Shouldn't the regular expression in get_sequence be changed to look for 
> identifiers that are all digits and then call get_Seq_by_id? Or am I not 
> understanding something?
> 

Traditionally, "GI" numbers are _not_ accession numbers: they are internal
numbers that NCBI assigns to sequences in-house. However, this code is all
about heuristics for guessing the right thing, and probably the right thing
to do is to try get_Seq_by_acc first and then, if that returns undef, try
get_Seq_by_id.
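
A minimal sketch of that fallback, assuming a database handle that offers get_Seq_by_acc and get_Seq_by_id. StubDB and fetch_with_fallback are hypothetical names used so that the example runs without BioPerl or network access. The eval is an assumption as well: it covers the case where a real implementation throws instead of returning undef.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database handle; NOT Bio::DB::GenBank.
package StubDB;
sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    return bless $self, $class;
}
sub get_Seq_by_acc { my ($self, $key) = @_; return $self->{by_acc}{$key} }
sub get_Seq_by_id  { my ($self, $key) = @_; return $self->{by_id}{$key} }

package main;

# Try the accession lookup first; fall back to the ID lookup on failure.
sub fetch_with_fallback {
    my ($db, $identifier) = @_;
    my $seq = eval { $db->get_Seq_by_acc($identifier) };
    $seq = $db->get_Seq_by_id($identifier) unless defined $seq;
    return $seq;
}

my $db = StubDB->new(
    by_acc => { 'AJ288898' => 'record found by accession' },
    by_id  => { '51527264' => 'record found by GI number' },
);

print fetch_with_fallback($db, 'AJ288898'), "\n";
print fetch_with_fallback($db, '51527264'), "\n";
```

With a real remote database the failed first attempt costs an extra request and may emit a warning, which would need to be weighed against the simpler logic.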


> Thank you,
> 
> Paul
> 
> Paul Cantalupo
> Research Specialist/Systems Programmer
> 559 Crawford Hall
> Department of Biological Sciences
> University of Pittsburgh
> Pittsburgh, PA 15260
> Work: 412-624-4687
> Fax: 412-624-4759
> 
> Ask me about Toastmasters: www.toastmasters.org
> Midday Club Treasurer
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
From jason.stajich at duke.edu  Wed Aug 31 08:24:45 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Aug 31 09:17:17 2005
Subject: [Bioperl-l] Make test fails
In-Reply-To: <00d601c5ade4$f5e37340$d33d140a@GOLHARMOBILE1>
References: <00d601c5ade4$f5e37340$d33d140a@GOLHARMOBILE1>
Message-ID: <4A14F04A-DCA9-40CD-BEDE-183A008291B4@duke.edu>

I don't know that this is miserable, how many passed? ;)

I think you might want to get a fresh copy of spidey.pm and re-try  
that test; it works fine on my machine. Maybe you have local changes  
that got merged during the CVS update - if there are ">>>" lines in  
your spidey.pm code, they could be causing problems.
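
One quick way to check for leftover merge-conflict markers is sketched below. The sub name conflict_marker_lines and the sample file content are made up for illustration. The pattern looks for the usual CVS marker lines: "<<<<<<< " and ">>>>>>> " followed by a name, and a line consisting only of "=======".

```perl
use strict;
use warnings;

# Return the 1-based line numbers that look like merge-conflict markers.
sub conflict_marker_lines {
    my ($text) = @_;
    my @hits;
    my $line_no = 0;
    for my $line (split /\n/, $text) {
        $line_no++;
        push @hits, $line_no if $line =~ /^(?:<{7} |>{7} |={7}\z)/;
    }
    return @hits;
}

# Made-up file content with one unresolved conflict.
my $sample = join "\n",
    'sub next_exon {',
    '<<<<<<< Results.pm',
    '    my $exon = shift;',
    '=======',
    '    my ($exon) = @_;',
    '>>>>>>> 1.12',
    '}';

print "conflict markers on lines: @{[ conflict_marker_lines($sample) ]}\n";
```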

I see the protgraph failure too depending on which version of  
Graph::Directed I have installed.

The t/embl.t warning is intended, although we may want to silence it  
unless the BIOPERLDEBUG environment variable is set.

The rest of the failures relate to remote websites which we don't have  
control over, so things seem to drift.  I see the t/ELM.t failure  
too; I need to see if Richard or someone can take a looksie. The DB.t  
tests are all failing. I don't know what the problem with the  
website is, but I think we'll definitely disable them unless  
BIOPERLDEBUG is set.

-jason

On Aug 31, 2005, at 12:32 AM, Ryan Golhar wrote:

> I just updated my copy of bioperl live from cvs and when I do a 'make
> test', it fails miserably.  Here's the relevant output:
>
> t/DB.........................FAILED test 24
>         Failed 1/84 tests, 98.81% okay
> t/DBCUTG.....................ok
>         22/24 skipped: tests which require remote servers - set env
> variable BIO
> PERLDEBUG to test
> t/DBFasta....................ok
> t/DNAMutation................ok
> t/Domcut.....................ok
>         22/25 skipped: tests which require remote servers - set env
> variable BIO
> PERLDEBUG to test
> t/ECnumber...................ok
> t/ELM........................
> -------------------- WARNING ---------------------
> MSG: Bio::Tools::Analysis::Protein::ELM Request Error:
> 400 (Bad Request) URL must be absolute
> Client-Date: Wed, 31 Aug 2005 04:23:10 GMT
>
>
>
> ---------------------------------------------------
> ok
> t/embl.......................
> -------------------- WARNING ---------------------
> MSG: Bio::PrimarySeq=HASH(0x9f74ff8) is not a SeqI compliant sequence
> object!
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: test is not a SeqI compliant sequence object!
> ---------------------------------------------------
> ok
> t/EMBL_DB....................ok
> t/ESEfinder..................error is 0
> ok
>         10/12 skipped: tests which require remote servers - set env
> variable BIO
> PERLDEBUG to test
> t/FeatureIO..................
> -------------------- WARNING ---------------------
> MSG: '##feature-ontology' directive handling not yet implemented
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: '##attribute-ontology' directive handling not yet implemented
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: '##source-ontology' directive handling not yet implemented
> ---------------------------------------------------
> ok
> t/Index......................
> -------------------- WARNING ---------------------
> MSG: overwriting a current value stored for AJ288898
>
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: overwriting a current value stored for AI129902
>
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: overwriting a current value stored for BAB68554
>
> ---------------------------------------------------
> ok
> t/protgraph..................doing subgraphs
> |||||||in subgraph - size30
> in subgraph - size33
> in subgraph - size3
> in subgraph - size3
> in subgraph - size5
> Can't call method "object_id" on unblessed reference at t/protgraph.t
> line 81, <
> GEN1> line 82.
> dubious
>         Test returned status 25 (wstat 6400, 0x1900)
>         after all the subtests completed successfully
> t/Spidey.....................Global symbol "$exon_num" requires  
> explicit
> package
>  name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line
> 309.
> Global symbol "$gen_start" requires explicit package name at
> /tmp/bioperl-live/b
> lib/lib/Bio/Tools/Spidey/Results.pm line 309.
> Global symbol "$gen_stop" requires explicit package name at
> /tmp/bioperl-live/bl
> ib/lib/Bio/Tools/Spidey/Results.pm line 309.
> Global symbol "$cdna_start" requires explicit package name at
> /tmp/bioperl-live/
> blib/lib/Bio/Tools/Spidey/Results.pm line 309.
> Global symbol "$cdna_stop" requires explicit package name at
> /tmp/bioperl-live/b
> lib/lib/Bio/Tools/Spidey/Results.pm line 309.
>
>
> Any idea why I'm getting these errors?  Should I blow away my
> bioperl-live directory and checkout a whole new version?
>
> My spidey modules shouldn't be failing...my last update to it was
> working fine...
>
> Ryan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From birney at ebi.ac.uk  Wed Aug 31 09:13:46 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Wed Aug 31 09:18:15 2005
Subject: [Bioperl-l] interoperability with bioperl
In-Reply-To: <43159E3C.9080907@sfu.ca>
References: <43159E3C.9080907@sfu.ca>
Message-ID: <4315AD0A.3060700@ebi.ac.uk>


Rutger Vos wrote:
> Dear BioPerlers,
> 
> I am the author of a phylogenetics oriented package on CPAN called 
> Bio::Phylo (link in my sig). The tree object is superficially similar to 
> something that implements Bio::Tree::TreeI, and so I'm looking for a way 
> to implement interoperability - at least for that object - with BioPerl. 
> Bio::Phylo is more aimed at phylogeneticists, who might not be as 
> interested in installing the BioPerl core (and sort out the dependencies 
> and so on), so I am not looking to integrate in BioPerl.
> 
> I am now leaning towards implementing interoperability in the following 
> way:
> 
> i) I create a separate CPAN package, with BioPerl & Bio::Phylo 
> dependencies (so that my own "core" doesn't require bioperl).
> 
> ii) this package inherits (using "use base" & "use fields") from 
> Bio::Phylo::Trees::Tree, see 
> http://search.cpan.org/~rvosa/Bio-Phylo-0.04/lib/Bio/Phylo/Trees/Tree.pm).
> 
> iii) this package @ISA Bio::Tree::TreeI (see 
> http://search.cpan.org/~birney/bioperl-1.4/Bio/Tree/TreeI.pm).
> 
> iv) hence, through multiple inheritance, a "hybrid" tree object is 
> created, which implements both APIs (there's a fair amount of overlap, I 
> might get away with just symbol table manipulation (globs) to implement 
> Bio::Tree::TreeI).
> 
> I solicit your thoughts on whether you think this is the right way to go 
> about things. My main worry is that there'll be problems if people have 
> taken to sticking their fingers inside Bio::Tree::TreeI-like objects to 
> fondle their attributes directly. Then again, that "voids their 
> warranty", perhaps.
> 

This is precisely the way to go, and I am planning a similar approach
to "bridge" between Ensembl and Bioperl - i.e., make wrapper classes that
hold onto the Ensembl object and delegate the necessary interface ("I")
defined functions for Bioperl "clients".


As you said, if a client starts looking inside the object directly, then
it's on its own head.
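
A minimal sketch of the wrapper-and-delegate idea, using made-up classes. My::NativeGene stands in for the foreign object and My::GeneWrapper for the bridge class; none of the method names are taken from the Ensembl or Bioperl APIs.

```perl
use strict;
use warnings;

# Hypothetical foreign object with its own method names.
package My::NativeGene;
sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    return bless $self, $class;
}
sub native_name  { my ($self) = @_; return $self->{name} }
sub native_begin { my ($self) = @_; return $self->{begin} }

# Hypothetical wrapper: holds the native object and exposes the method
# names that clients of the other API expect, delegating each call.
package My::GeneWrapper;
sub new {
    my ($class, $native) = @_;
    my $self = { native => $native };
    return bless $self, $class;
}
sub display_name { my ($self) = @_; return $self->{native}->native_name }
sub start        { my ($self) = @_; return $self->{native}->native_begin }

package main;

my $wrapped = My::GeneWrapper->new(
    My::NativeGene->new( name => 'geneX', begin => 42 ) );
printf "%s starts at %d\n", $wrapped->display_name, $wrapped->start;
```

Unlike the multiple-inheritance variant, the wrapper keeps the native object's internals out of reach of clients, at the cost of one forwarding sub per method.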


> Best wishes,
> 
> Rutger
> 
From slenk at emich.edu  Wed Aug 31 10:12:24 2005
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Wed Aug 31 11:02:34 2005
Subject: [Bioperl-l] Protein alignment CD excision module
Message-ID: <1a085e81a039a6.1a039a61a085e8@emich.edu>

I am converting a module that takes a ClustalW alignment, data mines 
the conserved domains from NCBI, then selectively replaces the CDs 
with IUPAC 'X' and writes a ClustalW file back out. We have several 
uses for this module's functions.

I am converting this to be a Bioperl module to take advantage of 
AlignIO capabilities to read/write multiple alignment file types.

There is a .pm package excise_cd.pm, which I have placed in Align 
(along with clustalw.pm etc). It is @ISA Bio::Root::Root. I have not 
yet written an I file for it, but recognise the necessity of doing so 
for optimum compatibility with Bioperl.

Only one method from excise_cd is used outside the module - excise(), 
which takes a SimpleAlign object made with AlignIO in the calling 
program and a hash of options. The excise method extracts 
the sequence data from the SimpleAlign object, data mines the CD 
information and uses the options to guide the overwriting of residues 
with 'X'. excise() (will) then create an AlignIO output object of the 
requested format with the excised alignment. This is then returned to 
the caller, which can write out the excised alignment in the desired 
format.

I think of this from an external perspective as a CD excising (Xing 
out) and data converting filter for alignment files. 

Is this a reasonable approach? Would this be an appropriate module and 
script for me to donate to Bioperl when properly done?

Another question - I data mine from NCBI using only gi identifiers for 
the proteins. I have written my own code to do this. Is there a Bioperl 
way to get CD data for a protein, and can this way allow me to 
obtain CD regions for PFAM or other identifiers as well?

Thanks,
Steve Lenk
slenk@emich.edu
From heikki at ebi.ac.uk  Wed Aug 31 12:36:14 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Aug 31 12:33:30 2005
Subject: [Bioperl-l] Protein alignment CD excision module
In-Reply-To: <1a085e81a039a6.1a039a61a085e8@emich.edu>
References: <1a085e81a039a6.1a039a61a085e8@emich.edu>
Message-ID: <200508311736.15001.heikki@ebi.ac.uk>

Steve,

I can see the usefulness of what you are doing, but bioperl is a library and 
needs to think modularly so that other users can easily modify it. What you 
are describing is best implemented as a script that uses several modules.
That example script could be stored in BioPerl separately.


On Wednesday 31 August 2005 15:12, Stephen Gordon Lenk wrote:
> I am converting a module that takes a ClustalW alignment, data mines
> the conserved domains from NCBI, then selectively replaces the CDs
> with IUPAC 'X' and writes a ClustalW file back out. We have several
> uses for this module's functions.

Reading and writing an alignment is already handled by Bio::AlignIO. If you 
hardcode the format in a module, you lose flexibility. So this belongs in a 
script.

"data mines the conserved domains from NCBI"

This needs to be done separately by writing, e.g., a Bio::DB or a 
Bio::Tools::Analysis module for accessing the data. Then you need a storage 
object to store the conserved residues. You could use Bio::Seq::Meta derived 
objects to do that, or store them as sequence features 
(Bio::SeqFeature::Generic) - or roll your own. The main question is whether 
you need to store residue-based information or a few large regions.

"then selectively replaces the CDs with IUPAC 'X'"

This could be implemented as a method that takes the alignment and the storage 
object(s) from your analysis and returns the new alignment. 
Bio::Align::Utilities could store that.

> I am converting this to be a Bioperl module to take advantage of
> AlignIO capabilities to read/write multiple alignment file types.

Good idea.

> There is a .pm package excise_cd.pm, which I have placed in Align
> (along with clustalw.pm etc). It is @ISA Bio::Root::Root. I have not

clustalw.pm is in Bio::AlignIO.  Only modules that are subclasses of 
Bio::AlignIO should go there.

> yet written an I file for it, but recognise the necessity of doing so
> for optimum compatibility with Bioperl.

An I file is needed only if you expect that there will be several 
implementations of the interface.

> Only one method from excise_cd is used outside the module - excise(),
> which takes a SimpleAlign object made with AlignIO in the calling
> program and a hash of options. The excise method extracts

For modularity, the hash storing all the options needs to be turned into 
reusable objects.

> the sequence data from the SimpleAlign object, data mines the CD
> information and uses the options to guide the overwriting of residues
> with 'X'. excise() (will) then create an AlignIO output object of the
> requested format with the excised alignment. This is then returned to
> the caller, which can write out the excised alignment in the desired
> format.

> I think of this from an external perspective as a CD excising (Xing
> out) and data converting filter for alignment files.

From your earlier description, CD finding was the problem. 
Bio::SimpleAlign::slice does the slicing. On the other hand, from the 
description, I am not sure it is necessary to work with the alignment as a 
whole: it might be best to treat each sequence separately. Of 
course, that depends on the reliability of the alignment and what you have 
actually aligned!


> Is this a reasonable approach? 

> Would this be an appropriate module and 
> script for me to donate to Bioperl when properly done?

Yes, please.


 -Heikki

> Another question - I data mine from NCBI using only gi identifiers for
> the proteins. I have written my own code to do this. Is there a Bioperl
> way to get CD data for a protein, and can this way allow me to
> obtain CD regions for PFAM or other identifiers as well?
>
> Thanks,
> Steve Lenk
> slenk@emich.edu

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From jason.stajich at duke.edu  Wed Aug 31 13:34:44 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Aug 31 13:23:49 2005
Subject: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output
In-Reply-To: <200508311704.j7VH4DUV016842@ayrton.acpub.duke.edu>
References: <200508311704.j7VH4DUV016842@ayrton.acpub.duke.edu>
Message-ID: <AD5C2B05-51B4-4664-82F0-893B35334CE4@duke.edu>

http://fungal.genome.duke.edu/~jes12/software/scripts/process_exonerate_gff3.pl

You may still want to massage it some, but I use the script in this  
basic form, maybe with a few tweaks:

Note that it requires you to run exonerate with specific --ryo  
options so that it includes the length of the query and hit sequences  
in the report output. This should be covered in the perldoc in the script.

Without the ryo options enabled, you'll need to modify the script  
further so that it has access to the original sequence db: use  
Bio::DB::Fasta and put in some $dbh->length($seqid) calls instead.

I don't think the part which writes HSP/match lines is actually  
correct - it is trying to roll gapped HSPs from the similarity features.

I end up ignoring all but the 'exon' and 'gene' lines for my gbrowse  
instance and/or grepping out the lines I really think I need.
You may want to s/exon/CDS/ for the protein2genome output as well.

-jason

On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote:

> Jason,
>
> This message is in regard to an old thread in which you offered  
> to share a 'script for munging over' exonerate output for loading  
> into DB::GFF (cf.  
> http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)
>
> Would you be willing to still share that script, if you've got it  
> around?
>
> Thanks, and regards,
>
> Malcolm Cook - mec@stowers-institute.org - 816-926-4449
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, MO  USA
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From MEC at Stowers-Institute.org  Wed Aug 31 13:04:10 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Wed Aug 31 13:26:33 2005
Subject: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output
Message-ID: <200508311726.j7VHQIAH032054@portal.open-bio.org>

Jason,

This message is in regard to an old thread in which you offered to share a 'script for munging over' exonerate output for loading into DB::GFF (cf. http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)

Would you be willing to still share that script, if you've got it around?

Thanks, and regards,

Malcolm Cook - mec@stowers-institute.org - 816-926-4449
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, MO  USA 


From slenk at emich.edu  Wed Aug 31 14:16:21 2005
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Wed Aug 31 14:05:37 2005
Subject: [Bioperl-l] Alignment excision script
Message-ID: <1aa8c0a1aa81e2.1aa81e21aa8c0a@emich.edu>

Heikki,

Thank you! What follows is thinking out loud.

I will accept your advice and reconvert to 100% script, which is easy. No new 
object type will be created. Actually, I already have a separate 
script with AlignIO objects; I just did not explain it well.

- The script will create AlignIO objects with format defined by user 
(or have Bioperl guess format if user does not specify ...). All IO 
will be done using them. Flexible, 'any' format in, 'any' format out 
with CD excised via X. We will use this in our analysis pipeline.

- The alignment must be treated as a whole, as the default 'X'ing out 
(partial excision) considers whether a whole column is part of a CD. I 
first X out designated CD residues, then look to see if the whole 
column is X'd out before making the final excision on a copy of the 
original sequences. I can return either a full excision (all designated 
CD residues) or a partial one (only columns that all have X). I have this 
code solid, and plan to use it internally in the script. Reuse what works 
well already.
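
The two-pass procedure described in this item can be sketched as follows. This is not the script's actual code: the sub name excise_columns is hypothetical, regions are assumed to be given as 1-based alignment columns (real CD hits would be in residue coordinates and need mapping through gaps), and any gap characters inside a region are simply overwritten.

```perl
use strict;
use warnings;

# $seqs:    array ref of equal-length aligned strings
# $regions: array ref, parallel to $seqs, of [start, end] pairs given as
#           1-based alignment columns (an assumption of this sketch)
# $full:    true  = X out every designated CD position
#           false = X out only columns where every sequence was X'd
sub excise_columns {
    my ($seqs, $regions, $full) = @_;

    # Pass 1: X out every designated CD position on a working copy.
    my @masked = @$seqs;
    for my $i (0 .. $#masked) {
        for my $region (@{ $regions->[$i] || [] }) {
            my ($start, $end) = @$region;
            my $len = $end - $start + 1;
            substr($masked[$i], $start - 1, $len) = 'X' x $len;
        }
    }
    return @masked if $full;

    # Pass 2: on a copy of the originals, X out only those columns that
    # are X in every masked sequence.
    my @out = @$seqs;
    for my $col (0 .. length($seqs->[0]) - 1) {
        next if grep { substr($_, $col, 1) ne 'X' } @masked;
        substr($_, $col, 1) = 'X' for @out;
    }
    return @out;
}

my @seqs    = ('MKTAYIAK', 'MKSAYLAK', 'MRTAYIAR');
my @regions = ([[3, 6]], [[4, 6]], [[4, 7]]);

print join("\n", excise_columns(\@seqs, \@regions, 0)), "\n";
```

In this toy input only columns 4 to 6 are covered by a region in all three sequences, so the partial excision masks just those columns, while a full excision would also mask column 3 of the first sequence and column 7 of the third.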

- I have extracted needed information from the input AlignIO object 
already and process it using the above method. The internal excised 
alignment data is right. Just a matter of loading it into the output 
AlignIO object. 

- I can use AlignIO methods to add excised sequences etc to output 
object formatted as requested by user, sounds easy. Will look at 
Utilities for any shortcuts.

- I will further examine the Bio::Tools::Analysis for Bioperl methods 
to get the needed CD data, which is really just start/end pairs for a 
given protein sequence. Nothing fancy needed as far as representation 
for the already working code. All I use is "$start $end" to represent 
excision regions for given CD for given protein sequence. I make an 
array of these for a given protein and use that when I do the initial 
Xing out. I'd like to have internal reuse of existing reliable code.

- I have a t/ directory for the earlier script. I will expand and 
reuse this. POD documentation is in the code. I will modify it to 
reflect current status.

Again, thank you.

Steve Lenk
slenk@emich.edu


From slenk at emich.edu  Wed Aug 31 19:34:54 2005
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Wed Aug 31 20:24:12 2005
Subject: [Bioperl-l] Bioperl adds utility to msaexcise script
Message-ID: <1b311c21b2a961.1b2a9611b311c2@emich.edu>


I adopted Heikki Lehvaslaiho's ideas. The script now reads/writes multiple
formats based on the user's request on the command line. Thanks, Bioperl
developers! The snippet below shows the use of AlignIO. I'll work on
better/more flexible data mining for CD regions next. I'd like to be able to
handle whichever types of protein IDs are in the user's alignment and get
the CDs for them.

eval {

    ##############
    # input stream
    ##############

    use Bio::AlignIO;

    # Read the first alignment from STDIN in the user-requested format.
    my $in = Bio::AlignIO->new( -fh     => \*STDIN,
                                -format => $informat )->next_aln();

    ##########
    # excision
    ##########

    my $out = _excise( $max_e_value,
                       $use_all_cds,
                       $full_excise,
                       \@excise_cd,
                       $in );

    ###############
    # output stream
    ###############

    # Write the excised alignment to STDOUT in the requested format.
    Bio::AlignIO->new( -format => $outformat,
                       -fh     => \*STDOUT )->write_aln($out);
};

Thanks,
Steve Lenk
slenk@emich.edu