From walsh at cenix-bioscience.com Mon Aug 1 09:03:09 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Mon Aug 1 08:53:23 2005
Subject: [Bioperl-l] Patching lucy
In-Reply-To: <42EA40F5.3090707@purdue.edu>
References: <42EA40F5.3090707@purdue.edu>
Message-ID: <42EE1D8D.2070708@cenix-bioscience.com>

Hi Phillip,

The patch pasted at the bottom of this e-mail should do the trick. When you say that lucy seg faults, I assume you mean that you get the segfault when running lucy on its own. The module itself does not call lucy. It is only parsing the output from the files that lucy creates. lucy itself should be taking phred files as its input. The patch is required if you want to use the stderr from lucy to get more information from the module about the sequences.

If you apply this patch, you can try running the test that comes with the lucy tarball (see the README.FIRST file in the distribution). It works for me (Suse 9.0 on a Pentium 3 box). Let me know if there are any problems. I will update the Appendix for Bio::Tools::Lucy in CVS.

Cheers,

Andrew

277a278,279
> /* AGW added next line */
> fprintf(stderr, "Short/ no insert: %s\n", seqs[i].name);
588c590,592
< if ((seqs[i].len=bases)<=0)
---
> if ((seqs[i].len=bases)<=0) {
> /* AGW added next line */
> fprintf(stderr, "Empty: %s\n", seqs[i].name);
589a594
> }
893c898,902
< if (left) seqs[i].left+=left;
---
> if (left) {
> seqs[i].left+=left;
> /* AGW added next line */
> fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
> }
896c905,909
< if (right) seqs[i].right-=right;
---
> if (right) {
> seqs[i].right-=right;
> /* AGW added next line */
> fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
> }
898a912,913
> /* AGW added next line */
> fprintf(stderr, "Dropped PolyA: %s\n", seqs[i].name);
949a965,966
> /* AGW added next line */
> fprintf(stderr, "Vector: %s\n", seqs[i].name);

Phillip SanMiguel wrote:

> The patch to lucy source code from (the appendix):
>
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Lucy.html
>
> seems not to work for lucy-1.19p or lucy-1.19s. Actually patch runs
> fine, but the resulting executable (after make) seg faults when run on
> the lucy test data.
>
> Any advice?
>
> I've sent email directly to the module creator, Andrew G. Walsh, as
> requested in the module. But I'm not sure if the module creator
> regularly monitors the hotmail account listed therein. So I thought I'd
> post here, in case someone had a patch that would work on lucy-1.19.
>

--
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel.
+49-351-4173 137
Fax +49-351-4173 109
public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From n.haigh at sheffield.ac.uk Mon Aug 1 10:05:14 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Mon Aug 1 09:55:24 2005
Subject: [Bioperl-l] retrieving medline citations
Message-ID:

I know I can get medline citations using Bio::Biblio->get_by_id();

But can I convert the returned xml into the standard plain text format that is used for importing into citation managers such as endnote?

Cheers
Nathan

Nathan Haigh
Bioinformatics PostDoctoral Research Associate
Room B2 211
Department of Animal and Plant Sciences
University of Sheffield
Western Bank
Sheffield
S10 2TN
Tel: +44 (0)114 22 20112
Mob: +44 (0)7742 533 569
Fax: +44 (0)114 22 20002

From cain at cshl.edu Mon Aug 1 14:13:12 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug 1 14:05:19 2005
Subject: [Bioperl-l] Newbie gbrowse help - script to make gff from fasta
In-Reply-To: <1AC69124-28AD-48B2-B910-7C5D8057908E@gmail.com>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> <42E909E3.2030102@infobiogen.fr> <1122570166.3288.10.camel@localhost.localdomain> <42E96847.1060900@ebi.ac.uk> <1AC69124-28AD-48B2-B910-7C5D8057908E@gmail.com>
Message-ID: <1122919992.3857.22.camel@localhost.localdomain>

On Sat, 2005-07-30 at 15:05 -0500, Jim Hu wrote:
> 1) Is there an existing script to convert a refseq fasta into a gff
> flatfile compatible with gbrowse 1.62?
>
> bp_genbank2gff.pl --accession NC_001416 --stdout > lambda.gff
>
> requires some additional tweaking/parsing as far as I can tell. I
> know that I'll probably eventually load these into mySQL (but for
> phage genomes, is it worth it?), but I wanted to learn via the
> flatfiles first.

I assume you mean genbank files, as there wouldn't be much to convert from a fasta file. Anyway, you should also try bp_genbank2gff3.pl.
Be warned, however, that converting genbank files to anything more stringent like GFF3 is fiendishly difficult, and depending on the genbank file, you may need to massage the output.

>
> 2) Is there a repository of standard track stanzas and aggregators
> that match the feature types generated by such scripts?

In the distribution are several example configuration files in contrib/conf_files.

>
> 3) Is there a FAQ I missed that I should have consulted first?

No, but there is a tutorial that comes with GBrowse that covers lots of useful material. You can find it at http://localhost/gbrowse/tutorial/tutorial.html

>
> 4) Is this even the right listserv for these questions?

Yes, and welcome!

>
> Didn't want to reinvent any wheels if possible. Sorry if this is off
> topic. Thanks!
>
> Jim Hu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)              216-392-3087
Cold Spring Harbor Laboratory

From hu2307 at uidaho.edu Mon Aug 1 15:00:23 2005
From: hu2307 at uidaho.edu (Xiaojun Hu)
Date: Mon Aug 1 14:50:37 2005
Subject: [Bioperl-l] ABI average singal intensity
Message-ID: <5f2c7e5efe93.5efe935f2c7e@uidaho.edu>

Hi,

Does anyone know how to get the average signal intensity (A, T, C, G) from an ABI file?

Thank you very much!

Xiaojun Hu

From cain at cshl.edu Mon Aug 1 15:07:36 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug 1 14:57:48 2005
Subject: [Bioperl-l] Newbie gbrowse help - script to make gff from fasta
In-Reply-To: <1122919992.3857.22.camel@localhost.localdomain>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> <42E909E3.2030102@infobiogen.fr> <1122570166.3288.10.camel@localhost.localdomain> <42E96847.1060900@ebi.ac.uk> <1AC69124-28AD-48B2-B910-7C5D8057908E@gmail.com> <1122919992.3857.22.camel@localhost.localdomain>
Message-ID: <1122923256.3857.27.camel@localhost.localdomain>

On Mon, 2005-08-01 at 14:13 -0400, Scott Cain wrote:
>
> >
> > 4) Is this even the right listserv for these questions?
>
> Yes, and welcome!
>
>

Whoops! I guess I should have looked at the list that you emailed your questions to before I answered this one. For some reason, I just assumed that this was on the gbrowse mailing list, which is

Scott
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)              216-392-3087
Cold Spring Harbor Laboratory

From anunberg at oriongenomics.com Mon Aug 1 15:29:50 2005
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Mon Aug 1 15:21:46 2005
Subject: [Bioperl-l] Connecting to Bio::DB::GFF db
Message-ID:

When connecting to a Bio::DB::GFF db, how do I specify the host?
I would like to connect to a db on another machine

Thanks
--
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From cain at cshl.edu Mon Aug 1 15:41:10 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug 1 15:35:27 2005
Subject: [Bioperl-l] Connecting to Bio::DB::GFF db
In-Reply-To:
References:
Message-ID: <1122925270.3857.35.camel@localhost.localdomain>

You have to use a dsn that is appropriate for your database server--that is, the mysql one will look a little different from a postgres one, but generally, it will look like this:

  -dsn dbi:mysql:elegans;host=hostname;port=port_number

You can leave off port if the database server is using a standard port.

On Mon, 2005-08-01 at 14:29 -0500, Andrew Nunberg wrote:
> When connecting to a Bio::DB::GFF db, how do I specify the host? I would
> like to connect to a db on another machine
>
> Thanks
>
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)              216-392-3087
Cold Spring Harbor Laboratory

From pmiguel at purdue.edu Mon Aug 1 15:39:07 2005
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Mon Aug 1 15:35:31 2005
Subject: [Bioperl-l] Patching lucy
In-Reply-To: <42EE1D8D.2070708@cenix-bioscience.com>
References: <42EA40F5.3090707@purdue.edu> <42EE1D8D.2070708@cenix-bioscience.com>
Message-ID: <42EE7A5B.6050701@purdue.edu>

Hi Andrew,

Thanks for the effort you went to here. Still, it looks like there is a (more minor) problem. patch gives a few errors (see below) using your new diff. Looks like 2 of the 7 patches failed to patch lucy.c from lucy version lucy-1.19p. The resulting source code does compile and run on the lucy test data, but the PolyA patches did not get inserted. Do you know if all 7 of your patches were installed into the lucy.c file from lucy-1.19p?

(By the way, I think we are on the same page. I do understand that your perl code parses lucy output.
I've tried it on lucy 1.19p output and it succeeds--although it, of course, lacks some of the functionality that would be available from the patched version of lucy).

Phillip

Here is the output when I run patch:

(lucy)% cd lucy-1.19p
(lucy-1.19p)% patch -b -i AndrewsNewPatch.diff lucy.c
Looks like a normal diff.
Hunk #4 failed at line 893.
Hunk #5 failed at line 896.
2 out of 7 hunks failed: saving rejects to lucy.c.rej
I can't seem to find a patch in there anywhere.

Here is the lucy.c.rej file contents:

***************
*** 893,893 ****
! if (left) seqs[i].left+=left;
--- 898,902 ----
! if (left) {
! seqs[i].left+=left;
! /* AGW added next line */
! fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
! }
***************
*** 896,896 ****
! if (right) seqs[i].right-=right;
--- 905,909 ----
! if (right) {
! seqs[i].right-=right;
! /* AGW added next line */
! fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
! }

Andrew Walsh wrote:

> Hi Phillip,
>
> The patch pasted at the bottom of this e-mail should do the trick.
> When you say that lucy seg faults, I assume you mean that you get the
> segfault when running lucy on its own. The module itself does not
> call lucy. It is only parsing the output from the files that lucy
> creates. lucy itself should be taking phred files as its input. The
> patch is required if you want to use the stderr from lucy to get
> more information from the module about the sequences. If you apply
> this patch, you can try running the test that comes with the lucy
> tarball (see the README.FIRST file in the distribution). It works for
> me (Suse 9.0 on a Pentium 3 box). Let me know if there are any
> problems. I will update the Appendix for Bio::Tools::Lucy in CVS.
>
> Cheers,
>
> Andrew
>
>
> 277a278,279
> > /* AGW added next line */
> > fprintf(stderr, "Short/ no insert: %s\n", seqs[i].name);
> 588c590,592
> < if ((seqs[i].len=bases)<=0)
> ---
> > if ((seqs[i].len=bases)<=0) {
> > /* AGW added next line */
> > fprintf(stderr, "Empty: %s\n", seqs[i].name);
> 589a594
> > }
> 893c898,902
> < if (left) seqs[i].left+=left;
> ---
> > if (left) {
> > seqs[i].left+=left;
> > /* AGW added next line */
> > fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
> > }
> 896c905,909
> < if (right) seqs[i].right-=right;
> ---
> > if (right) {
> > seqs[i].right-=right;
> > /* AGW added next line */
> > fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
> > }
> 898a912,913
> > /* AGW added next line */
> > fprintf(stderr, "Dropped PolyA: %s\n", seqs[i].name);
> 949a965,966
> > /* AGW added next line */
> > fprintf(stderr, "Vector: %s\n", seqs[i].name);
>
>
>
>
> Phillip SanMiguel wrote:
>
>> The patch to lucy source code from (the appendix):
>>
>> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Lucy.html
>>
>> seems not to work for lucy-1.19p or lucy-1.19s. Actually patch runs
>> fine, but the resulting executable (after make) seg faults when run
>> on the lucy test data.
>>
>> Any advice?
>>
>> I've sent email directly to the module creator, Andrew G. Walsh, as
>> requested in the module. But I'm not sure if the module creator
>> regularly monitors the hotmail account listed therein. So I thought
>> I'd post here, in case someone had a patch that would work on lucy-1.19.
>>
>
>

From cain at cshl.edu Mon Aug 1 15:45:00 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug 1 15:35:53 2005
Subject: [Bioperl-l] Re: Fixing bioperl [was Re: Analysis features]
In-Reply-To: <51a02b5bd508f35301ee3c847b104895@gnf.org>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> <42E909E3.2030102@infobiogen.fr> <1122570166.3288.10.camel@localhost.localdomain> <1122650232.10455.31.camel@localhost.localdomain> <51a02b5bd508f35301ee3c847b104895@gnf.org>
Message-ID: <1122925500.3857.40.camel@localhost.localdomain>

On Fri, 2005-07-29 at 17:20 -0700, Hilmar Lapp wrote:
> On Jul 29, 2005, at 8:17 AM, Scott Cain wrote:
>
> >
> > The main section of affected code in gmod is the GFF bulk loader, but
> > after we make the changes to the bioperl API, it shouldn't be too hard
> > to fix the loader. In fact, some of those changes may have already
> > started. I remember a few weeks before I released the gmod/chado
> > package, Hilmar sent out an announcement that he made some changes.
>
> You mean around the time of ISMB? I fixed the ontology modules ... they
> should actually work better now not worse unless you assumed the
> presence of some bugs ;)

I guess I must have been assuming bugs :-) I didn't look at diffs, or in much detail what the exact problem was. Since this is the last release that will be using Bio::Ontology, and it is an alpha release, I was not too concerned.

>
> > While I should have paid attention then, I was busy getting my release
> > together, and everything seemed to work, so I ignored it.
> > Unfortunately, the reason things continued to work was that I forgot to
> > update my bioperl-live, and as a result, the gmod release doesn't work
> > with bioperl-live.
>
> Scott, what would really help sometimes is if in such a situation you
> run the bioperl test suite and report the result if there are any
> failures, especially those that appear potentially connected to your
> problem.
> Last time the gmod ontology loader ceased to work the problem
> would have been readily exposed by the ontology tests in bioperl. It
> just helps in zooming in on the problem.

I run make test frequently; what I do less often is pay close attention to the result. When working with bioperl-live, one gets a little numb to test failures :-/

>
> I'd be eager to help make bioperl work with gmod and vice versa and I'm
> sure many others are too, but it'll be difficult if we don't work
> towards this collaboratively. For this I really liked the spirit of
> Chris' proposal - that's the way to make this work.
>
> > [...]
> > The other section of code that could have been affected but won't be is
> > the ontology loader. The current ontology loader depends on
> > Bio::Ontology, but I was already planning on migrating to go-perl for
> > loading ontologies anyway, so that won't be a problem.
>
> I'm closing in on the last bugs in the go-perl integration. It remains
> to be seen how fast the result is as Chris made me aware in Detroit,
> but if it works this will give you both worlds at your choosing.
>
> -hilmar
>
> >
> > So, who wants to take the lead on this?
> >
> > Thanks,
> > Scott
> >
> >
> > On Thu, 2005-07-28 at 12:42 -0700, Chris Mungall wrote:
> >> I think the answer may be even more complicated than this.
> >>
> >> Lurkers and contributors to the bioperl mailing list may have noticed
> >> that
> >> there have been some major obstacles in progressing lately,
> >> particularly in
> >> getting a stable release of the code out. bp1.4 is fairly old, 1.5 is
> >> a
> >> developers release, though this is the one required by GMOD.
> >>
> >> My understanding is that this bottleneck can be traced back to
> >> changes in
> >> the SeqFeature and Annotation model.
> >> These changes appear to be
> >> required
> >> by Bio::SeqFeature::Annotated which is produced by Bio::FeatureIO::gff
> >> (which in turn is used by the GMOD bulk loader, which is the main
> >> reason
> >> GMOD requires 1.5, I believe?). Unfortunately, these changes also
> >> break
> >> existing code and have a severe negative impact on memory usage.
> >>
> >> Before advising Cyril and others to switch to BFIO::gff I think it's
> >> important to make sure there is a clear path forward with bioperl. My
> >> impression is that there is something of a stalemate here. The bioperl
> >> developers would like to retract the aforementioned changes, but they
> >> believe they cannot do this without breaking GMOD code. They are also
> >> extremely uncomfortable about leaving these changes in. Everyone
> >> gives up
> >> and starts coding around bioperl.
> >>
> >> Here is why the changes were introduced:
> >>
> >> BioPerl has a 'scruffy' typing model, whereby feature types
> >> (primary_tag
> >> in bioperl) and featureprop types (tags in bioperl) are labels or
> >> strings.
> >> In contrast, Chado forces all types to be some class or relation in an
> >> ontology.
> >>
> >> Now obviously I'm rather partial to the Chado model, but that doesn't
> >> mean
> >> I think it should be forced upon bioperl. I often use bioperl in
> >> scruffy
> >> mode (on scruffy data); or in some combination whereby I map the
> >> scruffy
> >> types to ontologies in some non-bioperl code. When using bioperl as a
> >> middleware component over a nicely organised database, ontology-typed
> >> mode
> >> is definitely best. However, the majority of bioperl users (including
> >> myself) spend a large proportion of their time working with scruffy
> >> data,
> >> in which case lightweight scruffy types are more appropriate.
> >>
> >> It seems that there is a perfectly simple way of reconciling both
> >> approaches. We revert bioperl back to the simpler scruffy model.
> >> The
> >> majority of users and developers breathe a sigh of relief. We then
> >> extend
> >> SeqFeatureI with something like SeqFeatureAnnotatedI. This forces
> >> types to
> >> be stored as OntologyTerms (and I haven't even touched on some of the
> >> problems here, but at least we are insulating the standard bioperl
> >> layer
> >> that 99% of users use from these issues). All classes implementing
> >> SFAI
> >> will necessarily implement SFI, and the primary_tag and tag_values
> >> methods
> >> will be supported (not deprecated) as simple delegations to the
> >> OntologyTerm objects.
> >>
> >> We can then modify BFIO::gff (which is an incredibly useful piece of
> >> code)
> >> and get rid of all the dependencies on SO and Bio::Ontology* and
> >> instead
> >> allow the user of this module to plug in their own resolver/validator
> >> - so
> >> they can choose whether they just want fast scruffy lightweight SFI
> >> features, or whether they want ontology-typed SFAI features. If the
> >> latter, then they can choose their own resolver strategy - by a user
> >> supplied hash, by a copy of SO auto-downloaded from sourceforge, by a
> >> local chado db, by the genbank->SO mapping table, during parsing vs
> >> post-parsing, whatever. In fact there is already
> >> Bio::SeqFeature::Tools::TypeMapper, but currently this is mostly
> >> concerned
> >> with helping Bio::SeqFeature::Tools::Unflattener convert scruffy
> >> genbank
> >> to something sensible.
> >>
> >> GMOD (and perhaps biosql) would use SFAI, everyone else would use the
> >> simpler SFI. Someone can even get a stable 1.6 release out before all
> >> the
> >> SFAI details such as how the resolver would work are finalised. I'd
> >> really
> >> like to see 1.6 include a simpler BFIO::gff that can optionally
> >> produce
> >> features that aren't SeqFeature::Annotateds, but that's negotiable.
> >>
> >> There are vast swathes of both GMOD and BioPerl code I'm not familiar
> >> with,
> >> so it's possible my analysis above is flawed in some way. If it is,
> >> then
> >> it's up to someone from either camp to speak up! If not, then there's
> >> no
> >> excuse for the relevant people not to start sorting out this mess by
> >> commencing with the solution outlined above.
> >>
> >> Cheers
> >> Chris
> >>
> >>>
> >>> Scott
> >>>
> >>>
> >>> On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote:
> >>>> Hello,
> >>>> We are going to store analysis results in chado, and we are of
> >>>> course
> >>>> very interested by these future evolutions of GFF3/chado.
> >>>> So we would like to make sure that the parsers and conversion
> >>>> programs
> >>>> we are writing now will be compatible with the future GFF3.
> >>>>
> >>>> We are using Bio::SeqFeature::Generic objects that we write with
> >>>> Bio::Tools::GFF.
> >>>>
> >>>> Do you think that Bio::Tools::GFF will be able to handle the new
> >>>> 'type'
> >>>> column or is it better to switch to Bio::FeatureIO::gff ?
> >>>>
> >>>> Thanks in advance for any advice.
> >>>>
> >>>> Cyril
> >>>>
> >>>> Don Gilbert wrote:
> >>>>
> >>>>>
> >>>>> Scott,
> >>>>>
> >>>>> Your notes in gmod_bulk_load_gff3.pl suggest it is headed in
> >>>>> the same direction I suggest below. More about these todo points
> >>>>>
> >>>>>> - address flybase's use of analysisfeature combined with
> >>>>>> feature to
> >>>>>> give source-type information (in GFF terms). This will need to
> >>>>>> be addressed in the GBrowse adaptor.
> >>>>>> - modify the bulk loader to allow "mixed" GFF3 files (that is,
> >>>>>> containing
> >>>>>> both analysis results and annotations). See perldoc
> >>>>>> gmod_bulk_load_gff3.pl
> >>>>>> for more info
> >>>>>
> >>>>>
> >>>>> Use of chado's analysisfeature table is something others who know
> >>>>> it better can comment on.
> >>>>> But after working with it for a while
> >>>>> it makes sense to me to use in this way:
> >>>>>
> >>>>> For a future GFF -> Chado loader, treat analysis features such as
> >>>>> gene finding results, BLAST, sim4 as 'analysisfeature type' rather
> >>>>> than feature CV term type (the ones that now end up with a generic
> >>>>> 'match' cvterm). In these cases the Analysis table is populated
> >>>>> with
> >>>>>   program:database_sourcename
> >>>>> as the basis of this 'analysisfeature type', such as
> >>>>>   match:blastx:na_pe.dros
> >>>>>   match:sim4:DGC
> >>>>>   match:genie:dummy (or maybe exon:genie)
> >>>>>
> >>>>> The program:database fits neatly in GFF source field, as
> >>>>>   #ref source type start stop ...
> >>>>>   chr1 blastx:na_pe.dros match 1 100 ...
> >>>>>   chr1 sim4:DGC match 1 100 ...
> >>>>>
> >>>>> These can be treated in database adaptor analogously to the CVterm
> >>>>> table feature types. See at end a list of current GFF feature
> >>>>> type:source from worm, rice, yeast, fly MODs. Fly and rice use a
> >>>>> syntax like above and worm gff uses BLAT_EMBL_BEST, instead of
> >>>>> BLAT:EMBL_BEST.
> >>>>>
> >>>>> From POD of your bulk_load_gff3.pl
> >>>>>> Analysis
> >>>>>> If you are loading analysis results (ie, BLAT results, gene
> >>>>>> predictions), you should specify the -a flag. If no arguments are
> >>>>>> supplied with the -a, then the loader will assume that the results
> >>>>>> belong to an analysis set with a name that is the concatenation of
> >>>>>> the source (column 2) and the method (column 3) with an underscore
> >>>>>> in between.
> >>>>>
> >>>>> "... then the loader will assume that the results belong to an
> >>>>> analysis table row with a program name and database source name
> >>>>> taken from Source (column 2, colon separated program:sourcename),
> >>>>> with a SOFA feature type taken from Method (column 3). If
> >>>>> sourcename doesn't apply, e.g. genefinder, don't add or use
> >>>>> 'dummy'.
> >>>>> Use the generic 'match' SOFA type if others don't apply."
> >>>>> [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS]
> >>>>>
> >>>>> Note that sourcename of database is a common attribute (all those
> >>>>> blasts, blats, sim4, ... are run on several different databases).
> >>>>>
> >>>>> For that underscore between method and source, where does that go
> >>>>> into
> >>>>> database? It is used as parts of program or database sourcename
> >>>>> names,
> >>>>> so it may be problematic to add one if not needed.
> >>>>>
> >>>>> Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name'
> >>>>> entry
> >>>>> for analysis table. This probably is less useful than using Program
> >>>>> and Sourcename fields as flybase does, which comes from the common
> >>>>> usage where people run various programs, with various database
> >>>>> sources
> >>>>> and want to plop the results into a database easily. These go into
> >>>>> those
> >>>>> two fields directly, no need to create or parse a Name entry
> >>>>> (which can be and is null in flybase data).
> >>>>>
> >>>>>> my $search_analysis
> >>>>>>   = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?");
> >>>>>
> >>>>> I think it would be better as
> >>>>> my $search_analysis
> >>>>>   = $db->prepare("SELECT analysis_id FROM analysis WHERE program=? and sourcename=?");
> >>>>>
> >>>>>> Otherwise, the argument provided with -a will be taken
> >>>>>> as the name of the analysis set. Either way, the analysis set must
> >>>>>> already be in the analysis table. The easiest way to do this is to
> >>>>>> insert it directly in the psql shell:
> >>>>>>
> >>>>>> INSERT INTO analysis (name, program, programversion)
> >>>>>> VALUES ('genscan 2005-2-28','genscan','5.4');
> >>>>>
> >>>>> My choice would be to populate the analysis table from GFF data,
> >>>>> rather
> >>>>> than expect preparation by user (or as another option).
> >>>>>
> >>>>> INSERT INTO analysis (program, sourcename)
> >>>>> VALUES ('tblastx','na_baylorf1_scfchunk.dpse');
> >>>>> INSERT INTO analysis (program, sourcename)
> >>>>> VALUES ('sim4','na_gb.dmel');
> >>>>> INSERT INTO analysis (program, sourcename, programversion)
> >>>>> VALUES ('genie_masked','dummy', '1.0');
> >>>>>
> >>>>>> There are other columns in the analysis table that are optional;
> >>>>>> see
> >>>>>> the schema documentation and '\d analysis' in psql for more
> >>>>>> information.
> >>>>>>
> >>>>> ....
> >>>>>> A planned addition to the functionality of handling analysis
> >>>>>> results
> >>>>>> is to allow "mixed" GFF files, where some lines are analysis
> >>>>>> results
> >>>>>> and some are not.
> >>>>>
> >>>>> This is the case for drosophila GFF now (see others also below). If
> >>>>> you make the default assumption that if ($method =~ /.*match/) and
> >>>>> ($source =~ m/([^:]+):(.+)/), you should get all/most of
> >>>>> analysisfeature types, and probably not anything else.
> >>>>>
> >>>>>> Additionally, one will be able to supply lists of
> >>>>>> types (optionally with sources) and their associated entry in the
> >>>>>> analysis table. The format will probably be tag value pairs:
> >>>>>>
> >>>>>> --analysis match:Rice_est=rice_est_blast, \
> >>>>>> match:Maize_cDNA=maize_cdna_blast, \
> >>>>>> mRNA=genscan_prediction,exon=genscan_prediction
> >>>>>
> >>>>> My suggestion for this (as per GFF source,type columns) would be
> >>>>> --analysis match:program:sourcename ...
> >>>>> --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\
> >>>>> mRNA:genscan:dummy, exon:genscan:dummy
> >>>>>
> >>>>> I guess the 'dummy' data sourcename need not be added; flybase
> >>>>> uses it
> >>>>> to keep that field not-null, but it isn't required by the schema.
> >>>>>
> >>>>> Here are some snippets from the ChadoFC adaptor I modified
> >>>>> from yours (will get into cvs.sf.net 'real soon'), showing that
> >>>>> it isn't much work to add this as an analog to how cvterm types
> >>>>> are used.
> >>>>>
> >>>>> -- Don
> >>>>>
> >>>>> ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types
> >>>>> ## treat similar to CV table types
> >>>>>
> >>>>> sub getAnalysisFeatureHash
> >>>>> {
> >>>>>   my $self= shift;
> >>>>>
> >>>>>   my $dbh= $self->dbh();
> >>>>>   my $sth = $dbh->prepare("select analysis_id,program,sourcename from analysis")
> >>>>>     or warn "unable to prepare select cvterms";
> >>>>>   $sth->execute or $self->throw("unable to select cvterms");
> >>>>>
> >>>>>   my(%term2name,%name2term) = ({},{});
> >>>>>
> >>>>>   while (my $hashref = $sth->fetchrow_hashref) {
> >>>>>
> >>>>>     ## this is dgg syntax of analysis feature names for GFF
> >>>>>     ## all have generic 'match' method and program:source as 'source'
> >>>>>     ## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc.
> >>>>>     my $anfeat= "match:".$hashref->{program}.":".$hashref->{sourcename};
> >>>>>
> >>>>>     $term2name{ $hashref->{analysis_id} } = $anfeat;
> >>>>>     $name2term{ $anfeat } = $hashref->{analysis_id};
> >>>>>   }
> >>>>>   $self->an_term2name(\%term2name);
> >>>>>   $self->an_name2term(\%name2term);
> >>>>> }
> >>>>>
> >>>>> ## Das::ChadoFC::Segment snippets
> >>>>> sub features {
> >>>>>   $self->{has_anatype}=0;
> >>>>>   my $sql_range = '';
> >>>>>   my ($interbase_start,$rend,$srcfeature_id,$sql_types);
> >>>>>   unless ($feature_id) {
> >>>>>     $sql_range = $self->sql_range($rangetype);
> >>>>>
> >>>>>     $sql_types = $self->sql_types($types, -1); # dgg
> >>>>>
> >>>>>     $srcfeature_id = $self->{srcfeature_id};
> >>>>>   }
> >>>>>   ...
> >>>>>   elsif($self->{has_anatype}) {
> >>>>>     $from_part .= "left join analysisfeature af using (feature_id) ";
> >>>>>   }
> >>>>>
> >>>>>
> >>>>> sub sql_types
> >>>>> ..
> >>>>>   $valid_type = $factory->name2term($temp_type);
> >>>>>   $is_anatype= 0;
> >>>>>   unless ($valid_type) {
> >>>>>     $valid_type = $factory->an_name2term($temp_type);
> >>>>>     $self->{has_anatype}= $is_anatype= 1 if ($valid_type);
> >>>>>   }
> >>>>> ..
> >>>>>   ## leave out extra invalid types
> >>>>>   if (!$valid_type) {
> >>>>>     ### skip
> >>>>>   } elsif ($temp_dbxref) {
> >>>>>     $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id = $temp_dbxref)";
> >>>>>   } elsif($is_anatype) {
> >>>>>     $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<<
> >>>>>   } else {
> >>>>>     $sql_types .= $orsql."(f.type_id = $valid_type)";
> >>>>>   }
> >>>>>
> >>>>>
> >>>>> Lists of GFF feature type:source from some current MOD data
> >>>>> where * are probably analysisfeature types (program:database)
> >>>>>
> >>>>> rice gff type:source
> >>>>> ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/gff3/
> >>>>> --------------------
> >>>>>   CDS:known
> >>>>>   CDS:tigr
> >>>>>   EST:cmap
> >>>>>   EST_match:Barley (?
might be EST_match:someprogram:Barley)
> >>>>>   EST_match:Maize
> >>>>>   EST_match:Millet
> >>>>>   EST_match:Rice
> >>>>>   EST_match:Sorghum
> >>>>>   EST_match:Wheat
> >>>>>   cDNA_match:Rice
> >>>>>   cross_genome_match:Maize
> >>>>>   cross_genome_match:Rice
> >>>>>   cross_genome_match:Sorghum
> >>>>> * exon:FgenesH:Monocot
> >>>>>   exon:known
> >>>>>   exon:tigr
> >>>>>   five_prime_UTR:tigr
> >>>>>   gene:known
> >>>>>   gene:tigr
> >>>>> * mRNA:FgenesH:Monocot
> >>>>>   mRNA:known
> >>>>>   mRNA:tigr
> >>>>>   microsatellite:cmap
> >>>>>   three_prime_UTR:known
> >>>>>   three_prime_UTR:tigr
> >>>>>   transposable_element_insertion_site:cmap
> >>>>>
> >>>>> worm gff type:source
> >>>>> ftp://ftp.wormbase.org/pub/wormbase/species/elegans/genome_feature_tables/GFF3/
> >>>>> ----------------------
> >>>>>   CDS:Coding_transcript
> >>>>> * CDS:Genefinder
> >>>>>   CDS:Transposon_CDS
> >>>>>   CDS:history
> >>>>> * CDS:twinscan
> >>>>> * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST)
> >>>>> * EST_match:BLAT_EST_OTHER
> >>>>>   PCR_product:GenePair_STS
> >>>>>   PCR_product:Orfeome
> >>>>>   RNAi_reagent:RNAi_primary
> >>>>>   RNAi_reagent:RNAi_secondary
> >>>>>   SNP:Allele
> >>>>>   binding_site:binding_site
> >>>>> * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST )
> >>>>> * cDNA_match:BLAT_mRNA_OTHER
> >>>>>   clone_end:.
> >>>>>   clone_start:.
> >>>>>   complex_substitution:Allele
> >>>>>   deletion:Allele
> >>>>>   exon:Coding_transcript
> >>>>> * exon:Genefinder
> >>>>>   exon:Non_coding_transcript
> >>>>>   exon:Pseudogene
> >>>>>   exon:Transposon_CDS
> >>>>>   exon:history
> >>>>>   exon:miRNA
> >>>>>   exon:rRNA
> >>>>>   exon:scRNA
> >>>>>   exon:snRNA
> >>>>>   exon:snoRNA
> >>>>>   exon:tRNA
> >>>>> * exon:tRNAscan-SE-1.23
> >>>>> * exon:twinscan
> >>>>>   experimental_result_region:Expr_profile
> >>>>>   experimental_result_region:cDNA_for_RNAi
> >>>>> * expressed_sequence_match:BLAT_OST_BEST (~ expressed_sequence_match:BLAT:OST_BEST )
> >>>>> * expressed_sequence_match:BLAT_OST_OTHER
> >>>>>   five_prime_UTR:Coding_transcript
> >>>>>   gene:Coding_transcript
> >>>>>   gene:gene
> >>>>>   gene:history
> >>>>>   gene:landmark
> >>>>>   insertion:Allele
> >>>>>   inverted_repeat:inverted
> >>>>>   mRNA:Coding_transcript
> >>>>> * mRNA:Genefinder
> >>>>>   mRNA:Transposon_CDS
> >>>>>   mRNA:history
> >>>>> * mRNA:twinscan
> >>>>>   miRNA:miRNA
> >>>>>   nc_primary_transcript:Non_coding_transcript
> >>>>> * nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST )
> >>>>> * nucleotide_match:BLAT_EMBL_OTHER
> >>>>> * nucleotide_match:BLAT_TC1_BEST
> >>>>> * nucleotide_match:BLAT_TC1_OTHER
> >>>>> * nucleotide_match:BLAT_ncRNA_BEST
> >>>>> * nucleotide_match:BLAT_ncRNA_OTHER
> >>>>> * nucleotide_match:TEC_RED
> >>>>> * nucleotide_match:waba_coding
> >>>>> * nucleotide_match:waba_strong
> >>>>> * nucleotide_match:waba_weak
> >>>>>   oligo:.
> >>>>>   operon:operon
> >>>>>   polyA_signal_sequence:polyA_signal_sequence
> >>>>>   polyA_site:polyA_site
> >>>>>   processed_transcript:gene
> >>>>>   protein_coding_primary_transcript:Coding_transcript
> >>>>> * protein_match:wublastx
> >>>>>   pseudogene:Pseudogene
> >>>>>   pseudogene:history
> >>>>>   rRNA:rRNA
> >>>>>   reagent:Oligo_set
> >>>>>   region:.
> >>>>>   region:Genbank
> >>>>>   region:Genomic_canonical
> >>>>>   region:Link
> >>>>> * repeat_region:RepeatMasker
> >>>>>   scRNA:scRNA
> >>>>>   sequence_variant:.
> >>>>>   sequence_variant:Allele
> >>>>>   snRNA:snRNA
> >>>>>   snoRNA:snoRNA
> >>>>>   substitution:Allele
> >>>>>   tRNA:tRNA
> >>>>> * tRNA:tRNAscan-SE-1.23
> >>>>>   tandem_repeat:tandem
> >>>>>   three_prime_UTR:Coding_transcript
> >>>>>   trans_splice_acceptor_site:SL1
> >>>>>   trans_splice_acceptor_site:SL2
> >>>>>   transcript:SAGE_transcript
> >>>>> * translated_nucleotide_match:BLAT_NEMATODE (~ translated_nucleotide_match:BLAT:NEMATODE )
> >>>>>   transposable_element:Transposon
> >>>>>   transposable_element:Transposon_CDS
> >>>>>   transposable_element_insertion_site:Allele
> >>>>>   transposable_element_insertion_site:Mos_insertion_allele
> >>>>>
> >>>>>
> >>>>> fly gff type:source
> >>>>> ftp://ftp.flybase.net/genomes/dmel/current/gff/
> >>>>> -----------------------
> >>>>>   BAC:.
> >>>>>   CDS:.
> >>>>>   aberration_junction:.
> >>>>>   chromosome:.
> >>>>>   chromosome_arm:.
> >>>>>   chromosome_band:.
> >>>>>   enhancer:.
> >>>>>   exon:.
> >>>>>   five_prime_UTR:.
> >>>>>   gene:.
> >>>>>   insertion_site:.
> >>>>>   intron:.
> >>>>>   mRNA:.
> >>>>> * match:RNAiHDP > >>>>> * match:assembly:path > >>>>> * match:blastx:aa_SPTR.dmel > >>>>> * match:blastx:aa_SPTR.insect > >>>>> * match:blastx:aa_SPTR.othinv > >>>>> * match:blastx:aa_SPTR.othvert > >>>>> * match:blastx:aa_SPTR.plant > >>>>> * match:blastx:aa_SPTR.primate > >>>>> * match:blastx:aa_SPTR.rodent > >>>>> * match:blastx:aa_SPTR.worm > >>>>> * match:blastx:aa_SPTR.yeast > >>>>> * match:genscan > >>>>> * match:repeatmasker > >>>>> * match:sim4:na_ARGs.dros > >>>>> * match:sim4:na_ARGsCDS.dros > >>>>> * match:sim4:na_DGC_dros > >>>>> * match:sim4:na_dbEST.diff.dmel > >>>>> * match:sim4:na_dbEST.same.dmel > >>>>> * match:sim4:na_gadfly_dmel_r2 > >>>>> * match:sim4:na_gb.dmel > >>>>> * match:sim4:na_gb.tpa.dmel > >>>>> * match:sim4:na_smallRNA.dros > >>>>> * match:sim4:na_transcript_dmel_r31 > >>>>> * match:sim4:na_transcript_dmel_r32 > >>>>> * match:tRNAscan-SE:. > >>>>> * match:tblastx:na_agambiae > >>>>> * match:tblastx:na_dbEST.insect > >>>>> * match:tblastx:na_dpse > >>>>> * match_part:RNAiHDP > >>>>> * match_part:assembly:path > >>>>> * match_part:blastx:aa_SPTR.dmel > >>>>> * match_part:blastx:aa_SPTR.insect > >>>>> * match_part:blastx:aa_SPTR.othinv > >>>>> * match_part:blastx:aa_SPTR.othvert > >>>>> * match_part:blastx:aa_SPTR.plant > >>>>> * match_part:blastx:aa_SPTR.primate > >>>>> * match_part:blastx:aa_SPTR.rodent > >>>>> * match_part:blastx:aa_SPTR.worm > >>>>> * match_part:blastx:aa_SPTR.yeast > >>>>> * match_part:genscan > >>>>> * match_part:repeatmasker > >>>>> * match_part:sim4:na_ARGs.dros > >>>>> * match_part:sim4:na_ARGsCDS.dros > >>>>> * match_part:sim4:na_DGC_dros > >>>>> * match_part:sim4:na_dbEST.diff.dmel > >>>>> * match_part:sim4:na_dbEST.same.dmel > >>>>> * match_part:sim4:na_gadfly_dmel_r2 > >>>>> * match_part:sim4:na_gb.dmel > >>>>> * match_part:sim4:na_gb.tpa.dmel > >>>>> * match_part:sim4:na_smallRNA.dros > >>>>> * match_part:sim4:na_transcript_dmel_r31 > >>>>> * match_part:sim4:na_transcript_dmel_r32 > >>>>> * 
match_part:tRNAscan-SE:.
> >>>>> * match_part:tblastx:na_agambiae
> >>>>> * match_part:tblastx:na_dbEST.insect
> >>>>> * match_part:tblastx:na_dpse
> >>>>> mature_peptide:.
> >>>>> ncRNA:.
> >>>>> oligo:.
> >>>>> point_mutation:.
> >>>>> polyA_site:.
> >>>>> protein_binding_site:.
> >>>>> pseudogene:.
> >>>>> region:.
> >>>>> regulatory_region:.
> >>>>> rescue_fragment:.
> >>>>> scaffold:.
> >>>>> sequence_variant:.
> >>>>> snRNA:.
> >>>>> snoRNA:.
> >>>>> tRNA:.
> >>>>> three_prime_UTR:.
> >>>>> transcription_start_site:.
> >>>>> transposable_element:.
> >>>>> transposable_element_insertion_site:. 3116
> >>>>>
> >>>>>
> >>>>> yeast gff type:source count
> >>>>> ftp://genome-ftp.stanford.edu/pub/yeast/data_download/
> >>>>> chromosomal_feature/saccharomyces_cerevisiae.gff
> >>>>> -------------------------
> >>>>> ARS:SGD
> >>>>> CDS:SGD
> >>>>> binding_site:SGD
> >>>>> centromere:SGD
> >>>>> chromosome:SGD
> >>>>> gene:SGD
> >>>>> insertion:SGD
> >>>>> intron:SGD
> >>>>> ncRNA:SGD
> >>>>> nc_primary_transcript:SGD
> >>>>> nucleotide_match:SGD
> >>>>> pseudogene:SGD
> >>>>> rRNA:SGD
> >>>>> region:SGD
> >>>>> region:landmark
> >>>>> repeat_family:SGD
> >>>>> repeat_region:SGD
> >>>>> snRNA:SGD
> >>>>> snoRNA:SGD
> >>>>> tRNA:SGD
> >>>>> telomere:SGD
> >>>>> transposable_element:SGD
> >>>>> transposable_element_gene:SGD
> >>>>>
> >>>>> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> >>>>> -- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/
> >>>>>

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gnf.org Mon Aug 1 15:53:05 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Aug 1 15:42:13 2005
Subject: [Bioperl-l] Re: Fixing bioperl [was Re: Analysis features]
In-Reply-To: <1122925500.3857.40.camel@localhost.localdomain>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<1122650232.10455.31.camel@localhost.localdomain>
	<51a02b5bd508f35301ee3c847b104895@gnf.org>
	<1122925500.3857.40.camel@localhost.localdomain>
Message-ID: <2aae0a4129cb2c7407df5834b94f41aa@gnf.org>

On Aug 1, 2005, at 12:45 PM, Scott Cain wrote:

> I run make test frequently; what I do less often is pay close attention
> to the result. When working with bioperl-live, one gets a little numb
> to test failures :-/

I know, and it's not a good situation.

--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From jason.stajich at duke.edu Mon Aug 1 22:31:12 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Aug 1 22:22:04 2005
Subject: [Bioperl-l] all tests pass [was Re: Fixing bioperl] [was Re: Analysis features]
In-Reply-To: <2aae0a4129cb2c7407df5834b94f41aa@gnf.org>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<1122650232.10455.31.camel@localhost.localdomain>
	<51a02b5bd508f35301ee3c847b104895@gnf.org>
	<1122925500.3857.40.camel@localhost.localdomain>
	<2aae0a4129cb2c7407df5834b94f41aa@gnf.org>
Message-ID: <2bf4b9070ab5bb61b34e15d3ae611044@duke.edu>

I'm getting all tests passing for me on OSX and a few different Linux
machines with different complements of aux modules installed. I fixed
some minor things that were breaking.

We want to set up a nightly 'make test' cronjob on one of the OBF
servers; we just need someone with enough time to do it. There are a
lot of combinations of installed aux modules, perl version, and OS to
try out, so we need to know what is breaking, if anything is.

I was really hoping someone would step up to push 1.5.1 out, which is
just a release off the main trunk, and then think about a schedule for
1.6. Can anyone help outline what must get fixed for 1.6, so that there
is a checklist people can help with (and so that we know when we are
ready to release)? I guess ideally this would be done on a wiki, but
the mailing list can suffice too.

-jason

On Aug 1, 2005, at 3:53 PM, Hilmar Lapp wrote:

>
> On Aug 1, 2005, at 12:45 PM, Scott Cain wrote:
>
>> I run make test frequently; what I do less often is pay close
>> attention
>> to the result. When working with bioperl-live, one gets a little numb
>> to test failures :-/
>
> I know, and it's not a good situation.
>

--
Jason Stajich
http://www.duke.edu/~jes12
jason.stajich -at- duke.edu

From hase at umbc.edu Mon Aug 1 23:27:08 2005
From: hase at umbc.edu (HASE)
Date: Mon Aug 1 23:17:30 2005
Subject: [Bioperl-l] Bioinformatics Software Development Survey
Message-ID: <2008.68.49.173.177.1122953228.squirrel@68.49.173.177>

Hello,

As part of our research at UMBC, we are studying the characteristics of
software development in the bioinformatics domain. We believe that this
study should be guided by the people who are actively involved in
bioinformatics. This research is our first step towards enabling the
production of high quality bioinformatics software with less time and
effort. Therefore, your feedback is very important to us.

We seek your input in the form of a survey questionnaire that will take
around 15 minutes of your time. We solicit general demographic
information, information about the products that you have developed,
your work practices, and your software development process. So, if you
are a bioinformatics professional doing software development or a
software developer working in the bioinformatics domain, please provide
us with your valuable input. We assure you that this information will
be used only for academic purposes and will be completely confidential.

Please follow the link below to start the survey:

http://www.is.umbc.edu/bio-survey/

We appreciate your participation in advance.
Regards,
HASE (Human Aspects of Software Engineering)
1000 Hilltop Circle
Department of Information Systems
University of Maryland Baltimore County
Baltimore, MD, 21250
hase@umbc.edu

From walsh at cenix-bioscience.com Tue Aug 2 03:04:27 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Tue Aug 2 02:54:44 2005
Subject: [Bioperl-l] Patching lucy
In-Reply-To: <42EE7A5B.6050701@purdue.edu>
References: <42EA40F5.3090707@purdue.edu>
	<42EE1D8D.2070708@cenix-bioscience.com>
	<42EE7A5B.6050701@purdue.edu>
Message-ID: <42EF1AFB.5010001@cenix-bioscience.com>

Hi Phillip,

I ran the patch on version 1.19p (which I downloaded from the TIGR ftp
site yesterday). It seemed to work for me (all 7 patches worked).

> patch -b -i lucy.patch lucy.c
patching file lucy.c

Here are the contents of the patch file. Perhaps my mail client did
something funny in formatting this. I'll send you a separate file as an
attachment as well.

> cat lucy.patch
277a278,279
> /* AGW added next line */
> fprintf(stderr, "Short/ no insert: %s\n", seqs[i].name);
588c590,592
< if ((seqs[i].len=bases)<=0)
---
> if ((seqs[i].len=bases)<=0) {
> /* AGW added next line */
> fprintf(stderr, "Empty: %s\n", seqs[i].name);
589a594
> }
893c898,902
< if (left) seqs[i].left+=left;
---
> if (left) {
> seqs[i].left+=left;
> /* AGW added next line */
> fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
> }
896c905,909
< if (right) seqs[i].right-=right;
---
> if (right) {
> seqs[i].right-=right;
> /* AGW added next line */
> fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
> }
898a912,913
> /* AGW added next line */
> fprintf(stderr, "Dropped PolyA: %s\n", seqs[i].name);
949a965,966
> /* AGW added next line */
> fprintf(stderr, "Vector: %s\n", seqs[i].name);

Cheers,

Andrew

Phillip San Miguel wrote:

> Hi Andrew,
>
> Thanks for the effort you went to here. Still looks there is a (more
> minor) problem though.
> patch gives a few errors (see below) using your new diff. Looks like 2
> of the 7 patches failed to patch lucy.c from lucy version lucy-1.19p.
>
> But the resulting source code does compile and run on the lucy test
> data. But the PolyA patches did not get inserted.
>
> Do you know if all 7 of your patches were installed into the lucy.c file
> from lucy-1.19p?
>
> (By the way, I think we are on the same page. I do understand that your
> perl code parses lucy output. I've tried it on lucy 1.19p output and it
> succeeds--although it, of course, lacks some of the functionality that
> would be available from the patched version of lucy).
>
> Phillip
>
> Here is the output when I run patch:
>
> (lucy)% cd lucy-1.19p
> (lucy-1.19p)% patch -b -i AndrewsNewPatch.diff lucy.c
> Looks like a normal diff.
> Hunk #4 failed at line 893.
> Hunk #5 failed at line 896.
> 2 out of 7 hunks failed: saving rejects to lucy.c.rej
> I can't seem to find a patch in there anywhere.
>
> Here is the lucy.c.rej file contents:
>
> ***************
> *** 893,893 ****
> ! if (left) seqs[i].left+=left;
> --- 898,902 ----
> ! if (left) {
> ! seqs[i].left+=left;
> ! /* AGW added next line */
> ! fprintf(stderr, "%s has PolyA (left).\n", seqs[i].name);
> ! }
> ***************
> *** 896,896 ****
> ! if (right) seqs[i].right-=right;
> --- 905,909 ----
> ! if (right) {
> ! seqs[i].right-=right;
> ! /* AGW added next line */
> ! fprintf(stderr, "%s has PolyA (right).\n", seqs[i].name);
> ! }
>

--
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel. +49-351-4173 137
Fax +49-351-4173 109
public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From pmiguel at purdue.edu Tue Aug 2 10:48:48 2005
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Tue Aug 2 10:40:17 2005
Subject: [Bioperl-l] Patching lucy
In-Reply-To: <42EF1AFB.5010001@cenix-bioscience.com>
References: <42EA40F5.3090707@purdue.edu>
	<42EE1D8D.2070708@cenix-bioscience.com>
	<42EE7A5B.6050701@purdue.edu>
	<42EF1AFB.5010001@cenix-bioscience.com>
Message-ID: <42EF87D0.50104@purdue.edu>

Andrew,

Yes, you are right. Everything looks good now. A good test (of lucy)
was to take the suggested lucy test from README.FIRST and add the "-c"
parameter to it after patching the source and compiling. The test would
be:

lucy -c -v PUC19 PUC19splice atie.seq atie.qul atie.2nd -debug lucy.info

The output to STDERR then shows all the extra information your patches
have caused lucy to include.

Thanks!
Phillip

From pmiguel at purdue.edu Tue Aug 2 16:34:14 2005
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Tue Aug 2 16:26:58 2005
Subject: [Bioperl-l] ABI average singal intensity
In-Reply-To: <5f2c7e5efe93.5efe935f2c7e@uidaho.edu>
References: <5f2c7e5efe93.5efe935f2c7e@uidaho.edu>
Message-ID: <42EFD8C6.7060609@purdue.edu>

Hi Xiaojun,

Here is a perl one-liner that will give you the mean signal strengths
from a .ab1 (or, probably, a .abi) file:

perl -e 'undef $/; $trace=<>; ($sigptr)=$trace =~ m{S/N%.{16}(.{4})}s;\
($fwo)=$trace =~ /FWO_.{16}(.{4})/s;print "The signal strengths for the bases: "\
,$fwo," are: ",join(" ",unpack("n*",substr($trace,unpack("N*",$sigptr),8))),"\n"' test.ab1

In the case of an ab1 file I have, I get the output:

The signal strengths for the bases: GATC are: 2710 4749 4034 3588

Copy and paste this to the command line of the machine where you have
the trace file, replacing "test.ab1" with the actual name of your trace
file of interest. The "n*" and "N*" templates always unpack big-endian
("network" order) values, which is how the ABI file stores them, so the
one-liner should not need changing for the byte order of your machine.

I wrote the one-liner based on Clark Tibbett's paper about the ABI file
format:

http://www.cs.cmu.edu/afs/cs/project/genome/WWW/Papers/clark.html

A few words of caution: while the S/N% tag looks like it should give a
"signal"/"noise" %, I'm not sure that it does exactly that.
Nevertheless, this is usually what is meant when one asks for the
average "signal strength" of a chromat. In addition, this is the "S/N%"
of *processed* chromatographic data. There is quite a bit of
normalization and background correction that goes on to produce the
processed data from the raw data.

Phillip SanMiguel
Purdue Genomics Core Facility

Xiaojun Hu wrote:

>Hi,
>
>Does anyone know how to get the (A T C G)average
>singal intensity from ABI file?
>
>Thank you very much!
>
>Xiaojun Hu
>
>

From horkko at gmail.com Tue Aug 2 10:11:08 2005
From: horkko at gmail.com (Emmanuel QUEVILLON)
Date: Tue Aug 2 16:43:29 2005
Subject: [Bioperl-l] Bug in Bio::SeqFeature::Annotated ?
Message-ID: <5e8d03d50508020711e543b8b@mail.gmail.com>

Dear all,

I tried to use BioPerl to produce GFF3 output files. It works fine when
I use Bio::SeqFeature::Generic and Bio::Tools::GFF, but it was more
complex and took longer when I tried to use Bio::SeqFeature::Annotated
and Bio::FeatureIO.

There are three problems with Bio::SeqFeature::Annotated:

1) A bug in the '_initialize' method:

sub _initialize {
  my ($self,@args) = @_;
  my ( $start, $end, $strand, $frame, $phase, $score,
       $name, $id, $annot, $location,   # <=== $id shouldn't be here
       $display_name, #deprecate
       $seq_id, $type,$source
     ) = $self->_rearrange([qw(START END STRAND FRAME PHASE SCORE
                               NAME ANNOTATION LOCATION DISPLAY_NAME
                               SEQ_ID TYPE SOURCE )], @args);
  defined $start && $self->start($start);
  defined $end && $self->end($end);
  defined $strand && $self->strand($strand);
  defined $frame && $self->frame($frame);
  defined $phase && $self->phase($phase);
  defined $score && $self->score($score);
  defined $source && $self->source($source);
  defined $type && $self->type($type);
  defined $location && $self->location($location);
  defined $annot && $self->annotation($annot);

$id has no matching key in the list passed to _rearrange, so it shifts
every value after it: for example, $id = (value of $annot), $annot =
(value of $location), and so on. It would be nice if $id could be
removed. This bug is still present in bioperl-live.

2) It is not possible to set a correct type when you create your
Bio::SeqFeature::Annotated object. Actually, it is correctly set when
the object is created, but when you pass this object to
Bio::FeatureIO::write_feature, the value is suddenly undefined and the
GFF3 output contains the default value, which is 'region'. I tried to
debug this problem but did not find a way to solve it. Maybe I am
missing some knowledge about Perl?

3) It would also be nice if a test could be done on whether an annot
object is present. If you follow the structure of the _initialize
method above, you can see that start, end, frame, phase, source etc.
are set before the call to the annotation sub. When these subroutines
are called, a Bio::Annotation::Collection is created and stored in
memory. Then, when the annotation sub is called, this previous
Collection object is overwritten with $annot. So the idea would be to
add a test that throws an error or warns the user when, for example, a
Collection object is passed to the new method, to avoid the
overwriting.

That's all :). I hope these remarks will be useful. If not, sorry to
bother the list.

Regards

Emmanuel

--
Emmanuel Quevillon
email: horkko at gmail.com
blog: http://horkko.blogspot.com

From Guido.Dieterich at gbf.de Tue Aug 2 16:49:01 2005
From: Guido.Dieterich at gbf.de (Guido Dieterich)
Date: Tue Aug 2 16:43:30 2005
Subject: [Bioperl-l] parse genbank file
Message-ID: <1123015741.13213.94.camel@sb289.gbf-braunschweig.de>

Hi,

I want to parse a genbank file (Listeria Innocua)!

This is a part of the code ...

my $file = "NC_003212.gbk";

my $stream = Bio::SeqIO->new(-file => $file, -format => 'GenBank');

while( my $seq = $stream->next_seq ) {

    print $seq->display_id;

}

output:

NC_003212

I just get the NC ID for this file, but not for the genes within ...

?????

Greetings

Guido

From walsh at cenix-bioscience.com Wed Aug 3 03:13:47 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Wed Aug 3 03:04:29 2005
Subject: [Bioperl-l] parse genbank file
In-Reply-To: <1123015741.13213.94.camel@sb289.gbf-braunschweig.de>
References: <1123015741.13213.94.camel@sb289.gbf-braunschweig.de>
Message-ID: <42F06EAB.6010503@cenix-bioscience.com>

Hello,

There is only 1 'sequence' in the file (namely, NC_003212). The genes
are actually features on the sequence. So, you would have to get the
'gene' sequence features for the sequence.

e.g.
my $gene_seq_feats = get_list_seq_feats_by_primary_tag($seq_obj, 'gene'); sub get_list_seq_feats_by_primary_tag { my ($seq_obj, $tag) = @_; ref $seq_obj or confess "Seq obj not defined!"; my @features = $seq_obj->top_SeqFeatures(); my @list = (); for my $feat (@features) { if ($feat->primary_tag eq $tag) { push @list, $feat; } } return \@list } HTH, Andrew Guido Dieterich wrote: > Hi, > > I want to parse a genbank file (Listeria Innocua)! > > this is a part of the code ... > > > my $file = "NC_003212.gbk"; > > my $stream = Bio::SeqIO->new(-file => $file, -format => 'GenBank'); > > while( my $seq = $stream->next_seq ) { > > print $seq->display_id; > > } > > > > > output: > > NC_003212 > > I just get the NC ID for this file, but not for the genes within ... > > > ????? > > Greetings > > Guido > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------ Andrew Walsh, M.Sc. Bioinformatics Software Engineer IT Unit Cenix BioScience GmbH Tatzberg 47 01307 Dresden Germany Tel. +49-351-4173 137 Fax +49-351-4173 109 public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg ------------------------------------------------------------------ From letondal at pasteur.fr Wed Aug 3 09:00:30 2005 From: letondal at pasteur.fr (Catherine Letondal) Date: Wed Aug 3 08:50:27 2005 Subject: [Bioperl-l] Bio::Tools::Run::PiseApplication : parameters changes in seqgen Message-ID: Hi, The Pise/bioperl interface of the seq-gen program (http://bioweb.pasteur.fr/seqanal/interfaces/seqgen-simple.html) has changed. 
We have added new parameters, modified one and changed some parameters' type from Integer to Float: Changes in parameters: - added options invar_site (-i), random_seed (-z), write-ancest (-wa), write-sites (-wr), partition_numb (-p) - fixed bug in Phylip option: it was -p now it's a vlist: -op, -or, -on - changed type of some options type from integer to float (scale_branch, scale_tree, rate123, shape, freqACGT, transratio) Please tell us if there is any trouble. Best, -- Catherine Letondal -- Institut Pasteur From avilella at gmail.com Wed Aug 3 11:00:04 2005 From: avilella at gmail.com (Albert Vilella) Date: Wed Aug 3 10:50:44 2005 Subject: [Bioperl-l] Bio::Tools::Run prepare executions [was Re:bioperl-run Codeml.pm fix_blength] In-Reply-To: <1121184178.8167.28.camel@localhost.localdomain> References: <1121181586.8167.13.camel@localhost.localdomain> <1121182841.8167.22.camel@localhost.localdomain> <1121184178.8167.28.camel@localhost.localdomain> Message-ID: <1123081204.10112.2.camel@localhost.localdomain> Hi all, Having thought about the previous thread on changing tempdir as a settable value in Bio::Tools::Run::WrapperBase (Jason? should we?)... ...I wonder if it may be interesting (at least it would for me) to have something like a "prepare" method for the execution wrappers in Bio::Tools::Run. What I'm looking for is a way to create the dirs corresponding to the analysis one wants to conduct. The "prepare" method would create, but not execute, the dir with the ready-to-run elements of the executables according to the various input data files and parameters. Right now, we have a "run" method that first prepares the elements needed for the execution and then runs the program. We also have container objects for program results in bioperl-live. This "prepare" method might be useful for people wanting to generate sets of analysis for further execution on queueing-based systems or similar scheduled execution situations. 
I agree that the sole "preparation" of an execution it might not fit well with the idea of an execution wrapper as it is now in bioperl, so any suggestions/comments/criticism are welcome. Bests, Albert. El dt 12 de 07 del 2005 a les 18:03 +0200, en/na Albert Vilella va escriure: > El dt 12 de 07 del 2005 a les 11:47 -0400, en/na Jason Stajich va > escriure: > > Sounds good - would you just copy the dir to the users specified > > outdir? > > yes > > > Another way to go is make tempdir a settable value (see > > Bio::Tools::Run::WrapperBase -- in bioperl-live repository) - but > > this may not be as clear on how to use it? > > well, it is not as direct as the other way but maybe it is cleaner in > the sense that will directly run the analysis on $tempdir and no extra > cp or mv would be needed... > > Albert. > > > > > > > -jason > > On Jul 12, 2005, at 11:40 AM, Albert Vilella wrote: > > > > > El dt 12 de 07 del 2005 a les 11:28 -0400, en/na Jason Stajich va > > > escriure: > > > > > > > sure - fix away. > > > > > > > > > > > > > done. > > > > > > > > > Also, in my pipeline it would be interesting to call Codeml.pm via > > > bioperl keeping the tempfiles in a specified directory: > > > > > > > > > I understand that save_tempfiles will save the generated tempfiles > > > in > > > the temp directory, the dir will remain in $tempdir. > > > An $outdir could be specified so that the codeml run is saved where > > > the > > > user specifies. > > > > > > > > > What do you think? > > > > > > > > > Albert. > > > > > > > > > > > > > > > > > > > -- > > > > Jason Stajich > > > > jason.stajich at duke.edu > > > > http://www.duke.edu/~jes12/ > > > > > > > > > > From anunberg at oriongenomics.com Wed Aug 3 11:40:21 2005 From: anunberg at oriongenomics.com (Andrew Nunberg) Date: Wed Aug 3 11:30:32 2005 Subject: [Bioperl-l] Error making query to Bio::DB:GFF db Message-ID: I have a Bio::DB:GFF db of the human genome. 
When querying a particular chromosome I consistently get the following error when attempting to create a segment ------------- EXCEPTION ------------- MSG: Couldn't execute query SELECT fref, IF(ISNULL(gclass),'Sequence',gclass), min(fstart), max(fstop), fstrand, gname FROM fdata,fgroup WHERE fgroup.gname=? AND fgroup.gclass=? AND fgroup.gid=fdata.gid GROUP BY fref,fstrand,gname : MySQL server has gone away What does it mean that the server has gone away? -- Andrew Nunberg Bioinformagician Orion Genomics (314)-615-6989 www.oriongenomics.com From gdw1 at cornell.edu Wed Aug 3 12:00:13 2005 From: gdw1 at cornell.edu (Gregory Drake Wilson) Date: Wed Aug 3 11:50:35 2005 Subject: [Bioperl-l] bl2seq and next_aln() Message-ID: <1343.129.44.235.147.1123084813.squirrel@129.44.235.147> I am trying to parse a bl2seq file but I only get one of the alignments back when there are two or more. Code: my @params = (program => 'blastn' , 'outfile' => 'bl2seq.out'); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $report = $factory->bl2seq($seq1, $seq2); my $str = Bio::AlignIO->new(-file=> 'bl2seq.out','-format' => 'bl2seq'); while ( my $aln = $str->next_aln() ) { print $aln->consensus_iupac()."\n"; } Opening 'bl2seq.out' shows multiple alignments, yet this code only returns the first one in the file. Any thoughts? Greg From jason.stajich at duke.edu Wed Aug 3 13:43:33 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Aug 3 13:33:42 2005 Subject: [Bioperl-l] bl2seq and next_aln() In-Reply-To: <1343.129.44.235.147.1123084813.squirrel@129.44.235.147> References: <1343.129.44.235.147.1123084813.squirrel@129.44.235.147> Message-ID: <1123091013.42f10245c0e8a@webmail.duke.edu> Not sure - could be a bug in AlignIO::bl2seq -- although it just uses SearchIO... But it could also be a silly file sync problem in that the filehandle is not closed (although this is also unlikely, as the output is written to file by bl2seq). So not sure - does it only show the 1st alignment?
Personally I would use the report object to get the aln directly. if( my $r = $report->next_result ) { while( my $hit = $r->next_hit ) { while( my $hsp = $hit->next_hsp ) { print $hsp->get_aln->consensus_iupac()."\n"; } } } -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ Quoting Gregory Drake Wilson : > I am trying to parse a bl2seq file but am only being returned one of the > alignments when there are 2+. > Code: > my @params = (program => 'blastn' , 'outfile' => 'bl2seq.out'); > my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > my $report = $factory->bl2seq($seq1, $seq2); > > my $str = Bio::AlignIO->new(-file=> 'bl2seq.out','-format' => > 'bl2seq'); > > while ( my $aln = $str->next_aln() ) { > print $aln->consensus_iupac()."\n"; > } > > Opening 'bl2seq.out' shows mutiple alignments, yet this code only returns > the first one in the file. Any thoughts? > > Greg > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From radev at umich.edu Wed Aug 3 17:58:16 2005 From: radev at umich.edu (radev@umich.edu) Date: Wed Aug 3 17:48:21 2005 Subject: [Bioperl-l] newbie Q - missing Tools/HMM.pm ? In-Reply-To: from "bioperl-l-bounces@portal.open-bio.org" at Aug 03, 2005 05:46:11 PM Message-ID: <20050803215816.64727B848B@tangra.si.umich.edu> Hi, I just installed Bundle::BioPerl via CPAN. I am now trying to run the code in http://doc.bioperl.org/bioperl-live/Bio/Tools/HMM.html but for some reason Tools/HMM.pm didn't get installed with the rest of the code. Neither did SeqIO.pm . What did I miss? Thanks! Drago From sdavis2 at mail.nih.gov Wed Aug 3 18:19:28 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed Aug 3 18:10:14 2005 Subject: [Bioperl-l] newbie Q - missing Tools/HMM.pm ? 
In-Reply-To: <20050803215816.64727B848B@tangra.si.umich.edu> Message-ID: On 8/3/05 5:58 PM, "radev@umich.edu" wrote: > Hi, > > I just installed Bundle::BioPerl via CPAN. I am now trying to run the code > in http://doc.bioperl.org/bioperl-live/Bio/Tools/HMM.html > > but for some reason Tools/HMM.pm didn't get installed with the rest of > the code. Neither did SeqIO.pm . > > What did I miss? Bundle::BioPerl only installs the CPAN modules that bioperl needs. It doesn't install bioperl itself. You will now need to install bioperl. Sean From jason.stajich at duke.edu Wed Aug 3 20:52:48 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Aug 3 20:43:26 2005 Subject: [Bioperl-l] newbie Q - missing Tools/HMM.pm ? In-Reply-To: <20050803215816.64727B848B@tangra.si.umich.edu> References: <20050803215816.64727B848B@tangra.si.umich.edu> Message-ID: <786C6E50-63BA-4EA3-A20E-EF3F36BF2A87@duke.edu> Besides Sean's point that the Bundle doesn't install Bioperl itself, this module is only in bioperl-live CVS and not in the 1.4 release that is on CPAN. See the bioperl website for how to get the CVS code. You can also browse daily (at least) CVS checkouts here: http://bioperl.org/SRC/bioperl-live -jason On Aug 3, 2005, at 5:58 PM, radev@umich.edu wrote: > Hi, > > I just installed Bundle::BioPerl via CPAN. I am now trying to run > the code > in http://doc.bioperl.org/bioperl-live/Bio/Tools/HMM.html > > but for some reason Tools/HMM.pm didn't get installed with the rest of > the code. Neither did SeqIO.pm . > > What did I miss? > > Thanks!
> > Drago > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From allenday at ucla.edu Wed Aug 3 21:08:39 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Aug 3 20:58:43 2005 Subject: [Bioperl-l] darwin PERL5LIB ignored Message-ID: This is an off-topic question for the list, but I know there are lot of mac users here, and I'm hoping for a quick fix. I'm having problems getting my bash environment to recognize my $PERL5LIB variable. Even if I declare the variable in my .bashrc file and source it, the variable is ignored until I explicitly export it from my session prompt. Below is a dialog illustrating the problem. Anyone know of a workaround to get perl to use $PERL5LIB as declared in the .bashrc file, as opposed to requiring an explicit export of the variable? Thanks. -Allen ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #variable appears to be set up correctly buildmac:~ allenday$ echo $PERL5LIB /net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl #but it doesn't appear in @INC buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"' /System/Library/Perl/5.8.6/darwin-thread-multi-2level /System/Library/Perl/5.8.6 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level /Network/Library/Perl/5.8.6 /Network/Library/Perl /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 . 
#the variable is defined in my .bashrc file, which is evaluated at login buildmac:~ allenday$ cat ~/.bashrc | grep PERL5LIB PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl #just to make sure, sourcing the .bashrc file has no effect on @INC buildmac:~ allenday$ source ~/.bashrc buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"' /System/Library/Perl/5.8.6/darwin-thread-multi-2level /System/Library/Perl/5.8.6 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level /Network/Library/Perl/5.8.6 /Network/Library/Perl /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 . #I have to explicitly export from the prompt to affect @INC buildmac:~ allenday$ export PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"' /net/groove/lib/perl5/site_perl /usr/local/lib/perl5/site_perl /System/Library/Perl/5.8.6/darwin-thread-multi-2level /System/Library/Perl/5.8.6 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level /Network/Library/Perl/5.8.6 /Network/Library/Perl /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 . 
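The behaviour shown in the dialog above can be reproduced on a small scale without perl. What follows is a minimal sketch added for illustration, not part of the original message: /tmp/demo_perl_lib is a hypothetical path, and the child `sh -c` process stands in for perl, which likewise only sees PERL5LIB if it is present in the environment it inherits.

```shell
#!/bin/sh
# Hypothetical library path, used only for this illustration.
demo_dir=/tmp/demo_perl_lib

# Start from a clean slate: unset removes the variable and its export flag.
unset PERL5LIB

# 1) Plain assignment: the variable exists in this shell only,
#    so a child process does not see it.
PERL5LIB=$demo_dir
without_export=$(sh -c 'echo "${PERL5LIB:-unset}"')

# 2) After export, the variable is placed in the environment
#    and child processes inherit it.
export PERL5LIB
with_export=$(sh -c 'echo "${PERL5LIB:-unset}"')

echo "without export: $without_export"
echo "with export:    $with_export"
```

Run as a script, the first line reports "unset" and the second reports the demo path, which matches the difference between the two @INC listings in the dialog.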
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From taerwin at tpg.com.au Wed Aug 3 21:37:28 2005 From: taerwin at tpg.com.au (Tim Erwin) Date: Wed Aug 3 21:32:33 2005 Subject: [Bioperl-l] darwin PERL5LIB ignored In-Reply-To: References: Message-ID: <1123119449.11338.3.camel@bacp4> > #the variable is defined in my .bashrc file, which is evaluated at login > buildmac:~ allenday$ cat ~/.bashrc | grep PERL5LIB > PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl You should export the variable from your .bashrc: export PERL5LIB=/net/groove/lib/perl5/site_perl:/other_dirs If you don't export it, it will only be set for the current shell and won't be passed on to other shells or child processes (such as perl). Regards, Tim On Wed, 2005-08-03 at 18:08 -0700, Allen Day wrote: > This is an off-topic question for the list, but I know there are lot of > mac users here, and I'm hoping for a quick fix. > > I'm having problems getting my bash environment to recognize my $PERL5LIB > variable. Even if I declare the variable in my .bashrc file and source > it, the variable is ignored until I explicitly export it from my session > prompt. Below is a dialog illustrating the problem. > > Anyone know of a workaround to get perl to use $PERL5LIB as declared in > the .bashrc file, as opposed to requiring an explicit export of the > variable? > > Thanks.
> -Allen > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > #variable appears to be set up correctly > buildmac:~ allenday$ echo $PERL5LIB > /net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl > > #but it doesn't appear in @INC > buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"' > /System/Library/Perl/5.8.6/darwin-thread-multi-2level > /System/Library/Perl/5.8.6 > /Library/Perl/5.8.6/darwin-thread-multi-2level > /Library/Perl/5.8.6 > /Library/Perl > /Network/Library/Perl/5.8.6/darwin-thread-multi-2level > /Network/Library/Perl/5.8.6 > /Network/Library/Perl > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.6 > /Library/Perl/5.8.1 > . > > #the variable is defined in my .bashrc file, which is evaluated at login > buildmac:~ allenday$ cat ~/.bashrc | grep PERL5LIB > PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl > > #just to make sure, sourcing the .bashrc file has no effect on @INC > buildmac:~ allenday$ source ~/.bashrc > buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"' > /System/Library/Perl/5.8.6/darwin-thread-multi-2level > /System/Library/Perl/5.8.6 > /Library/Perl/5.8.6/darwin-thread-multi-2level > /Library/Perl/5.8.6 > /Library/Perl > /Network/Library/Perl/5.8.6/darwin-thread-multi-2level > /Network/Library/Perl/5.8.6 > /Network/Library/Perl > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.6 > /Library/Perl/5.8.1 > . 
> > #I have to explicitly export from the prompt to affect @INC > buildmac:~ allenday$ export PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl > buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"' > /net/groove/lib/perl5/site_perl > /usr/local/lib/perl5/site_perl > /System/Library/Perl/5.8.6/darwin-thread-multi-2level > /System/Library/Perl/5.8.6 > /Library/Perl/5.8.6/darwin-thread-multi-2level > /Library/Perl/5.8.6 > /Library/Perl > /Network/Library/Perl/5.8.6/darwin-thread-multi-2level > /Network/Library/Perl/5.8.6 > /Network/Library/Perl > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.6 > /Library/Perl/5.8.1 > . > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From allenday at ucla.edu Wed Aug 3 23:00:19 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Aug 3 22:50:33 2005 Subject: bioperl rpms via yum on Darwin (Was: [Bioperl-l] darwin PERL5LIB ignored0 In-Reply-To: <1123119449.11338.3.camel@bacp4> References: <1123119449.11338.3.camel@bacp4> Message-ID: yep, that did it. i don't know why i didn't prefix export onto that line as will all the others in my .bashrc file. thanks a bunch. btw, the reason i'm doing this is b/c i'm in the middle of porting the biopackages rpm repository to be installable on darwin. i just finished porting rpm and yum yesterday -- should be able to have bioperl installable via rpm within a week, barring no c library dependency problems. 
-allen On Thu, 4 Aug 2005, Tim Erwin wrote: > > #the variable is defined in my .bashrc file, which is evaluated at login > > buildmac:~ allenday$ cat ~/.bashrc | grep PERL5LIB > > PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl > > You should export the variable from your .bashrc > > export PERL5LIB=/net/groove/lib/perl5/site_perl:/other_dirs > > If you don't export it it will only get set for the current shell and > wont be transfer to other shells. > > Regards, > > Tim > > On Wed, 2005-08-03 at 18:08 -0700, Allen Day wrote: > > This is an off-topic question for the list, but I know there are lot of > > mac users here, and I'm hoping for a quick fix. > > > > I'm having problems getting my bash environment to recognize my $PERL5LIB > > variable. Even if I declare the variable in my .bashrc file and source > > it, the variable is ignored until I explicitly export it from my session > > prompt. Below is a dialog illustrating the problem. > > > > Anyone know of a workaround to get perl to use $PERL5LIB as declared in > > the .bashrc file, as opposed to requiring an explicit export of the > > variable? > > > > Thanks. > > -Allen > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > #variable appears to be set up correctly > > buildmac:~ allenday$ echo $PERL5LIB > > /net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl > > > > #but it doesn't appear in @INC > > buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"' > > /System/Library/Perl/5.8.6/darwin-thread-multi-2level > > /System/Library/Perl/5.8.6 > > /Library/Perl/5.8.6/darwin-thread-multi-2level > > /Library/Perl/5.8.6 > > /Library/Perl > > /Network/Library/Perl/5.8.6/darwin-thread-multi-2level > > /Network/Library/Perl/5.8.6 > > /Network/Library/Perl > > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level > > /System/Library/Perl/Extras/5.8.6 > > /Library/Perl/5.8.1 > > . 
> > > > #the variable is defined in my .bashrc file, which is evaluated at login > > buildmac:~ allenday$ cat ~/.bashrc | grep PERL5LIB > > PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl > > > > #just to make sure, sourcing the .bashrc file has no effect on @INC > > buildmac:~ allenday$ source ~/.bashrc > > buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"' > > /System/Library/Perl/5.8.6/darwin-thread-multi-2level > > /System/Library/Perl/5.8.6 > > /Library/Perl/5.8.6/darwin-thread-multi-2level > > /Library/Perl/5.8.6 > > /Library/Perl > > /Network/Library/Perl/5.8.6/darwin-thread-multi-2level > > /Network/Library/Perl/5.8.6 > > /Network/Library/Perl > > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level > > /System/Library/Perl/Extras/5.8.6 > > /Library/Perl/5.8.1 > > . > > > > #I have to explicitly export from the prompt to affect @INC > > buildmac:~ allenday$ export PERL5LIB=/net/groove/lib/perl5/site_perl:/usr/local/lib/perl5/site_perl > > buildmac:~ allenday$ perl -e 'print join "\n",@INC,"\n"' > > /net/groove/lib/perl5/site_perl > > /usr/local/lib/perl5/site_perl > > /System/Library/Perl/5.8.6/darwin-thread-multi-2level > > /System/Library/Perl/5.8.6 > > /Library/Perl/5.8.6/darwin-thread-multi-2level > > /Library/Perl/5.8.6 > > /Library/Perl > > /Network/Library/Perl/5.8.6/darwin-thread-multi-2level > > /Network/Library/Perl/5.8.6 > > /Network/Library/Perl > > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level > > /System/Library/Perl/Extras/5.8.6 > > /Library/Perl/5.8.1 > > . 
> > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > From ureddi at emich.edu Thu Aug 4 09:42:19 2005 From: ureddi at emich.edu (Usha Rani Reddi) Date: Thu Aug 4 09:34:10 2005 Subject: [Bioperl-l] bl2seq Message-ID: <655276655c36.655c36655276@emich.edu> Hi, I tried to run local bl2seq by installing Bioperl on Linux machine. When I tried to align 2 sequences using bl2seq I got an error message that says "could not find path to bl2seq". After getting the error message I did set the environmental variables(path) and tried again I got the same error message. Please help me with this. Thanks Usha From jason.stajich at duke.edu Thu Aug 4 14:20:13 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Aug 4 14:11:15 2005 Subject: [Bioperl-l] Re: Bio::Tools::Run prepare executions [was Re:bioperl-run Codeml.pm fix_blength] In-Reply-To: <1123081204.10112.2.camel@localhost.localdomain> References: <1121181586.8167.13.camel@localhost.localdomain> <1121182841.8167.22.camel@localhost.localdomain> <1121184178.8167.28.camel@localhost.localdomain> <1123081204.10112.2.camel@localhost.localdomain> Message-ID: On Aug 3, 2005, at 11:00 AM, Albert Vilella wrote: > Hi all, > > Having thought about the previous thread on changing tempdir as a > settable value in Bio::Tools::Run::WrapperBase (Jason? should we?)... > i think it will be fine to do those changes if I remember correctly what they were... =) > ...I wonder if it may be interesting (at least it would for me) to > have something like a "prepare" method for the execution wrappers in > Bio::Tools::Run. > > What I'm looking for is a way to create the dirs corresponding to the > analysis one wants to conduct. 
The "prepare" method would create, but > not execute, the dir with the ready-to-run elements of the executables > according to the various input data files and parameters. > > Right now, we have a "run" method that first prepares the elements > needed for the execution and then runs the program. > > We also have container objects for program results in bioperl-live. > > This "prepare" method might be useful for people wanting to generate > sets of analysis for further execution on queueing-based systems or > similar scheduled execution situations. > Sure - this sounds fine - I guess part of the prepare step, though, is preparing the arguments to send to the programs. Do you want to capture these arguments as well? My understanding is that the BioPipe system (which may not have many devs now) tried to make this possible by encoding the input options to the Perl modules in an XML file which was loaded into the pipeline db. http://www.genome.org/cgi/content/abstract/13/8/1904 But I'm definitely open to some other ideas about how this should be done, and the idea of a prepare step seems great (especially if we break out a cleanup step as well and ensure that every run cmd does a prepare, execute, cleanup cycle). Thanks for jumping in on this - I think your ideas and intuition here are right on the mark and I think a more systematic approach on the parts needed to run an external program should be spelled out in the code. -jason > I agree that the sole "preparation" of an execution it might not fit > well with the idea of an execution wrapper as it is now in bioperl, so > any suggestions/comments/criticism are welcome. > > Bests, > > Albert. > > > > On Tue, 12 Jul 2005 at 18:03 +0200, Albert Vilella > wrote: > >> On Tue, 12 Jul 2005 at 11:47 -0400, Jason Stajich >> wrote: >> >>> Sounds good - would you just copy the dir to the users specified >>> outdir?
>>> >> >> yes >> >> >>> Another way to go is make tempdir a settable value (see >>> Bio::Tools::Run::WrapperBase -- in bioperl-live repository) - but >>> this may not be as clear on how to use it? >>> >> >> well, it is not as direct as the other way but maybe it is cleaner in >> the sense that will directly run the analysis on $tempdir and no >> extra >> cp or mv would be needed... >> >> Albert. >> >> >>> >>> >>> -jason >>> On Jul 12, 2005, at 11:40 AM, Albert Vilella wrote: >>> >>> >>>> El dt 12 de 07 del 2005 a les 11:28 -0400, en/na Jason Stajich va >>>> escriure: >>>> >>>> >>>>> sure - fix away. >>>>> >>>>> >>>> >>>> >>>> done. >>>> >>>> >>>> Also, in my pipeline it would be interesting to call Codeml.pm via >>>> bioperl keeping the tempfiles in a specified directory: >>>> >>>> >>>> I understand that save_tempfiles will save the generated tempfiles >>>> in >>>> the temp directory, the dir will remain in $tempdir. >>>> An $outdir could be specified so that the codeml run is saved where >>>> the >>>> user specifies. >>>> >>>> >>>> What do you think? >>>> >>>> >>>> Albert. >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> -- >>> >>> Jason Stajich >>> >>> jason.stajich at duke.edu >>> >>> http://www.duke.edu/~jes12/ >>> >>> >>> >>> >>> >>> > > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From ushashankar2000 at yahoo.com Thu Aug 4 09:35:29 2005 From: ushashankar2000 at yahoo.com (Usha) Date: Thu Aug 4 15:40:22 2005 Subject: [Bioperl-l] bl2seq Message-ID: <20050804133529.47171.qmail@web34314.mail.mud.yahoo.com> Hi, I tried to run local bl2seq by installing Bioperl on Linux machine. When I tried to align 2 sequences using bl2seq I got an error message that says "could not find path to bl2seq". After getting the error message I did set the eenvironmental variables(path) and tried again I got the same error message. Please help me with this. Thanks Usha __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From mconte at cirad.fr Thu Aug 4 11:29:20 2005 From: mconte at cirad.fr (matthieu) Date: Thu Aug 4 15:40:23 2005 Subject: [Bioperl-l] dividing seqboot outfiles Message-ID: <42F23450.70209@cirad.fr> Hello, I'm trying to divide seqboot outfiles containing 100 multiple alignments into, for example, 10 files of 10 multiple alignments each. I didn't find any parser for this. I'm thinking about identifying the first characters of each block in the seqboot outfile (e.g. " 3 639 " in my example) to recognize each multiple-alignment block, but I didn't manage to do this... I attach my first code and an example of a seqboot outfile. Thanks Matthieu From mconte at cirad.fr Thu Aug 4 11:31:23 2005 From: mconte at cirad.fr (matthieu) Date: Thu Aug 4 15:40:25 2005 Subject: [Bioperl-l] dividing seqboot outfile Message-ID: <42F234CB.9030609@cirad.fr> Oops... my script and my example file -------------- next part -------------- #!/usr/bin/perl #### Divide one file that contains 100 multialignments into 10 files of 10 multialignments use Bio::AlignIO; my $file = shift; my $out = shift; my $descripteur = open($file); my $switch=-1; while ($switch <10) { my $alignment = extract($descripteur); print $alignment ; $switch ++; } sub open { my($my_file) = @_; my $descripteur; unless(open($descripteur,$my_file)) { print "Can't open $my_file !\n"; exit; } return $descripteur; } sub extract { my($descripteur_fichier) = @_; my($enregistrement) = ''; my($separateur) = $/; # recognize the motif at the beginning of each alignment $/ = " 3 639\n"; print "$/ \n"; $enregistrement = <$descripteur_fichier>; $/ = $separateur; #print "$separateur !\n"; return $enregistrement; } -------------- next part -------------- 3 639 01g45860.1 ---------- ---------- ---------- ----MMMDFF FF------WW WPDPAAASSS t5g66770.1 ACCTTDDDDS GNNAQQQQQI KQQQQQQQEQ QHHHHHHQFF IILSLNNPWW WPNTSSLGFF t5g66770.2 ACCTTDDDDS GNNAQQQQQI KQQQQQQQEQ QHHHHHHQFF IILSLNNPWW WPNTSSLGFF GLLLDAAAGG FLPPPPPPAV
---------- ---------- --AAPPDDDV GG-------- GLLLSGGGSS AFDDDPPPQV TGGDDSSSDP GPFPPNNLDH HHAATTTTTG GGRLLDDGGG GLLLSGGGSS AFPPPPPPQV TGGDDSSSDP GPFPPNNLDH HHAATTTTTG GGRLLDDGGG ---------- ---------- ---------- ---YPPPAA- --DD------ ---------- GGGGGFEEEE SDEMEELLIS GDVAAADDGC DTTHNPPDDV VIDDPPPDDT PSSVPLLLLR GGGGGFEEEE SDEMEELLIS GDVAAADDGC DTTHNPPDDV VIDDPPPDDT PSSVPLLLLR VDAAALAAAA AAFPPPCCCA PPPAAAALL- AAMMRRREAG GIRR------ ----LHLLLS IDTSSPPTTL LLWPPPSSSS PPPSSPPTTH SSPPTKKEND DSEEDDDFFF FLEELKAAID IDTSSPPTTL LLWPPPSSSS PPPSSPPTTH SSPPTKKEND DSEEDDDFFF FLEELKAAID SAGGEAHHLA ADDSAALASS AAASIGVVAH HHHFTTTSP- SSPPPAPTTD AAEEHHALYY DA--SDPPEL LQQISSVEGG DPPT-EVVAY YYYFEEESPN SSPPPTSSSS SSTTEEDIYY DA--SDPPEL LQQISSVEGG DPPT-EVVAY YYYFEEESPN SSPPPTSSSS SSTTEEDIYY HHYEEEAAAA YYLLKKFTQQ ILLLLFFHCC CDHHIDDFSL QLQQWPPPAL LIALALPPGG KKNDDDAAAA YYSSKKFTQQ ILLLLTTESS SNHHVDDFGI QIQQWPPPAL LLALATTTSG KKNDDDAAAA YYSSKKFTQQ ILLLLTTESS SNHHVDDFGI QIQQWPPPAL LLALATTTSG GPP-RIIITT PPTG-----L DDVLADLAAR RRRFFSADDE VPWWMLLIIA PGEEAAFFNS GKPQRVVVSS PLGEPPSSSL AATNRDFAAK KKDFFDTHHL LGSSSFFVVD PDEEVAVVNF GKPQRVVVSS PLGEPPSSSL AATNRDFAAK KKDFFDTHHL LGSSSFFVVD PDEEVAVVNF SSVLLLHRLL LLGGPPAAAA DQAPP----- IIALCCAASS VRKKIFIIED NNTTTGFDDR FFMLLLYKLL LL-------- DETPPTTTII VVTLLLKKSS LNRRVVGGES NNVVVGFNNR FFMLLLYKLL LL-------- DETPPTTTII VVTLLLKKSS LNRRVVGGES NNVVVGFNNR FTEAFYSSAA VFLDDDASSA SSSSGGGAGN AAEAYLLLIC DDVVGGGEA- -RRERPPPRR VKNAQFSSAA VFLEEENLLG RRRRDDSEER VVEELFFFIS GGIIPPPETI IHHEREEEQQ VKNAQFSSAA VFLEEENLLG RRRRDDSEER VVEELFFFIS GGIIPPPETI IHHEREEEQQ WRDRRRGGLA PPLGNAALRR QARMVVGLLL FFGEGG-HHS EEAADDDDDD GCTHFSSSAW WRVLLNGGFS KKLSYAAVSS QAKILLWNNN YYYSNNYSSI ESKKPPPPPP GFSNLTTSSW WRVLLNGGFS KKLSYAAVSS QAKILLWNNN YYYSNNYSSI ESKKPPPPPP GFSNLTTSSW WWGDGGGNNN NNSGGSSNNS SGSSSSSGGG GGDSSSVCL WW-------- ---------- ---------- --------- WW-------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- -----MTTPP ------WWWP MMDPALLDDD AAGFFFFPPA t5g66770.1 AAYMCTGGGG 
NLLIIQQKQQ EQHHDHIIGG LLNNPPWWWP --NTSLLSSS GGSAAAAPPQ t5g66770.2 AAYMCTGGGG NLLIIKKKQQ EQHHDHIIGG LLNNPPWWWP --NTSLLSSS GGSAAAAPPQ ---------- ---------- -----ADDDD GV-------- ---------- ---------- GGGGGSNPFP PPFFFFFPDH HHHHHATTTT GGRRLLDDFG GGGGGFEESD EWWEEEELLL GGGGGSNPFP PPFFFFFPDH HHHHHATTTT GGRRLLDDFG GGGGGFEESD EWWEEEELLL ---------- --YYYYPPP- --GADDD--- ---------- ---------D LPFFFFFFPP IGGVVADGPT WWHHHHNPPY VVGPDDDPFD TYPSSRLSVQ SSDNNRRRVD PLPPPPWWPP IGGVVADGPT WWHHHHNPPY VVGPDDDPFD TYPSSRLSVQ SSDNNRRRVD PLPPPPWWPP PPAAAAAAAA AVVVLL-ARE EEEAAAGIR- ---HHLLLMC AAGGAIIIEA GDASAAQLDD PPSSSIIPPP PLLLTTHSTE EDPNNNDSED DDDKKIIIYC AA--RIIISD SDASKKTLQQ PPSSSIIPPP PLLLTTHSTE EDPNNNDSED DDDKKIIIYC AA--RIIISD SDASKKTLQQ DHALAAAGII IGGGRRAAHF TAALLFPP-V VVATTTTDAA AEEAAFLHHH HHYYCPPKAH QREVSEP--- -EEERRAAYF EAALLSPPNA AATSSSSSSS STTDDLIKKK TTNNCPPKAH QREVSEP--- -EEERRAAYF EAALLSPPNA AATSSSSSSS STTDDLIKKK TTNNCPPKAH FTTQIIIILE EEEAAFHHGG DDDHHVVIIF LLMMMMGLQQ PALIIQQALL LAARGPPPPP LTTQIIIILE EEEAATEEKK NNNHHIIVVF IIVVVVGIQQ PALLLQQALL LAARGKKPPP LTTQIIIILE EEEAATEEKK NNNHHIIVVF IIVVVVGIQQ PALLLQQALL LAARGKKPPP FF--TGIIGG GGPPPSGRRD DEE-DGGLLL SRVRSSGGVA AAASEVRPWM QQPPGEEVAA TTQQSGIIPP PPAPPSESSP PEEPAGGNLF VDLNDDPPIL LTTPLLNGSS RRPPDEELAA TTQQSGIIPP PPAPPSESSP PEEPAGGNLF VDLNDDPPIL LTTPLLNGSS RRPPDEELAA FNNSVLLQQQ RLLGPPDDAP PP---IILVV VASSSSRPKI FFVIQQEAAH NKTTTTGFFL VNNFMLLQQQ KLL---DDTP PPTTIVVLAA AKSSSSNPRV VVLGYYEVVL NRVVVVGFFA VNNFMLLQQQ KLL---DDTP PPTTIVVLAA AKSSSSNPRV VVLGYYEVVL NRVVVVGFFA LLDDRFFTTE AFFYYSSAAS GGGNME---A YQEIICDIIV VCAAARRREH HHELSRRRRR AANNRVVKKN AQQFFSSNGR DDERRERRRE LGRIISGLLI IGTTGHRREM MMEKEQQRLL AANNRVVKKN AQQFFSSNGR DDERRERRRE LGRIISGLLI IGTTGHRREM MMEKEQQRLL LRALLLLLSS AAVLLLGSSN LARRRMMLLV GLFGGHHEEE GCCTLGGGWW GRRPPPLAAW MNAFFFFFEE SSVLLLSNNY VAKKKIILLL WNYYYSSEEE GFFSLAAAWW DLLPPPLSSW MNAFFFFFEE SSVLLLSNNY VAKKKIILLL WNYYYSSEEE GFFSLAAAWW DLLPPPLSSW WEAGGGGGDN SNGSSSDNNN GSNNGGKKGG RRGSSSCCL WR-------- ---------- ---------- --------- 
WR-------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- MDDTTFFPF- -----WMPAA t5g66770.1 MMYMCSSSSG GGNNNLAQVI IKQEEEEEQQ QQQQQHHHHD HQQIIFFGIL SNPPPW-TSL t5g66770.2 MMYMCSSSSG GGNNNLAQVI IKQEEEEEQQ QQQQQHHHHD HQQIIFFGIL SNPPPW-TSL SSGGFLLPPA VV-------- ---------- APDDY----- ---------- ---------- GFSSAFFPPQ VVTPPGGGGP PFFPPNDDHH ATTTFRRLLL SDFGGTTGED EETTGGDADC GFSSAFFPPQ VVTPPGGGGP PFFPPNDDHH ATTTFRRLLL SDFGGTTGED EETTGGDADC --YDPPPPAA A----AA--- ---------- ---------- ----VDAAPP EEFFAAAAFF WWHDPPPPDD DYYYIPPYSS RRRRSSSVVV QQQSDDLLNN RRVVIDTTLL PPPPPPTLWW WWHDPPPPDD DYYYIPPYSS RRRRSSSVVV QQQSDDLLNN RRVVIDTTLL PPPPPPTLWW FFCAPPDAAA AAAAAVLLLL RRREEEEEEG IRR------L LSCAAGGAAI IIEGHLLAAQ WWSSPPLSSS IIPPPLTTTT TTKEDPPPED SEEDDFDLPA IDCAA--RRI IISSPEEAKT WWSSPPLSSS IIPPPLTTTT TTKEDPPPED SEEDDFDLPA IDCAA--RRI IISSPEEAKT SSAAAAVASG GVVAVFTLRR RFPPSPPPPD DAHALL-HFE ECPFAFNNNN AIFFFHCCCH IIESSELPTE EVVAFFELNN RSPPSPPSSS SSEDIILKLD DCPFALNNNN AITTTESSSK IIESSELPTE EVVAFFELNN RSPPSPPSSS SSEDIILKLD DCPFALNNNN AITTTESSSK VVIDDDDFSL LQGLQQWPPP AAALLLIQQL LLLLRRGGGP PPF-LRRITP PPGDEEEE-- IIVDDDDFGI IQGIQQWPPP AAALLLLQQL LTTTRRSSGK PPTQIRRVSA PPEPEEEEPP IIVDDDDFGI IQGIQQWPPP AAALLLLQQL LTTTRRSSGK PPTQIRRVSA PPEPEEEEPP DDDDDVGGLL LLLLAADAAS SRRRVVVFFF GGAAANNSLL DRMLLQQGGG GEVVAANLHH AAAAATGGNL LLLLRRDAAV VDDDLLLFFF PPLLL--PII HNSFFRRDDD DELLAANLYY AAAAATGGNL LLLLRRDAAV VDDDLLLFFF PPLLL--PII HNSFFRRDDD DELLAANLYY RLDQQA---I DDAVVVDASI TTVVIIIEQE EADNNNTGLL FFTTEFYYYS SSAAAVLLAA KLDEETTIIV DDTAAARKSV TTLLGGGEYE EVSNNNVGAA VVKKNQFFYS SSAAAVLLPN KLDEETTIIV DDTAAARKSV TTLLGGGEYE EVSNNNVGAA VVKKNQFFYS SSAAAVLLPN AAAASAAAAA AE-AYLLLQQ RREEEICDDI VCGGEAAARH HPPLRRRRDR AAAGGGGLSA NGGGREEVVV VERELFFFGG RRRRRISGGL IGPPETGGRM MEEKQQRRVL AAAGGGGFES NGGGREEVVV VERELFFFGG RRRRRISGGL IGPPETGGRM MEEKQQRRVL AAAGGGGFES PLGNAALLLR RMLVVVGSG- --HSSVEEEA ADDGGCTTLW WWWHHGRRPP LSGGDGGGGG KLSYAAVVVS KILLLLWNYL YYSIIVEESK KPPGGFSSLW WWWNNDLLPP LT-------- 
KLSYAAVVVS KILLLLWNYL YYSIIVEESK KPPGGFSSLW WWWNNDLLPP LT-------- GGGNNSSVSS GGGSDSNSSS SSSNGGGKKK SADGGGSLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- ---DDDDQ-- --WWPPMMMD t5g66770.1 AMCCCTTSSG NLMMMAAQQQ VIIIIKQQKK QQQQEQQQQH QQQQQQQNPL LPWWPP---N t5g66770.2 AMCCCTTSSG NLMMMAAQQQ VIIIIKKKKK QQQQEQQQQH QQQQQQQNPL LPWWPP---N PASGGLDDDD GLLPPPPP-- ---------- ---------- ---PPPPDGG VGYYY----- TSGGGLSSSS SFFDDDDPTT GGGGDDDNDP GFPFPPNNNN HHHTTTTTGG GGFFFRRSDG TSGGGLSSSS SFFPPPPPTT GGGGDDDNDP GFPFPPNNNN HHHTTTTTGG GGFFFRRSDG ---------- ---------- ---------- ----YYYDPP P----AAAD- ---------- GTTGGGGEFE ESDMTTLLII GGGSVDGPPD DDDWHHHDNP PYYIYPPPDY PSVQPPPSDN GTTGGGGEFE ESDMTTLLII GGGSVDGPPD DDDWHHHDNP PYYIYPPPDY PSVQPPPSDN -----VDAAA LAAAFFPPCA APDDAAAAAA VVVLMRRREE VR------LL VVLLLMCGAI RRRRVIDSSS PPTTWWPPSS SPLLSIIPPP LLLTPTKKPE TEDFDLPPLL LLAIIYC-RI RRRRVIDSSS PPTTWWPPSS SPLLSIIPPP LLLTPTKKPE TEDFDLPPLL LLAIIYC-RI AAGHLAAASL LHAAALLAAS SSGIGGRVVA HFTSSRRLLF PAAPPPTDEH F------YFF DDSPEAAASL LRESSVVEET TT--EERVVA YFESSNRLLS PTTSSSSSTE LLLSSSSYLL DDSPEAAASL LRESSVVEET TT--EERVVA YFESSNRLLS PTTSSSSSTE LLLSSSSYLL YEACCLLLFF TTAQEFFFGD HVHVVIIFFS SMQLQPLLII QAALLRRRPG GGGPPPRRII NDACCSSSFF TTAQETTTKN KIHIIVVFFG GVQIQPLLLL QAATTRRRTG GGGKPPRRVV NDACCSSSFF TTAQETTTKN KIHIIVVFFG GVQIQPLLLL QAATTRRRTG GGGKPPRRVV TGIIISSPTT GGRLRVGLLD LAARSRVVFF SGAAAANSLL DEEEVPPMQQ AAAPEAANSV SGIIISSLGG EESLITGNLD FAAKVDLLFF DPLLTT-PII HLLLLGGSRR DDDPEAANFM SGIIISSLGG EESLITGNLD FAAKVDLLFF DPLLTT-PII HLLLLGGSRR DDDPEAANFM LLHLLGDDPD QAAIDALDVV SSPFVIQAAA DDDHNNKFLL DRRRRFALFY YSSAAAVFSL LLYLL----D ETTVDTLRAA SSPVLGYVVV SSSLNNRFAA NRRRRVALQY YSSAAAVFSL LLYLL----D ETTVDTLRAA SSPVLGYVVV SSSLNNRFAA NRRRRVALQY YSSAAAVFSL LLSGAANNAA AAAQQREIIC CDIGGAAARR REERHHHHEP LSRRRRLLLT LSAVVPLGSN LLLDEERRVV VEEGGRRIIS SGLPKTTGHR REERMMMMEE KERRLLMMME FESVVKLSNY LLLDEERRVV VEEGGRRIIS SGLPKTTGHR REERMMMMEE 
KERRLLMMME FESVVKLSNY NNNAALLRQA MLLLGGLLLG ---VEGGCLT TTTLGGWHGR FFSASAWEAA AAADDGGGGD YYYAAVVSQA ILLLWWNNNN LYYVEGGFIS SSSLAAWNDL LLTLSSWR-- ---------- YYYAAVVSQA ILLLWWNNNN LYYVEGGFIS SSSLAAWNDL LLTLSSWR-- ---------- DDDNNNNNNN NSNSSSNNVS SGGGSSGGSS SNNGSSSGV ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --DQQ----- MMDDASSSLL LAAFFFLPPP t5g66770.1 MYYCTTDSGM AIIAVVVKQQ QKKKQQQQHH HQQNNPPPLL --NNSGFFLL LGGAAAFPPP t5g66770.2 MYYCTTDSGM AIIAVVVKKK KKKKQQQQHH HQQNNPPPLL --NNSGFFLL LGGAAAFPPP VV-------- -------AVG ---------- ---------- ---------- ---------- VVTGGPPFPP NNNDDHHAGG RRLLSDGGGG GGGGGEEEEE ESSEEMEEET LIIIGDSSAA VVTGGPPFPP NNNDDHHAGG RRLLSDGGGG GGGGGEEEEE ESSEEMEEET LIIIGDSSAA ---------Y YYYDDDPPA- ----GGDDD- ---------- ------VVAL LPEEFAAAPP DDPPPCCDDH HHHDDDPPDY YIIYGGDDDP FTYPVVVQSD DLNRRVIISP PLPPPTTLPP DDPPPCCDDH HHHDDDPPDY YIIYGGDDDP FTYPVVVQSD DLNRRVIISP PLPPPTTLPP DAAVLLLLL- --MMRREEEE VAI------- VHHLLLMSGA GDDAAASSSA ALLLDSSHHL LIPLTTTTTH EEPPTKDPEE TNSDFLEEPP LKKAAIYD-R SDDNNASSSK KLLLQIIRRV LIPLTTTTTH EEPPTKDPEE TNSDFLEEPP LKKAAIYD-R SDDNNASSSK KLLLQIIRRV LAAAAVVSAA ASGRRVAVVV HHFFTTLLFP PPVVVVPPPT TTDDAEEHFF LL--YYYACP VSSSELLGDD PTERRVAFFF YYFFTELLSP PPAAAASSSS SSSSSTTELL IISSYYYACP VSSSELLGDD PTERRVAFFF YYFFTELLSP PPAAAASSSS SSSSSTTELL IISSYYYACP PYLLLLFHNN IILEEEAFHH HHCHVISQQQ LQQAIQAALP PPPGGPPFLL LLLLIGIPPP PYSSSSFHNN IILEEEATEE EESKIVGQQQ IQQALQAATT TTTSGPPTII IIIIVGIAAP PYSSSSFHNN IILEEEATEE EESKIVGQQQ IQQALQAATT TTTSGPPTII IIIIVGIAAP PPPTGRRDEE ---RLLSRVV RFFSFFFFFG GAASLLEVPL LAAGEAVANN SSVQQLHLDD PLLGESSPEE PSSILLVDLL NFFDFFFFFP PLTPIILLGF FDDDEVLANN FFMQQLYL-- PLLGESSPEE PSSILLVDLL NFFDFFFFFP PLTPIILLGF FDDDEVLANN FFMQQLYL-- AAQPPPPP-- -IDDVVLLLC CVARKKKFFF TVIEEQEAAA HNNGGGGFLL LLLDRRRRFE --EPPPPPTI IVDDAALLLL LAKNRRRVVV TLGEEYEVVV LNNGGGGFAA AAANRRRRVN --EPPPPPTI IVDDAALLLL LAKNRRRVVV TLGEEYEVVV LNNGGGGFAA AAANRRRRVN ALFYDDSSLL DASSSAGAEE 
EYQRCCDIIE EGAAREEERR HHEPPPSRRW WRDRRRRRTT ALQFEESSLL ENRRREEVEE ELGRSSGLLE EKTGREEERR MMEEEEEQQW WRVLLLLLEE ALQFEESSLL ENRRREEVEE ELGRSSGLLE EKTGREEERR MMEEEEEQQW WRVLLLLLEE TRRAGGGGLS AVPLGSNRMV VGGGFE--HS ADCTTGPPSA SSAEEAGGDD GGGGGDNNNN ENNAGGGGFE SVKLSNYKIL LWWWYSYYSI KPFSSDPPTL SSSRR----- ---------- ENNAGGGGFE SVKLSNYKIL LWWWYSYYSI KPFSSDPPTL SSSRR----- ---------- SSVSSSGGSS SSGDDSSNNN SSSSSGGSGS AAADDSVVC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- MMTTTFPPQ- ------WWWW t5g66770.1 AYYMMMMMDD SSGGNNLMMM MMAAIIIAAQ QQVQQQEEHH HHIIIFGGNL SNNPPPWWWW t5g66770.2 AYYMMMMMDD SSGGNNLMMM MMAAIIIAAQ QQVKQQEEHH HHIIIFGGNL SNNPPPWWWW PAASSGGLLA AAGFFPPPPP PAAV------ ---------A AAPPPDDGGG Y--------- PSLFFGGLLG GGSAAPPDDP PQQVTGGGGD PPNHHHHHHA AATTTTTGGG FLSSFGGGGG PSLFFGGLLG GGSAAPPPPP PQQVTGGGGD PPNHHHHHHA AATTTTTGGG FLSSFGGGGG ---------- ---------- ---------- ---------- -DPPA----G ADD------- TGGGEESDEE WTTTLLISGG DSSSVAADDG GGGPPDDDWW WDNPDVVIIG PDDPFFSRLV TGGGEESDEE WTTTLLISGG DSSSVAADDG GGGPPDDDWW WDNPDVVIIG PDDPFFSRLV -----VDAAL LPEEFAFPPP PPAAAAAVV- AAMRRRRREE EEVVGGGR-- ---------- QPSDNIDTTP PLPPPTWPPP PPSSPPPLLE SSPTTTKKEE PETTDDDEDD DDDFLLEEPP QPSDNIDTTP PLPPPTWPPP PPSSPPPLLE SSPTTTKKEE PETTDDDEDD DDDFLLEEPP LLVVVLMCCA GGAIIEEAAL ASSLAHHAAL LASSASGGRA AVHFTTASRF PPVAATTTAA LLLLLIYCCA --RIISSNNE ASSLLRREEV VEGGPTEERA AFYFTTASRS PPATTSSSSS LLLLLIYCCA --RIISSNNE ASSLLRREEV VEGGPTEERA AFYFTTASRS PPATTSSSSS EAAFFFFLL- --HHHHHHYE AACLLKAAHF TNAILAAFHG GGCCCHHHVH HVIDDFFSLG TDDLLLLIIL SSKTTTTTND AACSSKAAHL TNAILAATEK KKSSSKKKIH HIVDDFFGIG TDDLLLLIIL SSKTTTTTND AACSSKAAHL TNAILAATEK KKSSSKKKIH HIVDDFFGIG LAAIAAPPGG PPF-LIIITG IIIPSPTTRR -RVGGLDRRV SRGASDDEEP PQAAAASVLQ IAALAATTSG PPTQIVVVSG IIIPSLGGSS PITGGLDDDL DIPLPHHLLG GRDDDVFMLQ IAALAATTSG PPTQIVVVSG IIIPSLGGSS PITGGLDDDL DIPLPHHLLG GRDDDVFMLQ QHRLLPPDQQ APP--IVVLL DDAAAVVVRR KKFFFTEQQA KTTGGGFLDT TALYYYYYAA 
QYKLL--DEE TPPTTVAALL RRKKKLLLNN RRVVVTEYYV RVVGGGFANK KALFFFFYAA QYKLL--DEE TPPTTVAALL RRKKKLLLNN RRVVVTEYYV RVVGGGFANK KALFFFFYAA AAVFDSSDGA AMAAAYYLLQ QRRIIDVVVC CEAA--RREE RHHHEEPPLS RRRDDDRLTG AAVFESSEDE VRVVELLFFG GRRIIGIIIG GEGGIIHREE RMMMEEEEKE QQRVVVLMEG AAVFESSEDE VRVVELLFFG GRRIIGIIIG GEGGIIHREE RMMMEEEEKE QQRVVVLMEG AVLGNLRQQM MLVGLFSSEG --HHVAADDL TTLRFAAASS AWEEEAGGGG GGDDDNNNNN SVLSYVSQQI ILLWNYNNSN YYSSVKKPPI SSLLLLLLSS SWRRR----- ---------- SVLSYVSQQI ILLWNYNNSN YYSSVKKPPI SSLLLLLLSS SWRRR----- ---------- NSVSGSSSSG SSDNSGSSSN NGSGAADDGS SSVVCCCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- MTTP------ --------WP DDASSLAAFL t5g66770.1 MAMDDNNNLL MMIIAQQVVK QQQQQQQQQD HIIGPLLSLL NNPPPPPPWP NNSGFLGGAF t5g66770.2 MAMDDNNNLL MMIIAQQVVK QQQQQQQQQD HIIGPLLSLL NNPPPPPPWP NNSGFLGGAF LPPPA----- ---------- --------PD DDVVGGG--- ---------- ---------- FPDPFGGGDS SSSDDGFFFP PPPPDHHHTT TTGGGGGRRR LLSSGGGTTT GGGEEESEWE FPPPFGGGDS SSSDDGFFFP PPPPDHHHTT TTGGGGGRRR LLSSGGGTTT GGGEEESEWE ---------- ---------- ------YPA- ----AD---- ---------- --VVAAALLL TLIISGGGGD SSVVAADGGG DDDDDTHNDY YVYYPDPPFT YPPRLVQQPD DLIITSSPPP TLIISGGGGD SSVVAADGGG DDDDDTHNDY YVYYPDPPFT YPPRLVQQPD DLIITSSPPP LAAAAFFPPC CPAL-AMMEE GGIR------ -----LLLHH HHLLLMMSSS CGAIEAGAAS PPTTLWWPPS SPSTESPPDE DDSEDDDFFD DDEEPLLLKK KKIIIYYDDD C-RISDSNNS PPTTLWWPPS SPSTESPPDE DDSEDDDFFD DDEEPLLLKK KKIIIYYDDD C-RISDSNNS AQQDHHASAA ASRRAAVHFF FFTRLFPPAA PPPPTTTDAH AFF-HFFYEE ECPYYYLKFA KTTQRREGPP PTRRAAFYFF FFENLSPPTT SSSSSSSSSE DLLSKLLNDD DCPYYYSKFA KTTQRREGPP PTRRAAFYFF FFENLSPPTT SSSSSSSSSE DLLSKLLNDD DCPYYYSKFA AHHHFFFTTA AANQQQILLA AAFFHCDDHH VIIIFSMQQQ QLLQIQLLLL LRPGPFF-RR AHHHLLLTTA AANQQQILLA AATTESNNKK IVVVFGVQQQ QIIQLQLLLL TRTSKTTQRR AHHHLLLTTA AANQQQILLA AATTESNNKK IVVVFGVQQQ QIIQLQLLLL TRTSKTTQRR ITIGPTGRRD -LRDGLLAAA DDLRSSRRRV RFFSRRAAAN SSLDEEPWWW WMLQIIIIAA VSIPAGESSP SLIAGNNRRR DDFKVVDDDL NFFDIILTT- 
PPIHLLGSSS SSFRVVVVDD VSIPAGESSP SLIAGNNRRR DDFKVVDDDL NFFDIILTT- PPIHLLGSSS SSFRVVVVDD AEVFFNNVQL RLLLLGDDD- --IIIIIDAA AADAASSSVP KTQEEEAADD DNTTFDDDRT DELVVNNMQL KLLLL--DDT TIVVVVVDTT TTRKKSSSLP RTYEEEVVSS SNVVFNNNRK DELVVNNMQL KLLLL--DDT TIVVVVVDTT TTRKKSSSLP RTYEEEVVSS SNVVFNNNRK TEFFYSSSVF SLAAAASASG AGGNNEEE-- AAAYYLQRRE ICCIVVCGGA -RERRHEEPP KNQQFSSSVF SLPNNNLGRS EEERREEERR EEELLFGRRR ISSLIIGPKG IHERRMEEEE KNQQFSSSVF SLPNNNLGRS EEERREEERR EEELLFGRRR ISSLIIGPKG IHERRMEEEE RWWRDDRAGS AVVVLGGSLR RQAAMGGLLL FSSSGG--HV VEEDDDDDGL LLWHHGGGPP QWWRVVLAGE SVVVLSSNVS SQAAIWWNNN YNNNYYYYSV VEEPPPPPGI LLWNNDDDPP QWWRVVLAGE SVVVLSSNVS SQAAIWWNNN YNNNYYYYSV VEEPPPPPGI LLWNNDDDPP PLSAAAGDGG DNNNNGGSSD DDSNSSSNGS AADDGGSSL PLTLS----- ---------- ---------- --------- PLTLS----- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- ------MFFF -----WWPPP t5g66770.1 MYYYYMMMMM MCSGGMMMAI AAQQQQIKQQ QEEEQQQQQQ QQQQDDHFFF PLNNPWWPPP t5g66770.2 MYYYYMMMMM MCSGGMMMAI AAQQQQIKQQ QEEEQQQQQQ QQQQDDHFFF PLNNPWWPPP DDAAASSSSL DAAGGLPPPP PAVVV----- ---------- --PPPDDG-- ---------- NNSSSGGFFL SGGSSFPDPP PFVVVGGGGG DNDPFFPFPP HHTTTTTGRR LSSSSSFFFG NNSSSGGFFL SGGSSFPPPP PFVVVGGGGG DNDPFFPFPP HHTTTTTGRR LSSSSSFFFG ---------- ---------- ---------- --YPPP---A D--------- --------VV GGGTGGGFSE MMIISSGGGS SSSSVAPPDC TTHNNNYIIP DPPDDTYRLL LSVQSLLNII GGGTGGGFSE MMIISSGGGS SSSSVAPPDC TTHNNNYIIP DPPDDTYRLL LSVQSLLNII DAAALEEFAA FFPPPCAAPA AA----AMEE EEGIR----- -----VVVVL MSSCIEGGDH DTSSPPPPPL WWPPPSSSPS PPEEEESPED DPDSEDDDFF FLEPPLLLLI YDDCISSSDP DTSSPPPPPL WWPPPSSSPS PPEEEESPED DPDSEDDDFF FLEPPLLLLI YDDCISSSDP AAAQQLLSHA AALAAAAAGG VAAFALSRRR RFF-PAPTDA AHHHAF--YH HFEACCPPYY AKKTTLLIRS SSVDDDPPEE VAAFALSNNR RSSNPTSSSS SEEEDLLSYK TLDACCPPYY AKKTTLLIRS SSVDDDPPEE VAAFALSNNR RSSNPTSSSS SEEEDLLSYK TLDACCPPYY YYLFFAAANQ AAILLEFFHH HCCCDHHHHV VVVIDFSGQQ WPLILLRPPG GPFF--LLRI YYSFLAAANQ AAILLETTEE ESSSNKKKKI IIIVDFGGQQ WPLLTTRTTS GKTTQQIIRV YYSFLAAANQ AAILLETTEE 
ESSSNKKKKI IIIVDFGGQQ WPLLTTRTTS GKTTQQIIRV TGGIGSSSPP PTTGRD-LRV VVGRADDDDL LLARSVVRRR RFFFRGVVAA ASLLDDEVRR SGGIPSSSLL LGGESPSLIT TTGRRDDDDF FFAKVLLDDD NFFFIPIILT TPIIHHLLNN SGGIPSSSLL LGGESPSLIT TTGRRDDDDF FFAKVLLDDD NFFFIPIILT TPIIHHLLNN PWWAPPVVFF FSLQLHRLLL LLDAAIAAAV CVSSPKKITV IIIHHNNKTT DTALLFYYYY GSSDPPLLVV VFLQLYKLLL LL-TTVTTTA LASSPRRVTL GGGLLNNRVV NKALLQFFYY GSSDPPLLVV VFLQLYKLLL LL-TTVTTTA LASSPRRVTL GGGLLNNRVV NKALLQFFYY AVFFDDDSLL LDASAASGGA MAE-AYYLQR RREEEECDII VCCGGEEGAA --RREERRHE AVFFEEESLL LENLGGRDSE RVERELLFGR RRRRRRSGLL IGGPPEEKTG IIRREERRME AVFFEEESLL LENLGGRDSE RVERELLFGR RRRRRRSGLL IGGPPEEKTG IIRREERRME PWWRRRDLTA SAAAVGGSNA RARRMMLGGG LFFFSGGHVV VVEADDGCCL LLWHPSAAAA EWWRRRVMEA ESSSVSSNYA SAKKIILWWW NYYYNNNSVV VVEKPPGFFI LLWNPTLLLS EWWRRRVMEA ESSSVSSNYA SAKKIILWWW NYYYNNNSVV VVEKPPGFFI LLWNPTLLLS AWWWAGGNNN NSSNSSSNGS SSSSSSGGSG SSSARGSSL SWWW------ ---------- ---------- --------- SWWW------ ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- MDTFP----- --WMMMMAAD t5g66770.1 MMAYTDDSGA AAIAQVVVII KQKQQEEQQQ QQQQQHHHDD HQIFGPPSSL NPW----LLS t5g66770.2 MMAYTDDSGA AAIAQVVVII KKKQQEEQQQ QQQQQHHHDD HQIFGPPSSL NPW----LLS DAGGFPPPAA ---------- ---------- ----AAADDD GVG------- ---------- SGSSAPPPQQ TTTGGGSSDP GGPFPPNLLD DHHHAAATTT GGGRSFFFGG TTGEEEWMLL SGSSAPPPQQ TTTGGGSSDP GGPFPPNLLD DHHHAAATTT GGGRSFFFGG TTGEEEWMLL ---------- DPA-GDDD-- ---------- ---------- VDALLPEAFC AAAAPAAALM SGGDSADDDD DPDYGDDDPD DTYPPRRRSV VPSDDDNVVV IDSPPLPTWS SSSSPSSPTP SGGDSADDDD DPDYGDDDPD DTYPPRRRSV VPSDDDNVVV IDSPPLPTWS SSSSPSSPTP RRREEEEEEE EGIRR----- -----LVVVL LMCCCAEAGG DDDDHALAAS AAQQQHAAAV TTKEEEDPEE EDSEEDDDDD DEPPPLLLLA IYCCCRSDSS DDDDPNEAAS KKTTTRESSL TTKEEEDPEE EDSEEDDDDD DEPPPLLLLA IYCCCRSDSS DDDDPNEAAS KKTTTRESSL SSSSASIRVV VFTTTARRRR LLPPPVPPTT HAAFLL---- YHYAAPYYYL LLKFHHFTNN GGGGDT-RVF FFTTTANNNR LLPPPASSSS EDDLIILSSS YTNAAPYYYS SSKFHHLTNN GGGGDT-RVF FFTTTANNNR LLPPPASSSS EDDLIILSSS YTNAAPYYYS SSKFHHLTNN 
QQAAILLLEE AAAFFFHGDH VVVHHHHIII DDDFFFMGGL LPPAALAARG GGPPPFLLRR QQAAILLLEE AAATTTEKNK IIIHHHHVVV DDDFFFVGGI IPPAALAARS SGKKPTIIRR QQAAILLLEE AAATTTEKNK IIIHHHHVVV DDDFFFVGGI IPPAALAARS SGKKPTIIRR IGGIIGPPPP SPTDE--LRR DDLSSVRRVV RSFFGVAADR RPPWMAAEAV NSVQQRLLLL VGGIIPPPPP SLGPEPPLII AAFVVLDDLL NDFFPILTHN NGGSSDDEVL NFMQQKLLLL VGGIIPPPPP SLGPEPPLII AAFVVLDDLL NDFFPILTHN NGGSSDDEVL NFMQQKLLLL LGPAAAADDQ AP---DDAVL LLLVVSSPKI FFFTTVIIEE EEQQEEAAAA DDHHHNNNTT L------DDE TPTIIDDTAL LLLAASSPRV VVVTTLGGEE EEYYEEVVVV SSLLLNNNVV L------DDE TPTIIDDTAL LLLAASSPRV VVVTTLGGEE EEYYEEVVVV SSLLLNNNVV GFRFEEAAYY YSFDDSSSDA AASGGGGGNA AMMEAAQEEE ICDVVVVCEE AA----RRHP GFRVNNAAFY YSFEESSSEP GGRDDSSERV VRREEEGRRR ISGIIIIGEE TTIIIIHRME GFRVNNAAFY YSFEESSSEP GGRDDSSERV VRREEEGRRR ISGIIIIGEE TTIIIIHRME LSWWRRDRLR GLLAVVPNNN AAQLLVGLLF SG--SVEEAG LLTRPSAAAA AAGDGGGGDD KEWWRRVLMN GFFSVVKYYY AAQLLLWNNY NYLLIVSSKG IISLPTSS-- ---------- KEWWRRVLMN GFFSVVKYYY AAQLLLWNNY NYLLIVSSKG IISLPTSS-- ---------- NNNNNNNSSS SVDDSNSSSS SNGGKKSSGA DSSVCCCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -----MMMDD TFF----PPD PLLFLLAAVV t5g66770.1 AYYYYMCTTD SGGGMIIVVI KQKKQQQQQQ QQQQDHHHQQ IFIPLNPPPN TLLAFFQQVV t5g66770.2 AYYYYMCTTD SGGGMIIVVI KKKKQQQQQQ QQQQDHHHQQ IFIPLNPPPN TLLAFFQQVV ---------- ---------- -------PDD DDVY------ ---------- ---------- TTGGGGDDSN NGFFFPFLLL DDHHHHHTTT TTGFRLSSDD FFGGGTGGEE FEEDDEEELL TTGGGGDDSN NGFFFPFLLL DDHHHHHTTT TTGFRLSSDD FFGGGTGGEE FEEDDEEELL ---------- ---------- --YPPP---- ---------- ---------- VAAAAALLPF LLIGGDSSSV DDDGGDDDDD DDHNPPYYVV YYFDDTYSSR SQQQPPNNRV ITSSSSPPLP LLIGGDSSSV DDDGGDDDDD DDHNPPYYVV YYFDDTYSSR SQQQPPNNRV ITSSSSPPLP AAAPPDDAAA ALL-AAMMMR RREEEVAG-- ---------L HHHLGIIAGD HHAAAASSQL PPLPPLLIIP PTTHSSPPPT KKEEDTNDDD DDFFEPPPPL KKKI-IIDSD PPNNNNSSTL PPLPPLLIIP PTTHSSPPPT KKEEDTNDDD DDFFEPPPPL KKKI-IIDSD PPNNNNSSTL ASSSHAAAAA SSIVVVAAHF FFTTAALSR- PPPVVAAPPT 
TTDDAEHHF- YFFYYCPYYL LIIIRESSED TT-VVVAAYF FFTEAALSNN PPPAATTSSS SSSSSTEELL YLLNNCPYYS LIIIRESSED TT-VVVAAYF FFTEAALSNN PPPAATTSSS SSSSSTEELL YLLNNCPYYS LKAAHFNQAA IILLLAFHGH VHIIIDFSSL LQGQWWPPPL LQALLGGGPP PFLLRGPPPP SKAAHLNQAA IILLLATEKK IHVVVDFGGI IQGQWWPPPL LQATTSSGKP PTIIRGAAAP SKAAHLNQAA IILLLATEKK IHVVVDFGGI IQGQWWPPPL LQATTSSGKP PTIIRGAAAP PPTRDERDDV VLLRAAADDL LAAAAVRVRF FFSFRRGVVV AAASSVVWWW LLQIIGGGEV PLGSPEIAAT TNNRRRRDDF FAAAALDLNF FFDFIIPIII TTTPPLLSSS FFRVVDDDEL PLGSPEIAAT TNNRRRRDDF FAAAALDLNF FFDFIIPIII TTTPPLLSSS FFRVVDDDEL VVVQHLGDPP PAADQAP--D DAAAVLLCCC VSSVRRPVII EEQADHKKKR RFTTTALLFY MMMQYL---- ---DETPIID DTTTALLLLL ASSLNNPLGG EEYVSLRRRR RVKKKALLQF MMMQYL---- ---DETPIID DTTTALLLLL ASSLNNPLGG EEYVSLRRRR RVKKKALLQF YAAFDLLLLD DAAASGGAGN AAAMAAA--A YLLLQEEEEI CDVEGAAAAA RREEERRHEP YAAFELLLLE EPPPRDDEER VVVRVVVRRE LFFFGRRRRI SGIEKTTGGG HHEEERRMEE YAAFELLLLE EPPPRDDEER VVVRVVVRRE LFFFGRRRRI SGIEKTTGGG HHEEERRMEE LSRWRDAGGL LSPLLLSSNR QARMLVGGGF SGGGHSSVVE GGCLTGWWWH HGLAWWGDDG KEQWRVAGGF FEKLLLNNYS QAKILLWWWY NYYNSIIVVS GGFISAWWWN NDLSWW---- KEQWRVAGGF FEKLLLNNYS QAKILLWWWY NYYNSIIVVS GGFISAWWWN NDLSWW---- GGNSSNNNSS NGSSGGDDNN SSSNGSSSSA ADDGGSCCC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --TQ----WP PMMASLDDDA FFAA------ t5g66770.1 MMAMCCGGMA AIAAAQQVQK QQQQQQHHQD DDINLLSSWP P--SFLSSSG AAFFTTGSND t5g66770.2 MMAMCCGGMA AIAAAQQVKK QQQQQQHHQD DDINLLSSWP P--SFLSSSG AAFFTTGSND ---------- -----APDDD DVY------- ---------- ---------- ---------- PGGGPNNNNN LHHHHATTTT TGFRRRSSDF GGGTGEFFSD DEWEETLLSV DDDGPPPDDD PGGGPNNNNN LHHHHATTTT TGFRRRSSDF GGGTGEFFSD DEWEETLLSV DDDGPPPDDD -YDPPPA--- --AD------ ---------- --VDDDAAAL PPPEFFAAAA APDAAAV-AR THDNNPDVVI IYPDPPDTTY YYSRRVPSDD RVIDDDTTSP LLLPPPPPLL LPLSIPLEST THDNNPDVVI IYPDPPDTTY YYSRRVPSDD RVIDDDTTSP LLLPPPPPLL LPLSIPLEST RREEEEGIRR ----LHHLMM SCCAGAEHHA SSSAQQLADS SHHHAASSSS SIIGGGRRVH TKEEDPDSEE DDLPLKKIYY 
DCCA-RSPPN SSSKTTLLQI IRRRSSGGTT T--EEERRVY TKEEDPDSEE DDLPLKKIYY DCCA-RSPPN SSSKTTLLQI IRRRSSGGTT T--EEERRVY FFFTAALSRR RPSSPPATTA E--YHFYYYE EEACPFAFQA ILEAFFHGCC DHVSSSLLQL FFFTAALSRR RPSSPPTSSS TLSYKLNNND DDACPFALQA ILEATTEKSS NHIGGGIIQI FFFTAALSRR RPSSPPTSSS TLSYKLNNND DDACPFALQA ILEATTEKSS NHIGGGIIQI LQQWWPALII ALRRPPPPFF LIPSPPGRRD DEE-LLVGGG LLLLLLLARV RRSSSRRGVA IQQWWPALLL ATRRTPPPTT IVPSLLESSP PEEPLLTGGG NNLLFFFAKL DNDDDIIPIL IQQWWPALLL ATRRTPPPTT IVPSLLESSP PEEPLLTGGG NNLLFFFAKL DNDDDIIPIL ALLDDEERRP PWMMMQAAPP EAAFNLLLQQ HHRLLGGDDD PDQAAA-IID AVLLCCCCCV TIIHHLLNNG GSSSSRDDPP EVAVNLLLQQ YYKLL----- -DETTTTVVD TALLLLLLLA TIIHHLLNNG GSSSSRDDPP EVAVNLLLQQ YYKLL----- -DETTTTVVD TALLLLLLLA SSVVKKIIIF TTVIQEAADD NNTTTFDRFA ALLFFFFYYY SAFFFDSSSL LLAAAAASSG SSLLRRVVVV TTLGYEVVSS NNVVVFNRVA ALLQQQQYYY SAFFFESSSL LLPPNNNLRD SSLLRRVVVV TTLGYEVVSS NNVVVFNRVA ALLQQQQYYY SAFFFESSSL LLPPNNNLRD GGNNMMEE-Y LLQRICCIIG EA-RREERHE EPPPPLLSWW RRDRRRLTRR GLLLPLSNLL SERRRREERL FFGRISSLLP ETIRREERME EEEEEKKEWW RRVLLLMENN GFFFKLNYVV SERRRREERL FFGRISSLLP ETIRREERME EEEEEKKEWW RRVLLLMENN GFFFKLNYVV RRRQMMLVVG LLLFSSGE-- VADCCLLGGW HHHRRRPLLL FSAAGGDDGG GGNNNNNNNS SSSQIILLLW NNNYNNYSLL VKPFFIIAAW NNNLLLPLLL LTSS------ ---------- SSSQIILLLW NNNYNNYSLL VKPFFIIAAW NNNLLLPLLL LTSS------ ---------- NNVSGGSSSD DSSNSSSNGG KSGGGADDDG SSSSVCCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ----MMDDTT FFFFF---PP MPAAASSSLA t5g66770.1 MMYYMCTTDG GNLLAIQQQV VIIIKEQQQQ HQQDHHQQII FFFIILNPPP -TSLLGFFLG t5g66770.2 MMYYMCTTDG GNLLAIQQQV VIIIKEQQQQ HQQDHHQQII FFFIILNPPP -TSLLGFFLG FLPPAA---- ---------- ---------- -AAPDGGYYY ---------- ---------- AFPDFFTTGG GGDDDDDDDP PFNLLDDHHH HAATTGGFFF RLDFGGGGEF FESDDWWETT AFPPFFTTGG GGDDDDDDDP PFNLLDDHHH HAATTGGFFF RLDFGGGGEF FESDDWWETT ---------- ---------- -----YPPP- ---GD----- ---------- --VVVDAAAE LIIGGGGSVV DDDGGPPPPD CCCCTHNNNY VIIGDFDTTY SRRLLSPPDL NRIIIDTTSP 
LIIGGGGSVV DDDGGPPPPD CCCCTHNNNY VIIGDFDTTY SRRLLSPPDL NRIIIDTTSP FFAAAFFFPP AAAA-ARRRR REEEEGIIR- ---LVHLLMS SCAAAEAAGD AAALAADHHA PPPTLWWWPP SSIPESTTTT KDEEEDSSED DFPLLKAIYD DCAARSDDSD NNALLLQRRS PPPTLWWWPP SSIPESTTTT KDEEEDSSED DFPLLKAIYD DCAARSDDSD NNALLLQRRS LAAAAVVAGI RVVVAAHFFF FTTLLSRRRL PPSVPPTAEH F--PPYLKKF AHHFFAILLE VSSSELLP-- RVVVAAYFFF FTELLSRRRL PPSASSSSTE LLSPPYSKKF AHHLLAILLE VSSSELLP-- RVVVAAYFFF FTELLSRRRL PPSASSSSTE LLSPPYSKKF AHHLLAILLE AFFHDDFSMM LLQWPAALLI QLLAALPGGG GPIIPPPTGR E-----LLRR VVGRLLVVRR ATTKDDFGVV IIQWPAALLL QLLAATTSSG GPVVAALGES EPPPSSLLII TTGRLFLLDD ATTKDDFGVV IIQWPAALLL QLLAATTSSG GPVVAALGES EPPPSSLLII TTGRLFLLDD RRRSSSFRRR GGAANSSSLD DEVWMQPPGA NVVVLQQHLD AADQQA---- IIIAAVVLDD DNNDDDFIII PPTT-PPPIH HLLSSRPPDA NMMMLQQYL- --DEETTTTI VVVTTAALRR DNNDDDFIII PPTT-PPPIH HLLSSRPPDA NMMMLQQYL- --DEETTTTI VVVTTAALRR CCVVVAASSS VVVRPPFFTV IEEQEAHKKK TGGFDTTAYY YYAVVVDDDD SLAAAAAASG LLAAAKKSSS LLLNPPVVTL GEEYEVLRRR VGGFNKKAFY YYAVVVEEEE SLPPPPNNRD LLAAAKKSSS LLLNPPVVTL GEEYEVLRRR VGGFNKKAFY YYAVVVEEEE SLPPPPNNRD GGGGGAGGGA AMMMAEAYRC CDIIIVGEAA -RRRREEEPP LLLSWRTGGS PPPLGNNAAA DDDSSEEEEV VRRRVEELRS SGLLLIPEGG IHRRREEEEE KKKEWREGGE KKKLSYYAAA DDDSSEEEEV VRRRVEELRS SGLLLIPEGG IHRRREEEEE KKKEWREGGE KKKLSYYAAA LLQMMVVVVG LLSGEG---- HHSSVEADGG CCLTLGHHHG GRPAGGGGGD DNNNNNNNNN VVQIILLLLW NNNYSNLLLY SSIIVEKPGG FFISLANNND DLP------- ---------- VVQIILLLLW NNNYSNLLLY SSIIVEKPGG FFISLANNND DLP------- ---------- SNSSVVSSGS SSGSSDNNNS SNGSGGAARD DGSSVVCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---MDDTPPF ----WPMPAA SSGFFLPPPP t5g66770.1 MMCCTGMAII QQQQKKKQKQ QQQQQQQQHH DDDHQQIGGI PLLLWP-TSS GFSAAFPDDD t5g66770.2 MMCCTGMAII QQQQKKKKKQ QQQQQQQQHH DDDHQQIGGI PLLLWP-TSS GFSAAFPPPP PAAVV----- ---------- -PGGYY---- ---------- ---------- ---------- PFQVVTGGGD DSDPFFFPNN DTGGFFLLSS SSSDDFGGTT GGGGESSSSS SDDEEEMETS PFQVVTGGGD DSDPFFFPNN DTGGFFLLSS SSSDDFGGTT 
GGGGESSSSS SDDEEEMETS ---------- --------YY YDDPPPP-AA ---------- ------VDDA AALLPEEEFF GGGDSSVADG GGPDDCTWHH HDDNNNPIPP FDDTSVVQQP DDLLVVIDDT SSPPLPPPPP GGGDSSVADG GGPDDCTWHH HDDNNNPIPP FDDTSVVQQP DDLLVVIDDT SSPPLPPPPP AAAFPPCCPD AAAAAVL-AA RRREEEEVAG --------LV LMAGGIEGDD DDALLASSLA TTLWPPSSPL SSIPPLTHSS TKKEDDPTND DDDDDLEPLL AYA--ISSDD DDNEEASSLL TTLWPPSSPL SSIPPLTHSS TKKEDDPTND DDDDDLEPLL AYA--ISSDD DDNEEASSLL DDHALAAAAA ASGIGGRRRR VVVTTTTAAA SRRRLPAPPP PTTDDAAFFL ---YYHHYCC QQREVSDDDD PT--EERRRR FFFTEEEAAA SNRRLPTSSS SSSSSDDLLI LSSYYTTNCC QQREVSDDDD PT--EERRRR FFFTEEEAAA SNRRLPTSSS SSSSSDDLLI LSSYYTTNCC CCPYYFHHFT TTANNNNNQA IFCDDHVIDD DFQQQQQGGL QWWQAGGLRT GGIIPSPRRD CCPYYFHHLT TTANNNNNQA ITSNNKIVDD DFQQQQQGGI QWWQASGIRS GGIIASLSSP CCPYYFHHLT TTANNNNNQA ITSNNKIVDD DFQQQQQGGI QWWQASGIRS GGIIASLSSP ---RDVGRRL LLADRRSSVF SRGGGGVVAN NNNDEEVVPP WLQAAPEEAF LHRRLLPADD SSSIATGRRL LLRDKKVVLF DIPPPPIIT- ---HLLLLGG SFRDDPEEAV LYKKLL--DD SSSIATGRRL LLRDKKVVLF DIPPPPIIT- ---HLLLLGG SFRDDPEEAV LYKKLL--DD DQQQPPP--D ALLLDDCCVV ASRPPFFFFT IIIIEEQEAH KKGFFRFTEL LFFYYYYSSS DEEEPPPTID TLLLRRLLAA KSNPPVVVVT GGGGEEYEVL RRGFFRVKNL LQQFYYYSSS DEEEPPPTID TLLLRRLLAA KSNPPVVVVT GGGGEEYEVL RRGFFRVKNL LQQFYYYSSS VVDDLDAAAS SGGGGAAMAA E--AYLLLQQ RRREIIICVV VGGGEGGAAR RRHHPPLLWR VVEELEPNNL LDDSSVVRVV ERRELFFFGG RRRRIIISII IPPPEKKTGH RRMMEEKKWR VVEELEPNNL LDDSSVVRVV ERRELFFFGG RRRRIIISII IPPPEKKTGH RRMMEEKKWR DRLLLTTRRR RGLSVPLGSS ALQAAALLLL FFS--VEADD GTLGRPFAAW WWAAADDDGG VLMMMEENNN NGFEVKLSNN AVQAAANNNN YYNYYVEKPP GSLDLPLLLW WW-------- VLMMMEENNN NGFEVKLSNN AVQAAANNNN YYNYYVEKPP GSLDLPLLLW WW-------- DDDNNNNNSS SSVGSGDNNN GKSSGGRDDS SSVCLLLLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---DTTFPF- ----PPDDPP AAAASLLGGG t5g66770.1 YTTDGLMAII AAQQQQQQKK QKKKQQQQQQ QHDQIIFGIP NNNPPPNNTT SSLLGLLSSS t5g66770.2 YTTDGLMAII AAQQQQQQKK KKKKQQQQQQ QHDQIIFGIP NNNPPPNNTT SSLLGLLSSS GFFPPPAAAA AA-------- 
----APPDDG GVYY------ ---------- ---------- SAAPPPFFQQ QQGSPGGPPF FFDDATTTTG GGFFRLSDFF GGGGGGGGGG GGESSDMTLL SAAPPPFFQQ QQGSPGGPPF FFDDATTTTG GGFFRLSDFF GGGGGGGGGG GGESSDMTLL ---------- ---------- -YDDPP--GG GAAA------ ---------- ---------V LISGGGVVVA AAPDCCDTTT THDDNPYYGG GPPPPPPFTT TPPSRLVQQQ QPSDDDLLNI LISGGGVVVA AAPDCCDTTT THDDNPYYGG GPPPPPPFTT TPPSRLVQQQ QPSDDDLLNI APPAFFPAAD AAAA--AAMM MREAAA---- ---------- VVVLLMSCII IEAAADDHLL SLLLWWPSSL SSPPHHSSPP PKPNNNDDDD DDFDDDEEPP LLLAIYDCII ISDDDDDPEE SLLLWWPSSL SSPPHHSSPP PKPNNNDDDD DDFDDDEEPP LLLAIYDCII ISDDDDDPEE AQQLDSHAAA LLAVSAASSI IRRVAVFTLS RRRLLFP--- PVVVAAPTDE HHLL--YHFF ATTLQIREES VVELGDDTT- -RRVAFFTLS NRRLLSPNNN PAAATTSSST EEIILLYTLL ATTLQIREES VVELGDDTT- -RRVAFFTLS NRRLLSPNNN PAAATTSSST EEIILLYTLL YEACCLKFFF HHFAAANQLE AAAAAFHHHG DDVFSQLLLW PPAIQQAAGG GGGGGGF-LR NDACCSKFFF HHLAAANQLE AAAAATEEEK NNIFGQIIIW PPALQQAASS SSGGGGTQIR NDACCSKFFF HHLAAANQLE AAAAATEEEK NNIFGQIIIW PPALQQAASS SSGGGGTQIR RRIGIIPPTT TGRRRDDDEL RRRDDGLRRA ADLLRVRRSF RRGGVAAAAN NNDDEERLLL RRVGIIALGG GESSSPPPEL IIIAAGNRRR RDFFKLDNDF IIPPILLLL- --HHLLNFFF RRVGIIALGG GESSSPPPEL IIIAAGNRRR RDFFKLDNDF IIPPILLLL- --HHLLNFFF LQAAAVVAAF FNSSVRGDAD A---DVDCCV VAASSVVPKI ITTEHHNFRF TEALLFYSVF FRDDVLLAAV VNFFMK---D TTTIDARLLA AKKSSLLPRV VTTELLNFRV KNALLQFSVF FRDDVLLAAV VNFFMK---D TTTIDARLLA AKKSSLLPRV VTTELLNFRV KNALLQFSVF DSLLLDAASA ASGGNNAAEE AYRECDVCGA AAAA---REH HEEESSSRRR RTAGGLLAPP ESLLLEPPLG GRDSRRVVEE ELRRSGIGKT TTTGIIIREM MEEEEEEQQR LEAGGFFSKK ESLLLEPPLG GRDSRRVVEE ELRRSGIGKT TTTGIIIREM MEEEEEEQQR LEAGGFFSKK LGSAALLRML LVGGLLFFFS GEEGG-HHSS VVVEEEDDGC LTGGHGPLLL FFSAASWWEA LSNAAVVKIL LLWWNNYYYN YSSNNLSSII VVVEEEPPGF ISAANDPLLL LLTLLSWWR- LSNAAVVKIL LLWWNNYYYN YSSNNLSSII VVVEEEPPGF ISAANDPLLL LLTLLSWWR- ADGGGGDNNN NNNSSSNSSG NSSSSSSGGG AARDGSSSL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- TQ-------- -WWWMMDASS SSLFFLPPPV t5g66770.1 YYTTDDSSLM 
IIIQQQKKQE QQQQHHHHQQ INPPLLLNNP PWWW--NSGG FFLAAFPPPV t5g66770.2 YYTTDDSSLM IIIQQKKKQE QQQQHHHHQQ INPPLLLNNP PWWW--NSGG FFLAAFPPPV V--------- ---------- AAPPDDGGVY ---------- ---------- ---------- VGGGGGSNDD PPFPPPNLHH AATTTTGGGF LSDFFGGTGE EESSSDDEME TTTLISSGSD VGGGGGSNDD PPFPPPNLHH AATTTTGGGF LSDFFGGTGE EESSSDDEME TTTLISSGSD ---------D PP-----GGA D--------- ---------- ------VVVD AALPFFAAAP DDDGGDDDDD PPVVIYYGGP DPDDDTYYPP LLSSQQPPPP DDLNNRIIID SSPLPPTTLP DDDGGDDDDD PPVVIYYGGP DPDDDTYYPP LLSSQQPPPP DDLNNRIIID SSPLPPTTLP PCADDAAAAA VVVLL---AM MMRREEVVAG ---------V LLSCAAEAAA DDALAASAAA PSSLLSSIIP LLLTTHHESP PPTTEETTND DDDDFFFFPL IIDCRRSDDD DDNEAASKKL PSSLLSSIIP LLLTTHHESP PPTTEETTND DDDDFFFFPL IIDCRRSDDD DDNEAASKKL HAAALAAAVV SGGGRVVAVH FFFASSFFPP -AAPPTEEHH HHLL---HHH ACCPPLLKKF REESVSEELL T-EERVVAFY FFFASSSSPP NTTSSSTTEE EEIILLLKKK ACCPPSSKKF REESVSEELL T-EERVVAFY FFFASSSSPP NTTSSSTTEE EEIILLLKKK ACCPPSSKKF AAAAHHHFTA NNAIILLEEA HGCHHHVHHH VDFFMQQPAA LIIQQQAAAA LRRGPPPPPF AAAAHHHLTA NNAIILLEEA EKSKKKIHHH IDFFVQQPAA LLLQQQAAAA TRRGKKPPPT AAAAHHHLTA NNAIILLEEA EKSKKKIHHH IDFFVQQPAA LLLQQQAAAA TRRGKKPPPT FF-ITTIGPG RRDDDE---R RRDVVVLLDD DRRRSSSVVV FRGVAAANSD EVPMMMQQQI TTQVSSIPPE SSPPPEPPSI IIATTTNLDD DKKKVVVLLL FIPILLL-PH LLGSSSRRRV TTQVSSIPPE SSPPPEPPSI IIATTTNLDD DKKKVVVLLL FIPILLL-PH LLGSSSRRRV APPGEVANNN SVVQLHRLLL GDDPPAA--- AAVRPPPPTT VVEQEHHFLD DDRTALFFYY DPPDELANNN FMMQLYKLLL ------TIII TKLNPPPPTT LLEYELLFAN NNRKALQQFY DPPDELANNN FMMQLYKLLL ------TIII TKLNPPPPTT LLEYELLFAN NNRKALQQFY SAVDSLLDAA ASSGGGAMEA AQRREEICDD VVCCCCCGGG GAA-RRRRRR RRRHEPPLSR SAVESLLEPN GRRDEEVREE EGRRRRISGG IIGGGGGPPP KTTIHHHRRR RRRMEEEKEL SAVESLLEPN GRRDEEVREE EGRRRRISGG IIGGGGGPPP KTTIHHHRRR RRRMEEEKEL LLTTRRGLPP PPLNNAALRR QRMLLVVLFE EG---HSVEE DDLRRLFFSA SWEEAGDGGG MMEENNGFKK KKLYYAAVSS QKILLLLNYS SNLLYSIVES PPLLLLLLTL SWRR------ MMEENNGFKK KKLYYAAVSS QKILLLLNYS SNLLYSIVES PPLLLLLLTL SWRR------ GNNNNNSNSS GSDDNNSSSS NNNSSGAARR SSSSCCLLL ---------- ---------- ---------- --------- 
---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ------MMMM DDTTTFQ--- -----PMMDD t5g66770.1 MYMMDDMMMA AAAQVIKKKQ QQQQQQEEEQ QQQHDDHHHH QQIIIFNPPL LSSLPP--NN t5g66770.2 MYMMDDMMMA AAAQVIKKKK QQQQQQEEEQ QQQHDDHHHH QQIIIFNPPL LSSLPP--NN AASSGGLFLL PPPAAVV--- ---------- ------DDGV GYY------- ---------- SSGGGGLAFF PPDFQVVTTG GDNNDDDPGG FPNLDHTTGG GFFLSGGGGG GGTGEEEEWW SSGGGGLAFF PPPFQVVTTG GDNNDDDPGG FPNLDHTTGG GFFLSGGGGG GGTGEEEEWW ---------- ----DPP--A ADD------- ---------- ----VDAAPP EFAAFFPPPC EISSGGGSAP CCDDDNPYIP PDDPPFDPSS LSVVQPSSSD DLRRIDSSLL PPPTWWPPPS EISSGGGSAP CCDDDNPYIP PDDPPFDPSS LSVVQPSSSD DLRRIDSSLL PPPTWWPPPS PPDAAAL--- RRRREEVVGG IIR------H LMSGGAIEEG AAASSSLLDS HHHAAAALAA PPLSSPTHEE TTKKPETTDD SSEDDDDFPK IYD--RISSS NNASSSLLQI RRRESSSVSS PPLSSPTHEE TTKKPETTDD SSEDDDDFPK IYD--RISSS NNASSSLLQI RRRESSSVSS VSAAAASSGG VAVHTTAASS FP-SVVAPPT TTDAEHFLLL -YHFFFYEEP YLFHTAQALE LGDPPPTTEE VAFYTTAASS SPNSAATSSS SSSSTELIII LYKLLLNDDP YSFHTAQALE LGDPPPTTEE VAFYTTAASS SPNSAATSSS SSSSTELIII LYKLLLNDDP YSFHTAQALE AFGCDHDDDF SMGLQQWWAA LQAALLLLAL LLRRRPPPFF FLTTGGPPPT TTGGRRE--- ATKSNHDDDF GVGIQQWWAA LQAALLLLAT TTRRRKKKTT TISSGGPLLG GGEESSESSS ATKSNHDDDF GVGIQQWWAA LQAALLLLAT TTRRRKKKTT TISSGGPLLG GGEESSESSS LDDDVVGRLL ADARVRSFVV AASLEEVVVR PWLIGEEAFN NNSSVLLQLL RRLLGDPAAA LAAATTGRLL RDAKLDDFII LTPILLLLLN GSFVDEEAVN NNFFMLLQLL KKLL-----T LAAATTGRLL RDAKLDDFII LTPILLLLLN GSFVDEEAVN NNFFMLLQLL KKLL-----T P--IIIALLL LLDVVVASVV VVRRPPPKFT VIEQEHKTTF RRTELSSSSS AFSSSDDAGG PIIVVVTLLL LLRAAAKSLL LLNNPPPRVT LGEYELRVVF RRKNLSSSSS AFSSSEEPDS PIIVVVTLLL LLRAAAKSLL LLNNPPPRVT LGEYELRVVF RRKNLSSSSS AFSSSEEPDS GAAGMAA-YY LLREEDVVVV GEGAAAAAAR EEEERHHEPL SSRRWWRRRD RRRRLTTRRG SEEERVVRLL FFRRRGIIII PEKTTTGGGR EEEERMMEEK EEQQWWRRRV LLLLMEENNG SEEERVVRLL FFRRRGIIII PEKTTTGGGR EEEERMMEEK EEQQWWRRRV LLLLMEENNG GLSSVVPPLL LSSSNNNNNA ARRRAMLLVV GLFFGGGGG- -HHSEGGGCT TGGSSEEEGG GFEEVVKKLL LNNNYYYYYA ASSSAILLLL WNYYYYNNNY YSSISGGGFS SDDSSRRR-- 
GFEEVVKKLL LNNNYYYYYA ASSSAILLLL WNYYYYNNNY YSSISGGGFS SDDSSRRR-- DGGDNNNSGS DSSSNGGSGK KSSAAAAARD DGGSSSCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- MDDTFPQQQ- ----WWPMAA AAAASLLLAG t5g66770.1 MAYMCCCDSS GNMMIIKQQE QQQQQQHHQQ HQQIFGNNNS SLPPWWP-SL LLLLFLLLGS t5g66770.2 MAYMCCCDSS GNMMIIKQQE QQQQQQHHQQ HQQIFGNNNS SLPPWWP-SL LLLLFLLLGS GFPPPAA--- ---------- ------APGV G--------- ---------- ---------- SADPPQQTGG GGGDDNGPPP LDHHHHATGG GRRRLLDGGG GGGGGESEEW TIISSGGGDG SAPPPQQTGG GGGDDNGPPP LDHHHHATGG GRRRLLDGGG GGGGGESEEW TIISSGGGDG -------YDA ------GD-- ---------- -----VVAAA EAAAFPPPCA PPAAAAVVL- PPPCTWWHDD YYYYVYGDFF DTYSSRRLSS VVPDVIISSS PPPLWPPPSS PPPPPPLLTH PPPCTWWHDD YYYYVYGDFF DTYSSRRLSS VVPDVIISSS PPPLWPPPSS PPPPPPLLTH MMREEVGG-- ---------V HLMSCCCCCG GAAAIIIDHH AALAQQLADD SSHHHAAAAL PPKEDTDDDD DDLEPPPPPL KIYDCCCCC- -RRRIIIDPP NNEATTLLQQ IIRRREESSV PPKEDTDDDD DDLEPPPPPL KIYDCCCCC- -RRRIIIDPP NNEATTLLQQ IIRRREESSV LAAAVSSAAA SGGIVVAATL SRRRFFPPVP PPPTEHAAF- --YYHHFFYY AAAACCCYHH VSSSLGGDDD T---VVAAEL SNNRSSPPAS SSSSTEDDLL LLYYKTLLNN AAAACCCYHH VSSSLGGDDD T---VVAAEL SNNRSSPPAS SSSSTEDDLL LLYYKTLLNN AAAACCCYHH AQQILLEAAF FFGCDHHVVI FFFFSMQGGQ QQWWPIIQQQ LRPPGGGPP- LIGGPPPSSS AQQILLEAAT TTKSNHHIIV FFFFGVQGGQ QQWWPLLQQQ LRTTGGGKPQ IIPPAAPSSS AQQILLEAAT TTKSNHHIIV FFFFGVQGGQ QQWWPLLQQQ LRTTGGGKPQ IIPPAAPSSS PRE--LLDDV VGLAADDLLL SSVVRRSFAA NSSLVRPPWW MMLLLIGGEA VAFFFNVVLL LSEPSLLAAT TGLRRDDFFF VVLLNNDFTT -PPILNGGSS SSFFFVDDEV LAVVVNMMLL LSEPSLLAAT TGLRRDDFFF VVLLNNDFTT -PPILNGGSS SSFFFVDDEV LAVVVNMMLL HHRLGGDPPA ADQQQAAP-- --DADCCVAV RPKIFFTTTE EEQEDKKGFL DFFFEAYYYS YYKL------ -DEEETTPTI IIDTRLLAKL NPRVVVTTTE EEYESRRGFA NVVVNAFFYS YYKL------ -DEEETTPTI IIDTRLLAKL NPRVVVTTTE EEYESRRGFA NVVVNAFFYS AAAVFDDSDA ASAAGGGGAN AMALRRCVVG --RRRRHPPP SRRWWRLLTR GGLSPSSLLR AAAVFEESEP PLGGDDSSER VREFRRSIIK IIHRRRMEEE EQQWWRMMEN GGFEKNNVVS AAAVFEESEP PLGGDDSSER VREFRRSIIK IIHRRRMEEE 
EQQWWRMMEN GGFEKNNVVS QARMLLVVFF SSGGG---HE GCCCLLLGWW HHGGPPFAAA ASAEEEAAAA GGGGDDDGDN QAKILLLLYY NNYNNLLLSE GFFFILLAWW NNDDPPLLLL LSSRRR---- ---------- QAKILLLLYY NNYNNLLLSE GFFFILLAWW NNDDPPLLLL LSSRRR---- ---------- NNNNNNNNNN NVVGSSSSDD SNNSGGGNGG GAAARSSVL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --TTQ--WDP PPAAAASSSL LDAALPPPPA t5g66770.1 MAYMCCSGGG MAAAAQQQQV QKQEQQQQQD DDIINLLWNT TTSSSLGGGL LSGGFDDDPF t5g66770.2 MAYMCCSGGG MAAAAQQQQV KKQEQQQQQD DDIINLLWNT TTSSSLGGGL LSGGFPPPPF A--------- ---------- AAAAPPPDGV ---------- ---------- ---------- FGGGGGDDSD DPPGFPFNHH AAAATTTTGG RSDFFGGGGT TGGGEFESSD EEMMMTLISS FGGGGGDDSD DPPGFPFNHH AAAATTTTGG RSDFFGGGGT TGGGEFESSD EEMMMTLISS ---------- ------YYDP A-----GDD- ---------- ---------V DAEAAAAAFF DDADDGPDCC CWWWWWHHDP DVVIIYGDDP PFTTYPSSRR RRLPSLLRVI DSPTTTTLWW DDADDGPDCC CWWWWWHHDP DVVIIYGDDP PFTTYPSSRR RRLPSLLRVI DSPTTTTLWW CCPAAAAAVL ---MREEEEG GIR------- LLLHLLLLLA GGGAGLLLAA SAALADDHAA SSPSIPPPLT HEEPTDDPED DSEDDDDDDD LLLKAAAIIA ---DSEEEAA SKKLLQQRES SSPSIPPPLT HEEPTDDPED DSEDDDDDDD LLLKAAAIIA ---DSEEEAA SKKLLQQRES AAAVSSGGGI GRRVAAVHHT TALLSSRRRL LF-SSSPPPV VAPPPPTTTT DDAAHFF--H SSSLGT---- ERRVAAFYYT TALLSSNRRL LSNSSSPPPA ATSSSSSSSS SSSSELLLST SSSLGT---- ERRVAAFYYT TALLSSNRRL LSNSSSPPPA ATSSSSSSSS SSSSELLLST EEAACCCYYK KAAHHHFFTT AQQLEEAAFG CDDVHVVIDF LMQGGLLQQQ WWQQLLLLRR DDAACCCYYK KAAHHHLLTT AQQLEEAATK SNNIHIIVDF IVQGGIIQQQ WWQQLLTTRR DDAACCCYYK KAAHHHLLTT AQQLEEAATK SNNIHIIVDF IVQGGIIQQQ WWQQLLTTRR PPPPPP--RG GIGTGE--RR DVVGLLLLAA RVVVVVRFFF FGGGAANLML QQQAAPGEAF TTKKKPQQRG GIPGEEPSII ATTGLLLLRR KLLLLLNFFF FPPPLL-ISF RRRDDPDEVV TTKKKPQQRG GIPGEEPSII ATTGLLLLRR KLLLLLNFFF FPPPLL-ISF RRRDDPDEVV FFVLLLLLRL LGAAADDDPI DDACCVAKKI IFFTTIIQEA DHKFFLLLDD RRTTAAAAYY VVMLLLLLKL L----DDDPV DDTLLAKRRV VVVTTGGYEV SLRFFAAANN RRKKAAAAYY VVMLLLLLKL L----DDDPV DDTLLAKRRV VVVTTGGYEV SLRFFAAANN RRKKAAAAYY VVFDSLLAAS ASSGGAGGNA 
AAAMEAAYYL IICCCDIVCC GGEGAARRHE SRRRWWWDDD VVFESLLPPL GRRDSEEERV VVVREEELLF IISSSGLIGG PPEKTTHRME EQQQWWWVVV VVFESLLPPL GRRDSEEERV VVVREEELLF IISSSGLIGG PPEKTTHRME EQQQWWWVVV RALLSAAVVP LSNNNAALRQ QAAMMLVVVG LGGEE-HVEE EDDDLTGGGW PLFFWEAGGG NAFFESSVVK LNYYYAAVSQ QAAIILLLLW NYYSSYSVEE SPPPISAAAW PLLLWR---- NAFFESSVVK LNYYYAAVSQ QAAIILLLLW NYYSSYSVEE SPPPISAAAW PLLLWR---- GGGGGGGNNN SSNVVSSSSD NSSSSNNNNS RRRDGVVCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- MFFPPPQ--- ---WPPMMMP AASSGDAAAL t5g66770.1 AATTGGMIQQ QQQVIKQQQE QQQHHHQDDD HFFGGGNPLL SLLWPP---T SLFFGSGGGF t5g66770.2 AATTGGMIQQ QQQVIKKQQE QQQHHHQDDD HFFGGGNPLL SLLWPP---T SLFFGSGGGF PPAAAVVV-- ---------- ------PPPG GGGGY----- ---------- ---------- PDFQQVVVTG GGGGDPGGFP PLLLHHTTTG GGGGFRDDDG GGGFESEEEM TLISSSVDDP PPFQQVVVTG GGGGDPGGFP PLLLHHTTTG GGGGFRDDDG GGGFESEEEM TLISSSVDDP -----YYDDD DPPPPAA--G ADD------- ---------- ----AALLPF FAAFPPAAAA DCCCDHHDDD DNNPPDDVYG PDDPFFYYSS LVQQDDDDDL NVVVSSPPLP PLLWPPSSII DCCCDHHDDD DNNPPDDVYG PDDPFFYYSS LVQQDDDDDL NVVVSSPPLP PLLWPPSSII VL--AAMREE EEEAIR---- ----LLVMMS CCCCCAEGGG DHHAALLASS AQQQQLADDS LTHHSSPKED PEENSEDDDF DDLPLLLYYD CCCCCRSSSS DPPNNEEASS KTTTTLLQQI LTHHSSPKED PEENSEDDDF DDLPLLLYYD CCCCCRSSSS DPPNNEEASS KTTTTLLQQI SALAAAVAAR VVAAAAVFTT SRRRLFFP-- VAAPPTTTAA L---YYHHHH YYAACPPPPY ISVSSELDDR VVAAAAFFEE SNNRLSSPNN ATTSSSSSSS ILLSYYKTTT NNAACPPPPY ISVSSELDDR VVAAAAFFEE SNNRLSSPNN ATTSSSSSSS ILLSYYKTTT NNAACPPPPY YLKKFFFNQA LLFFGCCCHV VHVIDMMQQL QWPLIQALAL RRPPPPF--L RIITGGIGPP YSKKFFLNQA LLTTKSSSKI IHIVDVVQQI QWPLLQALAT RRTKKPTQQI RVVSGGIPAA YSKKFFLNQA LLTTKSSSKI IHIVDVVQQI QWPLLQALAT RRTKKPTQQI RVVSGGIPAA SSRRD--RRG RLLLLADDAR RSSVVRFFFS SFRRVANSSD VRPPPPWMLQ IIVVAAFFSS SSSSPPSIIG RLLLLRDDAK KVVLLNFFFD DFIIIT-PPH LNGGGGSSFR VVLLAAVVFF SSSSPPSIIG RLLLLRDDAK KVVLLNFFFD DFIIIT-PPH LNGGGGSSFR VVLLAAVVFF LHRLLLGADD QP---IDAVV DDCCSRPKII TVVIIEHHHN KTTTGFLLDR FTTELYYSSA 
LYKLLL--DD EPTTIVDTAA RRLLSNPRVV TLLGGELLLN RVVVGFAANR VKKNLFYSSA LYKLLL--DD EPTTIVDTAA RRLLSNPRVV TLLGGELLLN RVVVGFAANR VKKNLFYSSA VLDAASSSAA GGAAAAAGNA AMAAEAAYLQ CDCEGAAARR RRRRRRRHPL SRRDDRTTTR VLENNLLLGG DDEEEEEERV VRVVEEELFG SGGEKGGGHH RRRRRRRMEK EQRVVLEEEN VLENNLLLGG DDEEEEEERV VRVVEEELFG SGGEKGGGHH RRRRRRRMEK EQRVVLEEEN AALLSPGSSN NQAMLLLLFF GG----HSEE ADDCCLLTTG WHGRRPPLSS SSASAWWWEA AAFFEKSNNY YQAINNNNYY YNLLLYSIES KPPFFIISSA WNDLLPPLTT TTLSSWWWR- AAFFEKSNNY YQAINNNNYY YNLLLYSIES KPPFFIISSA WNDLLPPLTT TTLSSWWWR- ADGGNNNNNS SSSSVSSSSD SNNSSSSNKS SGRDSCCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ----MDDDPQ ---------W WPMMMDDAAS t5g66770.1 AAYDDDNAAI AQQVIIKQKK QQQQQQQQQQ HQQDHQQQGN LLLLNNNPPW WP---NNLLG t5g66770.2 AAYDDDNAAI AQQVIIKKKK QQQQQQQQQQ HQQDHQQQGN LLLLNNNPPW WP---NNLLG GAGFFLLPPP PAVVV----- ---------- --AAPDGGGG YY-------- ---------- GGSAAFFPPD PQVVVTGGDD DGFFPPNNHH HHAATTGGGG FFRLLSDGGG GGGESSSDEE GGSAAFFPPP PQVVVTGGDD DGFFPPNNHH HHAATTGGGG FFRLLSDGGG GGGESSSDEE ---------- ---------- -DDDP----- GGD------- ---------- DAAAEEFAAA WTTIGDSVVA GPDDDDCCCT WDDDNYYYYI GGDPPFTYYY SRLPDLNNNR DTSSPPPPTT WTTIGDSVVA GPDDDDCCCT WDDDNYYYYI GGDPPFTYYY SRLPDLNNNR DTSSPPPPTT AFPAAAAAAA AVLL----AA REEGGGIIRR ---------- HHMSSCCAGG AAIEEAGDDH LWPSSIIIPP PLTTHEEESS TDPDDDSSEE DDDFDLEPPP KKYDDCCA-- RRISSDSDDP LWPSSIIIPP PLTTHEEESS TDPDDDSSEE DDDFDLEPPP KKYDDCCA-- RRISSDSDDP AASADDSSHH AAAAAASSAS IGGVVAVVHH TTAALRLFFF ---SPVAAAA PTDAAFFFLH NNSLQQIIRR SSSSSSGGDT -EEVVAFFYY TEAALRLSSS NNNSPATTTT SSSSDLLLIK NNSLQQIIRR SSSSSSGGDT -EEVVAFFYY TEAALRLSSS NNNSPATTTT SSSSDLLLIK HHFEAYYYLL KFFHFTTAQI LLLEEEAFFH GGGHHIDDDS SLMGGGLLIA LAALLRPPGG KTLDAYYYSS KFFHLTTAQI LLLEEEATTE KKKKKVDDDG GIVGGGILLA LAATTRTTSS KTLDAYYYSS KFFHLTTAQI LLLEEEATTE KKKKKVDDDG GIVGGGILLA LAATTRTTSS GGPP-RTGII PPPPTTTGEE --RVVVLLRR AAAVVRRRRF SFFRVAAANN SDDEEVVVRP GGKKQRSGII APLLGGGEEE PSITTTNNRR RRALLDDNNF 
DFFIILTT-- PHHLLLLLNG GGKKQRSGII APLLGGGEEE PSITTTNNRR RRALLDDNNF DFFIILTT-- PHHLLLLLNG PWWMIPAVVF SQQQLHHLLL GPPQP---II DDASRPPKII ITIIEEQAAD DNNTTTDDRR GSSSVPVLLV FQQQLYYLLL ---EPTIIVV DDKSNPPRVV VTGGEEYVVS SNNVVVNNRR GSSSVPVLLV FQQQLYYLLL ---EPTIIVV DDKSNPPRVV VTGGEEYVVS SNNVVVNNRR RTAYSSAVSA AAASSSGGGG GAMME-AAYY QREIIDICEG GRERHPLWRR RDDRLRRGLL RKAFSSAVSP NNNLLRDSEE EVRREREELL GRRIIGLGEK KRERMEKWRR RVVLMNNGFF RKAFSSAVSP NNNLLRDSEE EVRREREELL GRRIIGLGEK KRERMEKWRR RVVLMNNGFF LSSAVVVVPS NNLLLRRARM VVGGLFFSS- HHHSVAAGCL LLLGGPFFAS AEAAAAGDGG FEESVVVVKN YYVVVSSAKI LLWWNYYNNL SSSIVKKGFI ILLDDPLLLS SR-------- FEESVVVVKN YYVVVSSAKI LLWWNYYNNL SSSIVKKGFI ILLDDPLLLS SR-------- GNNNNSSNNN GSSDSSGSSN GGKSSGGGAR DDSSSVVVV ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- MTFFFFF--- --WPMAASLD t5g66770.1 YYYCCCTDSG GMMAAAQQVI IKKKKQEEQQ QQQQQQHDDD HIFFFIILSN PPWP-SLGLS t5g66770.2 YYYCCCTDSG GMMAAAQQVI IKKKKQEEQQ QQQQQQHDDD HIFFFIILSN PPWP-SLGLS AFLLLPPPAA ---------- ---------- --APPDDGGV G--------- ---------- GAFFFPDPFQ GGGDSNPFPF NNNNDHHHHH HHATTTTGGG GSDDFFGGGG FFEDDDWMII GAFFFPPPFQ GGGDSNPFPF NNNNDHHHHH HHATTTTGGG GSDDFFGGGG FFEDDDWMII ---------- ---YYY---- GGA------- -------VVA ALLPPEAAFP CAAAPDAAAA IGGSVGGGCC TWWHHHYVII GGPFDSSVQQ PSDLLNRIIS SPPLLPPPWP SSSSPLIIPP IGGSVGGGCC TWWHHHYVII GGPFDSSVQQ PSDLLNRIIS SPPLLPPPWP SSSSPLIIPP LLLL-MMMRR EEGIIR---- ---------- ----HHHHLL LMSSAAAGAE EAAALLASSA TTTTHPPPTT DPDSSEDDDD FFDDDLLLEE EPPPKKKKAA AYDDAAA-RS SDNNEEASSK TTTTHPPPTT DPDSSEDDDD FFDDDLLLEE EPPPKKKKAA AYDDAAA-RS SDNNEEASSK AQAHHAASAA ASIGVAAVFF TTAALSSRLS PVVPPPTTTA AL--HHFFYE EEAPYLLKKF KTLRRSEGPP PT-EVAAFFF TTAALSSNLS PAASSSSSSD DILSKKLLND DDAPYSSKKF KTLRRSEGPP PT-EVAAFFF TTAALSSNLS PAASSSSSSD DILSKKLLND DDAPYSSKKF HHHHFTAQQL HGGCDHVVHV IDLQGQQQQQ PIIQLARRRR PPRRIITTGP PSSPGDDE-- HHHHLTAQQL EKKSNKIIHI VDIQGQQQQQ PLLQLARRRR TPRRVVSSGA PSSLEPPEPP HHHHLTAQQL EKKSNKIIHI 
VDIQGQQQQQ PLLQLARRRR TPRRVVSSGA PSSLEPPEPP LLRDDLLLAA DDDLLLASSV VRVVRSSSFF FRGGVAANLE ERLLLQQAPP GAVAFFSLHL LLIAANNLRR DDDFFFAVVL LDLLNDDDFF FIPPILL-IL LNFFFRRDPP DVLAVVFLYL LLIAANNLRR DDDFFFAVVL LDLLNDDDFF FIPPILL-IL LNFFFRRDPP DVLAVVFLYL GGPDDDQPAA LLDCCAVRPP KKFFFTIEEQ EAADHNNNGF LLDRFFEAFS SDSDASSSAA ---DDDEPTT LLRLLKLNPP RRVVVTGEEY EVVSLNNNGF AANRVVNAQS SESENLLLGG ---DDDEPTT LLRLLKLNPP RRVVVTGEEY EVVSLNNNGF AANRVVNAQS SESENLLLGG SSGGAMMEEE E--AAAQQRR RRREEIDIVC GEAA--RREH HLLSRRRRDR RRLLTTGLLS RRSEVRREEE ERREEEGGRR RRRRRIGLIG PETTIIHHEM MKKEQQQRVL LLMMEEGFFE RRSEVRREEE ERREEEGGRR RRRRRIGLIG PETTIIHHEM MKKEQQQRVL LLMMEEGFFE SSVPPLGGSS NQQRVGGFSS GEEESSVVVE EEADGGWHRR PPLLLFAASA WWWWEEADGG EEVKKLSSNN YQQKLWWYNN YSSSIIVVVE EEKPAAWNLL PPLLLLLLSS WWWWRR---- EEVKKLSSNN YQQKLWWYNN YSSSIIVVVE EEKPAAWNLL PPLLLLLLSS WWWWRR---- GGDDNNNNSS SNNVGSSGGD SSSGGSSNKS SGRRDSVLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------D FFFFFQ---- WPDDDPAASS t5g66770.1 MAMCCTTTTD SGGNNNLLMA IIAAQVIKQQ QQQQQQHHHQ FIIIINPPLN WPNNNTLLGG t5g66770.2 MAMCCTTTTD SGGNNNLLMA IIAAQVIKQQ QQQQQQHHHQ FIIIINPPLN WPNNNTLLGG DDDDDAGFPP PPPAVVVV-- ---------- ------PDGY ---------- ---------- SSSSSGSAPP DDPFVVVVTT GGNDDDPFPP LDDHHHTTGF RLLLFFGGGG EEDMEETTLS SSSSSGSAPP PPPFVVVVTT GGNDDDPFPP LDDHHHTTGF RLLLFFGGGG EEDMEETTLS ---------- ----YYAA-- G--------- ---------- --VAAAAPPE EFAAAAAAAA SGGVVDDDPP PCWWHHDDIY GPTSSRLVVQ QPDLLNRRRV VVISSSSLLP PPPPTTTLLL SGGVVDDDPP PCWWHHDDIY GPTSSRLVVQ QPDLLNRRRV VVISSSSLLP PPPPTTTLLL FPPCCAAPDA AAAAAV-RRR EEEEEEEEVR ---------L VHHLMSCIDD SLLDDSHAAV WPPSSSSPLI IPPPPLHTTK EEEEPPPETE DFDDDLLPPL LKKIYDCIDD SLLQQIRESL WPPSSSSPLI IPPPPLHTTK EEEEPPPETE DFDDDLLPPL LKKIYDCIDD SLLQQIRESL VSAIIGGGRR VAAAHFFTTT AALRRLF-SV PTTAAEEHAA FLL--YHHHF EAYYLLKFHH LGD--EEERR VAAAYFFEEE AALNNLSNSA SSSSSTTEDD LIILLYKKKL DAYYSSKFHH LGD--EEERR VAAAYFFEEE AALNNLSNSA SSSSSTTEDD LIILLYKKKL DAYYSSKFHH 
FFTTAQQAIL FHHGDHHHVI IIIDFMLQWW ALIIQALRPG GPPF-RIIIG GIIIGPPPPP LLTTAQQAIL TEEKNKHHIV VVVDFVIQWW ALLLQALRTG GKKTQRVVVG GIIIPAPPPL LLTTAQQAIL TEEKNKHHIV VVVDFVIQWW ALLLQALRTG GKKTQRVVVG GIIIPAPPPL PTGDDDE--V GLLAADLASV VRRVVVRFFF GGVVVVAALD VVWWWWLIIE VVAFNLQQHH LGEPPPEPPT GNNRRDFAVL LDDLLLNFFF PPIIIILTIH LLSSSSFVVE LLAVNLQQYY LGEPPPEPPT GNNRRDFAVL LDDLLLNFFF PPIIIILTIH LLSSSSFVVE LLAVNLQQYY LLGDDQAAP- --IVLVAAAS SVRFTEQAAD NNTFFLLDDF ALLFFAAVVD DDSSLLLDAA LL--DETTPT IIVALAKKKS SLNVTEYVVS NNVFFAANNV ALLQQAAVVE EESSLLLEPP LL--DETTPT IIVALAKKKS SLNVTEYVVS NNVFFAANNV ALLQQAAVVE EESSLLLEPP AASAAGAGNN NMM-YQQREI IGGEGGGGAA A-RRRPPPLS SRWRRRRLLR LSAVPLLGSA NNLGGSEERR RRRRLGGRRI LPPEKKKKTG GIHRREEEKE EQWRRRLMMN FESVKLLSNA NNLGGSEERR RRRRLGGRRI LPPEKKKKTG GIHRREEEKE EQWRRRLMMN FESVKLLSNA AARVVVLFGG G---HHHHSS SVEEDDDDGC TTTGRRPPPL FSSSASAAGG GDDNNNNNSS AAKLLLNYYN NLLYSSSSII IVSSPPPPGF SSSDLLPPPL LTTTLS---- ---------- AAKLLLNYYN NLLYSSSSII IVSSPPPPGF SSSDLLPPPL LTTTLS---- ---------- NNNNVVSSSS SSGDDDSSSS SSNNGKSGGR RDSSSSCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- -MDT------ WWPPMMDDPP t5g66770.1 MYTDSSNLMM IQQQQQVVVK KKKKQQQQQE QQQQQQQQHH QHQILSLLNN WWPP--NNTT t5g66770.2 MYTDSSNLMM IQQQQQVVVK KKKKQQQQQE QQQQQQQQHH QHQILSLLNN WWPP--NNTT PAAAAAGLLD GGFFLLPPPP AAAAA----- ---------- ---APPPDDG GY-------- TSSSLLGLLS SSAAFFPPDP FFFQQTTGGS NNNGFPLLLD HHHATTTTTG GFRLSDGGTG TSSSLLGLLS SSAAFFPPPP FFFQQTTGGS NNNGFPLLLD HHHATTTTTG GFRLSDGGTG ---------- ---------- -----DPAAA ---AAD---- ---------- -----VVDAA GGGEEDEMEI DSSSVADDPD DDDTWDNDDD YVIPPDPFFD YPPPLLSVQQ PLNNRIIDTS GGGEEDEMEI DSSSVADDPD DDDTWDNDDD YVIPPDPFFD YPPPLLSVQQ PLNNRIIDTS PPFFAPPPCP DAAAAVLL-- --MMREEVII IRRR------ ---LVHLLLM MSSCAIIIIE LLPPTPPPSP LSPPPLTTHE EEPPKEDTSS SEEEDDDFLL LEPLLKAAAY YDDCAIIIIS LLPPTPPPSP LSPPPLTTHE EEPPKEDTSS SEEEDDDFLL LEPLLKAAAY YDDCAIIIIS EGGGDALSSS QLLDDSSAAV SASSIVAAVH HFFTLLLSRR 
SSVPTTTTDD AAF-HHHYAA SSSSDNESSS TLLQQIIEEL GPTT-VAAFY YFFELLLSNN SSASSSSSSS SDLSKKTNAA SSSSDNESSS TLLQQIIEEL GPTT-VAAFY YFFELLLSNN SSASSSSSSS SDLSKKTNAA PFAAANQAII LLEAAFHHCH VHHVVIIDFL MQGQQWPPAL LALLLLRRRP GGGPPPPFFR PFAAANQAII LLEAATEESK IHHIIVVDFI VQGQQWPPAL LATTTTRRRT SSGPPPPTTR PFAAANQAII LLEAATEESK IHHIIVVDFI VQGQQWPPAL LATTTTRRRT SSGPPPPTTR ITGIIGPPPP PPPTGGRRDE -LRDDDVGGL RAADLLLAAS VVRRFSSSFA AANSSSSDEE VSGIIPAAAP PPLGEESSPE PLIAAATGGN RRRDFFFAAV LLNNFDDDFL LT-PPPPHLL VSGIIPAAAP PPLGEESSPE PLIAAATGGN RRRDFFFAAV LLNNFDDDFL LT-PPPPHLL RPWWWMMQQQ IPSSVLLLQH HGGGDDDPPD QAPP---DAA VVLDDCAAAR RRIFIEQADD NGSSSSSRRR VPFFMLLLQY Y--------D ETPPTIIDTT AALRRLKKKN NNVVGEYVSS NGSSSSSRRR VPFFMLLLQY Y--------D ETPPTIIDTT AALRRLKKKN NNVVGEYVSS HNNKTTGLLD FTTLYYSAAV FFDSAASSSA ANAMMAEE-Y LLLQEIICDC GGEGGAAERR LNNRVVGAAN VKKLYYSAAV FFESPPLLLG GRVRRVEERL FFFGRIISGG PPEKKTTERR LNNRVVGAAN VKKLYYSAAV FFESPPLLLG GRVRRVEERL FFFGRIISGG PPEKKTTERR EPLLRRRRRG GGLSSAVVPP LGGNNRAAAM GLSSG----- HHEEGCLLTT GHHRRPLSSA EEKKQRRRNG GGFEESVVKK LSSYYSAAAI WNNNYLLYYY SSEEGFIISS ANNLLPLTTL EEKKQRRRNG GGFEESVVKK LSSYYSAAAI WNNNYLLYYY SSEEGFIISS ANNLLPLTTL SAGGDGGGGG DDNNSSNSGG SSSNNGGGSN SSSRSVVVC S--------- ---------- ---------- --------- S--------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---MTPP--- -WMMAAAASS SGGGLLDGLP t5g66770.1 AACSGLLMMA AIQKKQQQQK QQEQQQQQHQ DDDHIGGPSL NW--SSSLGG FGGGLLSSFP t5g66770.2 AACSGLLMMA AIQKKKKKKK QQEQQQQQHQ DDDHIGGPSL NW--SSSLGG FGGGLLSSFP PPAAAAV--- ---------- ---------- -AADGVGY-- ---------- ---------- PDQQQQVTGG GGGGGGSSNP PNLHHHHHHH HAATGGGFLS DFGGGTGFFF FEESSSDWME PPQQQQVTGG GGGGGGSSNP PNLHHHHHHH HAATGGGFLS DFGGGTGFFF FEESSSDWME ---------- ---------- -------D-- ----GD---- ---------- -DDAAAAEAA EELSGGDDSS SSSVAADDGG DCDTTTWDYY VVVIGDPPPT PSRLLSQPNR VDDSSSSPPT EELSGGDDSS SSSVAADDGG DCDTTTWDYY VVVIGDPPPT PSRLLSQPNR VDDSSSSPPT AFPPPDALL- -REEVGIR-- -----HSSCC AAGIEAGDHH HALAAAQQAA DSSSHHLLLA TWPPPLSTTH EKDDTDSEDD 
FDLEPKDDCC AA-ISDSDPP PNEAKKTTLL QIIIRRVVVS TWPPPLSTTH EKDDTDSEDD FDLEPKDDCC AA-ISDSDPP PNEAKKTTLL QIIIRRVVVS AAVSAAAGIR VVVTTTTAAL LSRRRLF-SS PAPPDAEHAA FFLLL-YHHY YEEAPPYLLK EELGPPP--R FFFTTTEAAL LSNRRLSNSS PTSSSSTEDD LLIIISYKTN NDDAPPYSSK EELGPPP--R FFFTTTEAAL LSNRRLSNSS PTSSSSTEDD LLIIISYKTN NDDAPPYSSK KFAHHFFTAQ AAHCDHHHVV IIDDDFFMQQ LQWPAAIIIQ LLLALRRGGG GLRTGGGGPP KFAHHLLTAQ AAESNKKKII VVDDDFFVQQ IQWPAALLLQ LLLATRRSSG GIRSGGPPAP KFAHHLLTAQ AAESNKKKII VVDDDFFVQQ IQWPAALLLQ LLLATRRSSG GIRSGGPPAP PPTGDDDEE- ---RVVVLRR RAAAASSVVV RRRRGGGGAA AAAANSSLEE RRPLLQIIIA PLGEPPPEEP PPPITTTNRR RRRAAVVLLL NIIIPPPPLL LTTT-PPILL NNGFFRVVVD PLGEPPPEEP PPPITTTNRR RRRAAVVLLL NIIIPPPPLL LTTT-PPILL NNGFFRVVVD AAEAAAVAFF NNSVLLLLLR LPPPAAQP-- IIIDDVVLDC VVARPPKITT TVVEEADHKT DDEVVVLAVV NNFMLLLLLK L-----EPTI VVVDDAALRL AAKNPPRVTT TLLEEVSLRV DDEVVVLAVV NNFMLLLLLK L-----EPTI VVVDDAALRL AAKNPPRVTT TLLEEVSLRV FFLDFTLFFY YYYFDSLLDD DAAASGNNAA -QQRREEIDD VVCEA-ERHR WDDRRRRAAG FFANVKLQQF FFYFESLLEE EPNNLDRRVV RGGRRRRIGG IIGEGIERMQ WVVLNNNAAG FFANVKLQQF FFYFESLLEE EPNNLDRRVV RGGRRRRIGG IIGEGIERMQ WVVLNNNAAG GSAVLNALAA RMLLLFSSEG ----SEAADD CCLLLTTLLG WGRPPLLSAA AWWEAGGGDD GESVLYAVAA KILLNYNNSN LLLLISKKPP FFIIISSLLA WDLPPLLTLL SWWR------ GESVLYAVAA KILLNYNNSN LLLLISKKPP FFIIISSLLA WDLPPLLTLL SWWR------ DSSSNSSSSG SSGSGSSGGS SSSSSARRGG GGGSSCCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- -----DDTFF FFF------W WPMDDDDDDP SSSSGGAGGF t5g66770.1 MMCCCTDNAA AIIAQVQQQQ QQQHDQQIFF FFISSLNNPW WP-NNNNNNT GFFFGGGSSA t5g66770.2 MMCCCTDNAA AIIAQVQQQQ QQQHDQQIFF FFISSLNNPW WP-NNNNNNT GFFFGGGSSA FFPA------ ---------- APDDGVGY-- ---------- ---------- ---------- AAPFTTTSDP FFFFLDDHHH ATTTGGGFRR SDDFGGTTGG EEFFSDWWWE ETIISGSSVA AAPFTTTSDP FFFFLDDHHH ATTTGGGFRR SDDFGGTTGG EEFFSDWWWE ETIISGSSVA ---------- -YPP---D-- ---------- ---------- ---VVVDAAA LLFAAAAAAC DDDDGPDDTT WHPPYVIDPP FDTTYSRLSS VVQQQPPSDL LLNIIIDTSS PPPPPTTLLS 
DDDDGPDDTT WHPPYVIDPP FDTTYSRLSS VVQQQPPSDL LLNIIIDTSS PPPPPTTLLS CDAAAAAAV- --AMRREEEV VGGGIRR--- -------HLL MSEEAAAAAL LLLADDAAAA SLSSIIIPLH HESPTKDPPT TDDDSEEDDD DFDDDDPKII YDSSDDNAAL LLLLQQESEE SLSSIIIPLH HESPTKDPPT TDDDSEEDDD DFDDDDPKII YDSSDDNAAL LLLLQQESEE VSSAAGIIGG VAAVVHFTAL RLLFSPAPPT TTTEEEA--H FACPPYLLKF ATTANQIILL LGGDP---EE VAAFFYFEAL RLLSSPTSSS SSSTTTDLSK LACPPYSSKF ATTANQIILL LGGDP---EE VAAFFYFEAL RLLSSPTSSS SSSTTTDLSK LACPPYSSKF ATTANQIILL LAGHHVVVII IFSLLMLQAA ALLLPPPGGG PPPF-LGIPP PPPSPRDDDE EE---RRGGG LAKKKIIIVV VFGIIVIQAA ALLTTTTSGG KPPTQIGIAA APPSLSPPPE EEPPSIIGGG LAKKKIIIVV VFGIIVIQAA ALLTTTTSGG KPPTQIGIAA APPSLSPPPE EEPPSIIGGG LLRLDDLAAR RSSVVRVVVR FFFFFRRRAA NSDDVRPPWW MLIAAAPPPG AAVVFVLLQL NNRLDDFAAK KVVLLDLLLN FFFFFIIILL -PHHLNGGSS SFVDDDPPPD VVLLVMLLQL NNRLDDFAAK KVVLLDLLLN FFFFFIIILL -PHHLNGGSS SFVDDDPPPD VVLLVMLLQL LHLLLGGGGD PPAVCCAASR RIFTVIIADH NNTTTGGGLL LDTTTEAFYY SAFFFFDDSL LYLLL----- --TALLKKSN NVVTLGGVSL NNVVVGGGAA ANKKKNAQFY SAFFFFEESL LYLLL----- --TALLKKSN NVVTLGGVSL NNVVVGGGAA ANKKKNAQFY SAFFFFEESL LLDASGGGGG NAMEAAYQRR EIIIICCCCV VVCGAAAA-- -RRREEEEEP PPPLLRRRWR LLENLDDDEE RVREEELGRR RIIIISSSSI IIGPTGGGII IHHHEEEEEE EEEKKQQQWR LLENLDDDEE RVREEELGRR RIIIISSSSI IIGPTGGGII IHHHEEEEEE EEEKKQQQWR DRLRGSSSAP LLGGNNARQA AMVGSGEGG- -HSSVCCLGG HGGRLLFFSS AWEADDDDGG VLMNGEEESK LLSSYYASQA AILWNYSNNL YSIIVFFLAA NDDLLLLLTT SWR------- VLMNGEEESK LLSSYYASQA AILWNYSNNL YSIIVFFLAA NDDLLLLLTT SWR------- DDNNNNSSNV VSGSSDNNSG SGKSAARDGS SSSSSVVLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- -------MTT PPFFFFFQQ- ---WWMMDPP ASSGDAFFPP t5g66770.1 AMCCCTTSSN MIQIQQQQQQ QQQQQHDHII GGIIIIINNL NNPWW--NTT SFFGSGAAPP t5g66770.2 AMCCCTTSSN MIQIKQQQQQ QQQQQHDHII GGIIIIINNL NNPWW--NTT SFFGSGAAPP PPPPPAAA-- ---------- -------PPP DDDDGGG--- ---------- ---------- PDPPPFFQTT GGGDDSNNPG FFFNHHHTTT TTTTGGGRRL LSFGGGGGGG GGEFESDDDD PPPPPFFQTT GGGDDSNNPG FFFNHHHTTT TTTTGGGRRL 
LSFGGGGGGG GGEFESDDDD ---------- ---------- ----YYDDPP A---D----- ---------- -------VDD WWWMTLLLIS GGGSVAAAAD DGCWHHDDNP DVYYDPPPPP FFFDTYPPRS VVVSDRVIDD WWWMTLLLIS GGGSVAAAAD DGCWHHDDNP DVYYDPPPPP FFFDTYPPRS VVVSDRVIDD AALLLPAAAA AFPPCPAAAA AAAALLL-AM MMREEAAG-- ---------L VHLLSSSAAI TTPPPLPLLL LWPPSPSSSI IPPPTTTESP PPKPPNNDDF LLEEEPPPPL LKAIDDDARI TTPPPLPLLL LWPPSPSSSI IPPPTTTESP PPKPPNNDDF LLEEEPPPPL LKAIDDDARI IIEEAGDHAS SSQQQLAASH HHHAALLASA GIIGRVVFTS SRLLLF-PPP PDAEEAAFL- IISSDSDPAS SSTTTLLLIR RRREEVVSGD ---ERFFFES SRLLLSNPPS SSSTTDDLIL IISSDSDPAS SSTTTLLLIR RRREEVVSGD ---ERFFFES SRLLLSNPPS SSSTTDDLIL -YHHFYYAAC CPYLKKFHHT TTQLLAAHGD DHVIDFSLMM QQPPALLPGP F-LLITTGGI SYKTLNNAAC CPYSKKFHHT TTQLLAAEKN NKIVDFGIVV QQPPALLTGP TQIIVSSGGI SYKTLNNAAC CPYSKKFHHT TTQLLAAEKN NKIVDFGIVV QQPPALLTGP TQIIVSSGGI IGGPSPGGDD ----LDDVGG GLLLDLARVV RRRVVVRFFF FRRRGGNLVP PPPPMLLQAP IPPPSLEEPP SSSSLAATGG GLLLDFAKLL DDDLLLNFFF FIIIPP-ILG GGGGSFFRDP IPPPSLEEPP SSSSLAATGG GLLLDFAKLL DDDLLLNFFF FIIIPP-ILG GGGGSFFRDP AVFFFNNNNS SLHHRLDADQ QIAALDCCVA SVVVRRPPPK KKFTTTVVEQ QEEDDHNNFF VLVVVNNNNF FLYYKL--DE EVTTLRLLAK SLLLNNPPPR RRVTTTLLEY YEESSLNNFF VLVVVNNNNF FLYYKL--DE EVTTLRLLAK SLLLNNPPPR RRVTTTLLEY YEESSLNNFF DFALYYAADL DDAAASSGAG AE--AQIIIC GGGA-REERE PPPLRRLALV VVVGSAQQQQ NVALFFAAEL EEPNGRRDEE VERREGLLLG PPKTIREERE EEEKLLMAFV VVVSNAQQQQ NVALFFAAEL EEPNGRRDEE VERREGLLLG PPKTIREERE EEEKLLMAFV VVVSNAQQQQ QRMMFFSSGE EGG-HHHHHS SSEEEADGCL LTGGWHHGGG RSAASSSAAA AEAGGDGGGG QKIIYYNNYS SNNLSSSSSI IISSSKPGFI ISAAWNNDDD LTLLSSSSSS SR-------- QKIIYYNNYS SNNLSSSSSI IISSSKPGFI ISAAWNNDDD LTLLSSSSSS SR-------- GGNNSNNSSN SSSGDSGGSS SSNSGARRRD DGSSSSVVL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -----MTFF- -----WPPDP AAASGDAAGG t5g66770.1 ASGNNNLLMA AIIIAAAQQQ QQVVVQQQEQ HHHHDHIFIP PLSNPWPPNT SSSFGSGGSS t5g66770.2 ASGNNNLLMA AIIIAAAQQQ QQVVVQQQEQ HHHHDHIFIP PLSNPWPPNT SSSFGSGGSS FFFPPPPPAA AAAV------ 
---------- -----ADDGV GGGYY----- ---------- AAAPPDDDFF FQQVTTGNDP GPFFPPNLDD HHHHHATTGG GGGFFRLSDD GGTTGGGGGW AAAPPPPPFF FQQVTTGNDP GPFFPPNLDD HHHHHATTGG GGGFFRLSDD GGTTGGGGGW ---------- ---------- ---------- -PPA-AD--- ---------- ----VVAAAA WMMMMTTTLI IISGDDDDAA DPPDCCCCDW WPPDIPDFFD DPSRSSSVVQ DDVVIISSSS WMMMMTTTLI IISGDDDDAA DPPDCCCCDW WPPDIPDFFD DPSRSSSVVQ DDVVIISSSS ALLLPFFAAA APCPDDAAAA AA----MMRR RRREVVAGGG GRR----VLL MGAIEAHALL SPPPLPPPPL LPSPLLSSII PPHHEEPPTK KKKETTNDDD DEEDFFELAA Y-RISDPNEE SPPPLPPPPL LPSPLLSSII PPHHEEPPTK KKKETTNDDD DEEDFFELAA Y-RISDPNEE SSSAAQLHAA ASAASSGRRV VVHHFTLLSR LFFFFPSSSP AAAPTTDAAA FFFLL--HHE SSSKKTLRES SGDDTTERRV VFYYFTLLSN LSSSSPSSSP TTTSSSSSSD LLLIILLKKD SSSKKTLRES SGDDTTERRV VFYYFTLLSN LSSSSPSSSP TTTSSSSSSD LLLIILLKKD EAACCYYKAA ILEAHGCCDD VVVHVVIFSL MMGLQWPIQL LRRRRRGGGF LLITGIPPPP DAACCYYKAA ILEAEKSSNN IIIHIIVFGI VVGIQWPLQT TRRRRRSSGT IIVSGIAPPP DAACCYYKAA ILEAEKSSNN IIIHIIVFGI VVGIQWPLQT TRRRRRSSGT IIVSGIAPPP SPTDD--RRR VGGLRRLLLL ARRVRVRSSR GGVVAASDEE RRLLQQIAPP GEEEAAVVVL SLGPPPSIII TGGNRRFFFF AKKLDLNDDI PPIITTPHLL NNFFRRVDPP DEEEVVLLML SLGPPPSIII TGGNRRFFFF AKKLDLNDDI PPIITTPHLL NNFFRRVDPP DEEEVVLLML HLLLLGDDDP PPD-----IV VVLSVVRPPK TTTVIEQADD NKKTTGFFLL LDRRRTELLF YLLLL----- --DIIIIIVA AALSLLNPPR TTTLGEYVSS NRRVVGFFAA ANRRRKNLLQ YLLLL----- --DIIIIIVA AALSLLNPPR TTTLGEYVSS NRRVVGFFAA ANRRRKNLLQ YYYAADSSSG GGGAGAAMMM AEE--YYYCC IICGGEEGRR EEHEPPLSRW WRLLRAALSP FYYAAELRRD DDSEEVVRRR VEERRLLLSS LLGPPEEKRR EEMEEEKEQW WRMMNAAFEK FYYAAELRRD DDSEEVVRRR VEERRLLLSS LLGPPEEKRR EEMEEEKEQW WRMMNAAFEK LLGGNNNALL LLRRQAAAAM MMLGFGG-HH VEEADDGGGL LLLWHRRRLL LLLFFSASSA LLSSYYYAVV VVSSQAAAAI IILWYYNYSS VEEKPPGGGI LLLWNLLLLL LLLLLTLSSS LLSSYYYAVV VVSSQAAAAI IILWYYNYSS VEEKPPGGGI LLLWNLLLLL LLLLLTLSSS WWAGDGNNNN SSSNVSGSSG SNNSSSSNNN NSAGSSVLL WW-------- ---------- ---------- --------- WW-------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ------MDDD DDTPQQ---- --WWWPDALL t5g66770.1 MAAAAYYCTT 
DDSGNLLAAV KKQQQQQQQQ QQQQQQHQQQ QQIGNNPPLL NNWWWPNLLL t5g66770.2 MAAAAYYCTT DDSGNLLAAV KKQQQQQQQQ QQQQQQHQQQ QQIGNNPPLL NNWWWPNLLL LDGFFLLPPP PPPPPAAAAA AVV------- ---------G GY-------- ---------- LSSAAFFPPP DDPPPFFFQQ QVVTGDDNDP GGGPNNHHHG GFRRRLSDFG GGEEWWMMET LSSAAFFPPP PPPPPFFFQQ QVVTGDDNDP GGGPNNHHHG GFRRRLSDFG GGEEWWMMET ---------- ---------- DPA-----AD ---------- -----AALLL PFAAAAPCAA LLLLLISGGG AGGPDCCTWW DPDYYIYYPD PPPFFTTTYS LNRRVTSPPP LPPPTLPSSS LLLLLISGGG AGGPDCCTWW DPDYYIYYPD PPPFFTTTYS LNRRVTSPPP LPPPTLPSSS AAAAV---RR RREVAGIIIR ---LVVLLSC CGAIIIIEAA DALLAAQQLD HHHAAALLLA SIPPLHEETT KKPTNDSSSE DDPLLLAADC C-RIIIISDD DNEEAATTLQ RRRSSSVVVS SIPPLHEETT KKPTNDSSSE DDPLLLAADC C-RIIIISDD DNEEAATTLQ RRRSSSVVVS AVVSAAAASI IAVHHTTTLL SRRPSSSSVV PPTDAAEHLL ----YFYEAP LFAHAQQALL ELLGDPPPT- -AFYYTTTLL SNNPSSSSAA SSSSSSTEII LSSSYLNDAP SFAHAQQALL ELLGDPPPT- -AFYYTTTLL SNNPSSSSAA SSSSSSTEII LSSSYLNDAP SFAHAQQALL EAFHGCCHHV VSLLQQQQQL LLQQPAAAAL IIIQAAALLP GP-LRRRRRI ITGGGGSSTG EATEKSSKHI IGIIQQQQQI IIQQPAAAAL LLLQAAALLT GKQIRRRRRV VSGGPPSSGE EATEKSSKHI IGIIQQQQQI IIQQPAAAAL LLLQAAALLT GKQIRRRRRV VSGGPPSSGE GE---LLDDV VGLLLADDDL LRRVRVFFAA NNLDEEVVRR RRRPPPPMML QQAAAPPAVA EEPSSLLAAT TGNNNRDDDF FKKLDLFFTT --IHLLLLNN NNNGGGGSSF RRDDDPPVLA EEPSSLLAAT TGNNNRDDDF FKKLDLFFTT --IHLLLLNN NNNGGGGSSF RRDDDPPVLA AFFSVVLQQQ QHLLLLDPPA QA--DAVVVV ASVRRRKTTI EEEEEHNNNK GFFDDFFTEE AVVFMMLQQQ QYLLLL---- ETIIDTAAAA KSLNNNRTTG EEEEELNNNR GFFNNVVKNN AVVFMMLQQQ QYLLLL---- ETIIDTAAAA KSLNNNRTTG EEEEELNNNR GFFNNVVKNN LLFYYYYSVS LLDDDAAAAS GGGGANNAAA A--LRREEEI ICCIVERRRE RRHEEEELSS LLQFFYYSVS LLEEEPPPGR DDSSERRVVV VRRFRRRRRI ISSLIEHHRE RRMEEEEKEE LLQFFYYSVS LLEEEPPPGR DDSSERRVVV VRRFRRRRRI ISSLIEHHRE RRMEEEEKEE RRRRTGLLSA VLGSNAAQQR MVVGLLLGG- --HHSVVVAD DGCCCTLLHH GRPFSSAWWE QQRLEGFFES VLSNYAAQQK ILLWNNNYNY YYSSIVVVKP PGFFFSLLNN DLPLTSSWWR QQRLEGFFES VLSNYAAQQK ILLWNNNYNY YYSSIVVVKP PGFFFSLLNN DLPLTSSWWR AAGGDDGGGN NSNNNSSGGS DNNSSSSNKS GGGDGVVCC ---------- ---------- ---------- --------- 
---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------M TFFPPQ---- ---WMDPPAA ASSSSGGDFF t5g66770.1 MMAYYMMCSG GAQQQIIIIQ KQQQQHDDDH IFFGGNPPPS LLLW-NTTSL LFFFFGGSAA t5g66770.2 MMAYYMMCSG GAQQQIIIIK KQQQQHDDDH IFFGGNPPPS LLLW-NTTSL LFFFFGGSAA LLLPPPPPP- ---------- ---------- ------AAGV VVVGY----- ---------- FFFDPPPPPT TTGGGSDDPG GGFFPPFFPN NLLHHHAAGG GGGGFRDGGG GEEEDEWEET FFFPPPPPPT TTGGGSDDPG GGFFPPFFPN NLLHHHAAGG GGGGFRDGGG GEEEDEWEET ---------- ---YYDPPA- ---AAAD--- ---------- VAAAAAFAAA FPPPCAPPPA LSSGGDDAAD DDWHHDNNDV VYYPPPDFYP RRSSVQPSNV ITTSSSPPTL WPPPSSPPPS LSSGGDDAAD DDWHHDNNDV VYYPPPDFYP RRSSVQPSNV ITTSSSPPTL WPPPSSPPPS AAAL-AMRRR EEEVVVAGIR ------LVLL SAAGAIIIEE AAAGGHAAAA AAAHAAAAAA IPPTESPTTT DPETTTNDSE DFLEEPLLAA DAA-RIIISS DDDSSPNNNA AKKREEEESS IPPTESPTTT DPETTTNDSE DFLEEPLLAA DAA-RIIISS DDDSSPNNNA AKKREEEESS AAVVSSSAAS IIGGRVHLLS LLLLLFFP-- VVVTEEEHAL LL------YY HHHEAPPLLF SSLLGGGDPT --EERVYLLS LLLLLSSPNN AAASTTTEDI IILLSSSSYY KKTDAPPSSF SSLLGGGDPT --EERVYLLS LLLLLSSPNN AAASTTTEDI IILLSSSSYY KKTDAPPSSF FAHANNLEEA HGCCDVVDFF FLMQWWWWIQ QAAALLRRPG GGPF----LL RIIPPPTRRE FAHANNLEEA EKSSNIIDFF FIVQWWWWLQ QAAALLRRTS SSKTQQQQII RVIALLGSSE FAHANNLEEA EKSSNIIDFF FIVQWWWWLQ QAAALLRRTS SSKTQQQQII RVIALLGSSE RDDVGRAALA AARSVVVRVV SFFGVAAAVR RMMLQQQGEE VFNSVLLHRL GDAQAAA-II IAATGRRRFA AAKVLLLDLL DFFPILLLLN NSSFRRRDEE LVNFMLLYKL ---ETTTIVV IAATGRRRFA AAKVLLLDLL DFFPILLLLN NSSFRRRDEE LVNFMLLYKL ---ETTTIVV DAVVVLLLDC CCVPKIITTV IIQEADDDHH HNNKKTFFLL DDRELLLFYS VVFDSSSLDD DTAAALLLRL LLAPRVVTTL GGYEVSSSLL LNNRRVFFAA NNRNLLLQYS VVFESSSLEE DTAAALLLRL LLAPRVVTTL GGYEVSSSLL LNNRRVFFAA NNRNLLLQYS VVFESSSLEE AASAASGGGG NNAEAALLQE CVGGGEGAAR REEERRHEPP LLLLLRRRDD DDLLRAAAGG PNLGGRDDSS RRVEEEFFGR SIPPPEKTGR REEERRMEEE KKKKKQRRVV VVMMNAAAGG PNLGGRDDSS RRVEEEFFGR SIPPPEKTGR REEERRMEEE KKKKKQRRVV VVMMNAAAGG GLSPPPLGGG NNNARQLLLG FSSSGE--HH SVVVEEEEEA ADGCCTGWHH HPPLSASSWA GFEKKKLSSS YYYASQLLLW YNNNYSYYSS IVVVESSSSK KPGFFSAWNN NPPLTLSSW- 
GFEKKKLSSS YYYASQLLLW YNNNYSYYSS IVVVESSSSK KPGFFSAWNN NPPLTLSSW- AAAGGGGNNN SSSNVGGSSS SDSSSSNNGG GKSRGSSLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- ---MMFFQ-- -----PDPPA t5g66770.1 YMDSSGNNLL MAIAQQQVVI KQKQQQQQQE EQQQQQQHHH HDDHHIINPP LLLLNPNTTL t5g66770.2 YMDSSGNNLL MAIAQQQVVI KKKQQQQQQE EQQQQQQHHH HDDHHIINPP LLLLNPNTTL GGDGLLPPPP AAVV------ ---------P PDDDVVY--- ---------- ---------- GGSSFFPPPP FQVVGGDNDP GPNLDHHHHT TTTTGGFRFF GGGTGGGGEF EMMTTISGAA GGSSFFPPPP FQVVGGDNDP GPNLDHHHHT TTTTGGFRFF GGGTGGGGEF EMMTTISGAA -------Y-- -AA------- ---------- ---DALLLAA AAAPDDAAAA AAVV-AMMMM ADGDCWWHVI YPPDDDDTYP PRLLVPDLNR RRVDTPPPPP TTTPLLSSII IPLLESPPPP ADGDCWWHVI YPPDDDDTYP PRLLVPDLNR RRVDTPPPPP TTTPLLSSII IPLLESPPPP MRRREEEEEA GGIII----- -----HHGGI GGGGAAASSS QADDALAAAV ASSGIGRRRV PTTKDPEEEN DDSSSDDFFL EPPPPKK--I SSSSNAASSS TLQQEVEEEL PTT--ERRRV PTTKDPEEEN DDSSSDDFFL EPPPPKK--I SSSSNAASSS TLQQEVEEEL PTT--ERRRV VAAAHHTTAA SLP--PPPTT TTTDAH-YYH FYPLKKAAHH HTANQQQQAA ILLEHHGVVV VAAAYYTEAA SLPNNSSSSS SSSSSELYYT LNPSKKAAHH HTANQQQQAA ILLEEEKIII VAAAYYTEAA SLPNNSSSSS SSSSSELYYT LNPSKKAAHH HTANQQQQAA ILLEEEKIII IDFSSSLQGG LLQAAIIIAA LLRGGGGGGP FFF-TGIIGP PPPGGRE-LL RDVVVLLRRR VDFGGGIQGG IIQAALLLAA LTRSSSSGGP TTTQSGIIPA APLEESESLL IATTTNNRRR VDFGGGIQGG IIQAALLLAA LTRSSSSGGP TTTQSGIIPA APLEESESLL IATTTNNRRR LADLRRSSVR RFFGVAAANN NSSLEPQQQI IPGVASSVVL LLHRLGGDPP AAADDDDDQQ LRDFKKVVLN NFFPILTT-- -PPILGRRRV VPDLAFFMML LLYKL----- ---DDDDDEE LRDFKKVVLN NFFPILTT-- -PPILGRRRV VPDLAFFMML LLYKL----- ---DDDDDEE AAPPPP---- IDAVLCVARP PIFFTVIEQQ AADNKTTFFT TAAFFFFYAV FSLLDDAAAA TTPPPPTTII VDTALLAKNP PVVVTLGEYY VVSNRVVFVK KAAQQQQFAV FSLLEEPPNG TTPPPPTTII VDTALLAKNP PVVVTLGEYY VVSNRVVFVK KAAQQQQFAV FSLLEEPPNG ASSGGGAGNN A-AAYYQQQE CDIIIVVCCC GEGAAAA--- RREHEEPPSS RWRLLTRRAG GRRDSSEERR VREELLGGGR SGLLLIIGGG PEKTTGGIII HHEMEEEEEE QWLMMENNAG GRRDSSEERR VREELLGGGR SGLLLIIGGG PEKTTGGIII 
HHEMEEEEEE QWLMMENNAG AAVVLGGNLL QAMMVGLLFS SGGE-SVVAD GGLLTWWHPL LFSASSWWEA GGGGGGGDDD SSVVLSSYVV QAIILWNNYN NYYSLIVVKP GGIISWWNPL LLTLSSWWR- ---------- SSVVLSSYVV QAIILWNNYN NYYSLIVVKP GGIISWWNPL LLTLSSWWR- ---------- NNNNNNSNSS NSGGSSSSSS DDDDDSSNNS SSNGGSSVC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ----PQ---- --WWPAAASS GGGGLDDAAG t5g66770.1 AAACTTDDSS GGLMAQQQQV VIIKKKQQQQ QHQDGNPPPL SLWWTSSLGG GGGGLSSGGS t5g66770.2 AAACTTDDSS GGLMAQQQQV VIIKKKQQQQ QHQDGNPPPL SLWWTSSLGG GGGGLSSGGS GGFPAVV--- ---------- ------AAAP DDDGY----- ---------- ---------- SSADFVVGGG GGSSNDPFFF PNLDHHAAAT TTTGFRLLDD FFGGGTGGGG GGEESDDWEE SSAPFVVGGG GGSSNDPFFF PNLDHHAAAT TTTGFRLLDD FFGGGTGGGG GGEESDDWEE ---------- -------YDP ---------- G--------- ---------- ----VVDDAA LIIISGGSVA ADGDDTWHDP YYYVVIIYYY GPPFDTTTPS SLQQQQQPPS DNNRIIDDTT LIIISGGSVA ADGDDTWHDP YYYVVIIYYY GPPFDTTTPS SLQQQQQPPS DNNRIIDDTT PPPEAAAAAF FPCADAAA-- MRRREEEEEA GII------- -LLHHLLMAG IEAGGDDHHH LLLPPLLLLW WPSSLSIPHH PTTKEPEEEN DSSDDDLLEP PLLKKAIYA- ISDSSDDPPP LLLPPLLLLW WPSSLSIPHH PTTKEPEEEN DSSDDDLLEP PLLKKAIYA- ISDSSDDPPP ALLQLADDHH ALAAVSASSI IGRVVVVAVV HFFTRRLLPP SPVAPTTAAA FL-YHHEAPP NEETLLQQRR SVSSLGPTT- -ERVVVVAFF YFFENNLLPP SPATSSSDDD LILYTTDAPP NEETLLQQRR SVSSLGPTT- -ERVVVVAFF YFFENNLLPP SPATSSSDDD LILYTTDAPP PPYYLLLLFA AAHAANALEE AFFHGCDVDF LLLMQGLLQA LLLIALAAAR RRGGLLRRGP PPYYSSSSFA AAHAANALEE ATTEKSNIDF IIIVQGIIQA LLLLALAAAR RRSGIIRRPA PPYYSSSSFA AAHAANALEE ATTEKSNIDF IIIVQGIIQA LLLLALAAAR RRSGIIRRPA SPPTRDDE-- RDGGLADLVV RVRFFSSFRG GVSLLDVVVR RRQQIIIAGA ASQHRLLGDD SLLGSPPESS IAGGNRDFLL DLNFFDDFIP PIPIIHLLLN NNRRVVVDDA AFQYKLL--- SLLGSPPESS IAGGNRDFLL DLNFFDDFIP PIPIIHLLLN NNRRVVVDDA AFQYKLL--- PAADA-IIID ALLDDVVAAS VVRPIFFFTQ QEAAADHKTG FDFTTEEAAL LFYYYSSAVF ---DTIVVVD TLLRRAAKKS LLNPVVVVTY YEVVVSLRVG FNVKKNNAAL LQYYYSSAVF ---DTIVVVD TLLRRAAKKS LLNPVVVVTY YEVVVSLRVG FNVKKNNAAL LQYYYSSAVF DDDLGGGNNA MA--AAYLQR 
RREIDIVCGG EEAA--RRRR EHHEEPSSRD TTALLAPLLL EEELDDERRV RVRREELFGR RRRIGLIGPP EETTIIHRRR EMMEEEEEQV EEAFFSKLLL EEELDDERRV RVRREELFGR RRRIGLIGPP EETTIIHRRR EMMEEEEEQV EEAFFSKLLL LLGLRRRRAR RMMMLVVFFS EEG--HSVVE EAGCGGPLFS WEAGGDDDGG GGGGDDDNNN LLSVSSSSAK KIIILLLYYN SSNYYSIVVE SKGFAAPLLT WR-------- ---------- LLSVSSSSAK KIIILLLYYN SSNYYSIVVE SKGFAAPLLT WR-------- ---------- NSNSSSGGGG SSSDDSSSNG SSGSSSSGRD SSSSSSCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -DTFPPFQQ- ------WWMD AAASLAGFLP t5g66770.1 MMTDSGMAAI AAAAQQVIQQ QEQQQQQQQQ QQIFGGINNP PSSNNPWW-N SLLFLGSAFP t5g66770.2 MMTDSGMAAI AAAAQQVIQQ QEQQQQQQQQ QQIFGGINNP PSSNNPWW-N SLLFLGSAFP PA-------- ---------- ---AAPDDGG GY-------- ---------- ---------- DQTGGGGSNN DPPPFPLDHH HHHAATTTGG GFLSDGGGGG TTGGGGFESS SDDDEIGDDV PQTGGGGSNN DPPPFPLDHH HHHAATTTGG GFLSDGGGGG TTGGGGFESS SDDDEIGDDV ------PPPP -----AD--- ---------- ----DAALEE FFAAAAFFPC CPDDAAAAL- AADCDDNNPP VIIYYPDPPD TYSSLLSVPP NNRVDTSPPP PPPPTLWWPS SPLLIIPPTH AADCDDNNPP VIIYYPDPPD TYSSLLSVPP NNRVDTSPPP PPPPTLWWPS SPLLIIPPTH ---RRRREEE VGGGIIR--- --------LV HLLLMMCAGA ADDHAQLDHH AAAAAAAAAV HEETTTKEDE TDDDSSEDDD DDDFDEEPLL KAIIYYCA-D DDDPNTLQRR EEESSEEEEL HEETTTKEDE TDDDSSEDDD DDDFDEEPLL KAIIYYCA-D DDDPNTLQRR EEESSEEEEL VVASRVVVVF TTALSSSSSL PP-SPVAAPT AEHFFYHHHF FFYYEECPKK KAHHFFTAAN LLDTRVVVFF EEALSSSSSL PPNSPATTSS STELLYKTTL LLNNDDCPKK KAHHLLTAAN LLDTRVVVFF EEALSSSSSL PPNSPATTSS STELLYKTTL LLNNDDCPKK KAHHLLTAAN NQQAIEFFCH HHDSLLMMML QWPLAAALRR PFLIGPSPGG REEELRDVRL DLAVVVRRFF NQQAIETTSH HHDGIIVVVI QWPLAAATRR PTIIPPSLEE SEEELIATRL DFALLLNNFF NQQAIETTSH HHDGIIVVVI QWPLAAATRR PTIIPPSLEE SEEELIATRL DFALLLNNFF FFFFFGAANN SSSDEEVVRP PWWMMLLQIG GEAAVVNSVV LLHHLLLDDP PAQQP-IACV FFFFFPLL-- PPPHLLLLNG GSSSSFFRVD DEVVLLNFMM LLYYLLL--- --EEPIVTLA FFFFFPLL-- PPPHLLLLNG GSSSSFFRVD DEVVLLNFMM LLYYLLL--- --EEPIVTLA VSSVVKKTVE QQEAADDLLD DAFFYSAVFF DLLLDAAASS GGGGAAAGGG NAMAE--AYQ 
ASSLLRRTLE YYEVVSSAAN NAQQYSAVFF ELLLENGGRR DSSSEEEEEE RVRVERRELG ASSLLRRTLE YYEVVSSAAN NAQQYSAVFF ELLLENGGRR DSSSEEEEEE RVRVERRELG ICDVVCGAAA A--REEEERE EEPLLRRRRR LRRAAGGSSS AAPLLGGSSS AQQQARRVVL ISGIIGPTTG GIIHEEEERE EEEKKRRRLL MNNAAGGEEE SSKLLSSNNN AQQQAKKLLN ISGIIGPTTG GIIHEEEERE EEEKKRRRLL MNNAAGGEEE SSKLLSSNNN AQQQAKKLLN LFFFFSGEG- -HSVVEAAAD GLLTLGHHRR PLFFSSSSWW EEAAAGGGDG GDDDNNNNSS NYYYYNYSNL YSIVVEKKKP GIISLANNLL PLLLTTTSWW RR-------- ---------- NYYYYNYSNL YSIVVEKKKP GIISLANNLL PLLLTTTSWW RR-------- ---------- NSSSNSSSSS SGGSSDNSGS SGKSSGGAAD DGSSSSVLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------M DT-------- -MDPPAASSS SSLDDDDALP t5g66770.1 AYMCCTTDDS GNMMAQVVIQ QEQQQHHHQH QIPLLLNNPP P-NTTSLGGG GFLSSSSGFD t5g66770.2 AYMCCTTDDS GNMMAQVVIK QEQQQHHHQH QIPLLLNNPP P-NTTSLGGG GFLSSSSGFP ---------- -------DVY Y--------- ---------- ---------- ---------- GGGGSGFFFP NLDDHHHTGF FLSDFFFFGG GGGGEEEDDE EMTTISSGGD SSSSVAAAGG GGGGSGFFFP NLDDHHHTGF FLSDFFFFGG GGGGEEEDDE EMTTISSGGD SSSSVAAAGG ---------- ---YDPPA-- ----AADD-- ---------- ---------- -AAAAALPFF PPDDCCDDDD TWWHDNNDVV IIIYPPDDPF FFDDYYPSSR RSPPSLLNNR RTTTTSPLPP PPDDCCDDDD TWWHDNNDVV IIIYPPDDPF FFDDYYPSSR RSPPSLLNNR RTTTTSPLPP AAAAAFPPCA DAAAAVLL-- AMRRREEVAG GIRR------ ---HLLMMMA AGGAIEEADA TLLLLWPPSS LIPPPLTTHE SPTKKEDTND DSEEDDDDDL LPPKIIYYYA A--RISSDDN TLLLLWPPSS LIPPPLTTHE SPTKKEDTND DSEEDDDDDL LPPKIIYYYA A--RISSDDN LAAQQSSHLL AVVAAASSGR RVAAHLSRRR LLPSPPVAAP TDAEL--YYH HFFFFYACCC EKKTTIIRVV ELLDDPTTER RVAAYLSNNR LLPSPPATTS SSSTISSYYK TLLLLNACCC EKKTTIIRVV ELLDDPTTER RVAAYLSNNR LLPSPPATTS SSSTISSYYK TLLLLNACCC PPLKKHFTAN NNNEEFHGDH HVHHIDFLLM QLLLQPALIA AARRRRRRPP PPGFFFLRTT PPSKKHLTAN NNNEETEKNK KIHHVDFIIV QIIIQPALLA AARRRRRRTT TTGTTTIRSS PPSKKHLTAN NNNEETEKNK KIHHVDFIIV QIIIQPALLA AARRRRRRTT TTGTTTIRSS GGIIPSTTRR E---LLDVGL RLDDDRSVVV RFSFFVAADD EVVRPWMMML QIAPPGGEAN GGIIASGGSS EPPSLLATGN RLDDDKVLLL NFDFFILLHH 
LLLNGSSSSF RVDPPDDEVN GGIIASGGSS EPPSLLATGN RLDDDKVLLL NFDFFILLHH LLLNGSSSSF RVDPPDDEVN NLHLGGGDAA PP-DDDVDCV AAAVVRRPII FIQEADHHNK TTTDRFTEEE AFYYAVFAAA NLYL-----T PPTDDDARLA KKKLLNNPVV VGYEVSLLNR VVVNRVKNNN AQFFAVFNNN NLYL-----T PPTDDDARLA KKKLLNNPVV VGYEVSLLNR VVVNRVKNNN AQFFAVFNNN SSSGGGGGGN NAAAAAAAE- --AAYYLQEE IIVRRREHLL LSSRWRDDLL LTRRGLLLSA LRRDDDEEER RVVVVVVVER RREELLFGRR ILIHHHEMKK KEEQWRVVMM MENNGFFFES LRRDDDEEER RVVVVVVVER RREELLFGRR ILIHHHEMKK KEEQWRVVMM MENNGFFFES PNNLLQAARM LLGGLLFGG- --SVVEADCT TGWHGGRRPL LASAAWWEEA AGGGDDGGDD KYYVVQAAKI LLWWNNYNNL LYIVVSKPFS SAWNDDLLPL LLSSSWWRR- ---------- KYYVVQAAKI LLWWNNYNNL LYIVVSKPFS SAWNDDLLPL LLSSSWWRR- ---------- DNNSSSSNNS SSGSSGDDSN NNSGSSKKGA DDSSSSLLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------M MPPF------ --PMDDDDPS t5g66770.1 AYCCTDDLLM MMAIAQQQQQ IKKKKQQQQQ QQQHHHQDDH HGGIPLLSLL LNP-NNNNTG t5g66770.2 AYCCTDDLLM MMAIAQQQQQ IKKKKQQQQQ QQQHHHQDDH HGGIPLLSLL LNP-NNNNTG SLDDAGGPPV ---------- --------PD DDDD------ ---------- ---------- FLSSGSSPPV TGGGGGDSND DPGFFPLDTT TTTTRDGGGG TGGGGGGGEE FFSDDMETIS FLSSGSSPPV TGGGGGDSND DPGFFPLDTT TTTTRDGGGG TGGGGGGGEE FFSDDMETIS ---------- ----DPAAA- --AD------ ---------- -VAAAAAEFF FFAAAFFPPP GGGGGGDDGD CDDTDPDDDY YYPDYYYPRL SVVPSSLNVV VITTSSSPPP PPTLLWWPPP GGGGGGDDGD CDDTDPDDDY YYPDYYYPRL SVVPSSLNVV VITTSSSPPP PPTLLWWPPP CDAAAAAAA- EEEEEEEEVA GIR------- -HLAAGGAAA AGLLASSSSS AQQSSHAAAA SLSSSSIPPE EDPPPPEETN DSEDDDFDDL PKIAA--RDD DSEEASSSSS KTTIIREESS SLSSSSIPPE EDPPPPEETN DSEDDDFDDL PKIAA--RDD DSEEASSSSS KTTIIREESS AAAVSAASGG RRRRVVVAHH FTALLSRRRF --PPVAAPTT TDAAHHF--Y YYHFFFACCC SEELGPPTEE RRRRVVVAYY FEALLSNNRS NNPPATTSSS SSSSEELLSY YYTLLLACCC SEELGPPTEE RRRRVVVAYY FEALLSNNRS NNPPATTSSS SSSSEELLSY YYTLLLACCC PYYYLLLFFF FFHHFTAAAL LLEGGCCCDV ISSSSGQPAA QLRGGPPPFF -LIIITTGPS PYYYSSSFFF FFHHLTAAAL LLEKKSSSNI VGGGGGQPAA QTRSSKKPTT QIVVVSSPAS PYYYSSSFFF FFHHLTAAAL 
LLEKKSSSNI VGGGGGQPAA QTRSSKKPTT QIVVVSSPAS PPTGRE-LLR DVGGGLLLAA ADLLAARRVF RVAAALLLPP PMMLLIAAPP PGGEANSLHH LLGESESLLI ATGGGNLLRR RDFFAAKKLF IILLLIIIGG GSSFFVDDPP PDDEVNFLYY LLGESESLLI ATGGGNLLRR RDFFAAKKLF IILLLIIIGG GSSFFVDDPP PDDEVNFLYY RRLLLLPPAQ QAP----DDD AAALDDCCVV SVRRRRPIFT TVIIEEENNN KTGGDRRFTE KKLLLL---E ETPIIIIDDD TTTLRRLLAA SLNNNNPVVT TLGGEEENNN RVGGNRRVKN KKLLLL---E ETPIIIIDDD TTTLRRLLAA SLNNNNPVVT TLGGEEENNN RVGGNRRVKN FYSSAFLSAS GGAAMAAAAA YLREECDDDI VVVGGGEGA- -RERRHELSW RRRRAGGLLL QYSSAFLLGR DDEVRVEEEE LFRRRSGGGL IIIPPPEKGI IHERRMEKEW LNNNAGGFFF QYSSAFLLGR DDEVRVEEEE LFRRRSGGGL IIIPPPEKGI IHERRMEKEW LNNNAGGFFF SSSAAPGGSA ALRMMMLLLF EE--EEEAAD GLLLLGGWRR LSAASSSWAG DDGDDNNNNN EEESSKSSNA AVSIIILNNY SSYYEESKKP GIIIIAAWLL LTLLSSSW-- ---------- EEESSKSSNA AVSIIILNNY SSYYEESKKP GIIIIAAWLL LTLLSSSW-- ---------- SNNNVSSGGS GDSNSSSSNN NNKKSSSSSG AADDGSVVV ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -MMMTQQ--M DPPALLLDAA FFPPPPPPPP t5g66770.1 MAYMTTTDLL LIAQVIKQKQ QQQQQHHQQD DHHHINNLN- NTTSLLLSGG AAPDDDDDPP t5g66770.2 MAYMTTTDLL LIAQVIKKKQ QQQQQHHQQD DHHHINNLN- NTTSLLLSGG AAPPPPPPPP PAAAV----- ---------- ---PPPDDGV VG-------- ---------- ---------- PFFQVGGGGG SPPGGFPNNL DHHTTTTTGG GGRRRDFGGG GTTGESSSSD DEIGGGGDVD PFFQVGGGGG SPPGGFPNNL DHHTTTTTGG GGRRRDFGGG GTTGESSSSD DEIGGGGDVD ------DDPP A----GA--- ---------- -------VDA APFAFPCDAA AAVL-----M GDCCDWDDNP DVVIYGPPFF TTYSSRLLLV PDLLNRRIDS SLPTWPSLSI PPLTEEEEEP GDCCDWDDNP DVVIYGPPFF TTYSSRLLLV PDLLNRRIDS SLPTWPSLSI PPLTEEEEEP MRREEEV--- ------LLLM SCAAGAAIII EDDHHAAALQ LDSAAAAALA VVASGGRRVA PKKEPPTDDD DLEPPPAAIY DCAA-RRIII SDDPPNNNET LQIEESSSVS LLDT-ERRVA PKKEPPTDDD DLEPPPAAIY DCAA-RRIII SDDPPNNNET LQIEESSSVS LLDT-ERRVA VHTAAALSSL PP-SPATTDE HHA-YEACAA FFAAAANQAL LEAHGDHHHV VVFLLMQQQQ FYTAAALSSL PPNSPTSSST EEDLNDACAA LLAAAANQAL LEAEKNKKKI IIFIIVQQQQ FYTAAALSSL PPNSPTSSST EEDLNDACAA LLAAAANQAL LEAEKNKKKI IIFIIVQQQQ 
GLLWPALLII IIQPGGPPPF ---RIIITTG GGGGGGPPSS SPTTTGRRE- LRRGRRADLA GIIWPALLLL LLQTSGPPPT QQQRVVVSSG GGGPPPPPSS SLGGGESSEP LIIGRRRDFA GIIWPALLLL LLQTSGPPPT QQQRVVVSSG GGGPPPPPSS SLGGGESSEP LIIGRRRDFA RVRFFFFSSS RRRGASLLDV VVWWMMLAPA VVASSLHHRR LLLGDDDDQA APP----IDD KLNFFFFDDD IIIPTPIIHL LLSSSSFDPV LLAFFLYYKK LLL--DDDET TPPTTIIVDD KLNFFFFDDD IIIPTPIIHL LLSSSSFDPV LLAFFLYYKK LLL--DDDET TPPTTIIVDD VLDCASSSVP KFFVVEEQAD DDKTGFLLRF FTTTELLFYY SAFDDAAASS ASGAAAGGGN ALRLKSSSLP RVVLLEEYVS SSRVGFAARV VKKKNLLQFY SAFEEPNNLL GRSEEEEEER ALRLKSSSLP RVVLLEEYVS SSRVGFAARV VKKKNLLQFY SAFEEPNNLL GRSEEEEEER NAAAEE--YL QEICDIIGGG EAAAAA-RRR REHHPLSSRW WWWDTGGGLL SSAVVLLNNA RVVVEERRLF GRISGLLPPP ETTTTGIHHH REMMEKEEQW WWWVEGGGFF EESVVLLYYA RVVVEERRLF GRISGLLPPP ETTTTGIHHH REMMEKEEQW WWWVEGGGFF EESVVLLYYA ALRRQRMMLV GLFSGG---- VVVVEADDGG CLLHGRLFSA WEAGGDGGGG NNNNNNNNSN AVSSQKIILL WNYNYYLYYY VVVVEKPPGG FIINDLLLTL WR-------- ---------- AVSSQKIILL WNYNYYLYYY VVVVEKPPGG FIINDLLLTL WR-------- ---------- NSSNSSSGGG GSDSNNNSGS SSSSNSSGGA ARDDDSSSL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --DDTFPFFF Q-----WWPP PMPAAASSSS t5g66770.1 MMMMAYYYMC TDNLMQVVQK QQQQQQQQQQ QDQQIFGIII NPLSPPWWPP P-TSLLGFFF t5g66770.2 MMMMAYYYMC TDNLMQVVKK QQQQQQQQQQ QDQQIFGIII NPLSPPWWPP P-TSLLGFFF SSSSGLFFFL PPPAAV---- ---------- AAAADYYYY- ---------- ---------- FFFFGLAAAF PPPFQVTTDD FPPPNNDDHH AAAATFFFFL LDDGGGGGGG GGEEWWLLLL FFFFGLAAAF PPPFQVTTDD FPPPNNDDHH AAAATFFFFL LDDGGGGGGG GGEEWWLLLL ---------- ---------- YYDDPPPPAA A-----GGGA AD-------- ---------- IISSVVVAPP DCDDDTWWWW HHDDNPPPDD DYYIYYGGGP PDPDDDDYYS RLLSSSVQQP IISSVVVAPP DCDDDTWWWW HHDDNPPPDD DYYIYYGGGP PDPDDDDYYS RLLSSSVQQP -------VVV VDAAAAAAAL LEFAAAPPPP PAAAAVVVLL L--RRRREEE EEEEVGI--- PPSDDDLIII IDTTTSSSSP PPPTTLPPPP PPPPPLLLTT THHTTTTDPP EEEETDSDDD PPSDDDLIII IDTTTSSSSP PPPTTLPPPP PPPPPLLLTT THHTTTTDPP EEEETDSDDD -----VLMSC AGAIAAGHHA ASSQSHHHAA AAAAASSIGR 
VAVTTALLLS SRRRRRLFPP DDDPPLAYDC A-RIDDSPPA ASSTIRRRES EEDDPTT-ER VAFTEALLLS SNNRRRLSPP DDDPPLAYDC A-RIDDSPPA ASSTIRRRES EEDDPTT-ER VAFTEALLLS SNNRRRLSPP P-SPPVAAPP TAAEAAAAL- YFEEECCPYL KFAHFFFTTI IAAAGCDVVI FFFLMLLWWP PNSPPATTSS SSSTDDDDIL YLDDDCCPYS KFAHLLLTTI IAAAKSNIIV FFFIVIIWWP PNSPPATTSS SSSTDDDDIL YLDDDCCPYS KFAHLLLTTI IAAAKSNIIV FFFIVIIWWP PLQAALPGGG PFF-LRIIPP PTGRDDEE-- -LDVGGLLLR DAASSVFFRG GVANNLEVVV PLQAATTSGG PTTQIRVIAP LGESPPEEPP SLATGGNNNR DAAVVLFFIP PIL--ILLLL PLQAATTSGG PTTQIRVIAP LGESPPEEPP SLATGGNNNR DAAVVLFFIP PIL--ILLLL PWMLLLQIAP AASQHHHRRL DAQQAPP-II IDLLLCVVAS SPPTIIIEEE EQQEEADDHK GSSFFFRVDP VAFQYYYKKL --EETPPTVV VDLLLLAAKS SPPTGGGEEE EYYEEVSSLR GSSFFFRVDP VAFQYYYKKL --EETPPTVV VDLLLLAAKS SPPTGGGEEE EYYEEVSSLR GFLRFEEAAF YYYAVVSSLA AAASAASAAN AMMAYYLRRR EEICIIVEGA AAA---REHE GFARVNNAAQ FFYAVVSSLP PNNLGGREER VRRVLLFRRR RRISLLIEKT TTGIIIHEME GFARVNNAAQ FFYAVVSSLP PNNLGGREER VRRVLLFRRR RRISLLIEKT TTGIIIHEME EEPPLLSRRD LAAGSAAAVP LGAAARQRML VGGGSGGG-- --HSSEEEEA GGCCLGWPLF EEEEKKERRV MAAGESSSVK LSAAASQKIL LWWWNYNNLY YYSIIEEESK GGFFLAWPLL EEEEKKERRV MAAGESSSVK LSAAASQKIL LWWWNYNNLY YYSIIEEESK GGFFLAWPLL AAAWAAGGDN NNNNNSSVSG SSGNSSSSSG KSGGRSVLL SSSW------ ---------- ---------- --------- SSSW------ ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --MMMMMDDD TTFFPFQ--- WPPPPMMPAS t5g66770.1 YTDDNMMIQQ VVIIIQQQKK QQEQQQQQQQ DDHHHHHQQQ IIFFGINPLP WPPPP--TLF t5g66770.2 YTDDNMMIQQ VVIIIKKKKK QQEQQQQQQQ DDHHHHHQQQ IIFFGINPLP WPPPP--TLF GGAAFFLLPP PPAAAAV--- ---------- -------AAP DDDGVY---- ---------- GGGGAAFFPD PPFFFFVGGD DNNDDGFPNN NNNDHHHAAT TTTGGFLSDF FFFFGGGGGE GGGGAAFFPP PPFFFFVGGD DNNDDGFPNN NNNDHHHAAT TTTGGFLSDF FFFFGGGGGE ---------- ---------- ---------- ---YYYPA-D ---------- ---------- EFFFDEWWWL IIISSGGGVA DDGPPDDDDD DTWHHHPDYD DTPPSSLLVQ QQPDDDDDLR EFFFDEWWWL IIISSGGGVA DDGPPDDDDD DTWHHHPDYD DTPPSSLLVQ QQPDDDDDLR VAAFAAFFFP PPAPDDDAAA AMREEEVVAG --------LL LHLSAGGGII QQLAAADHHL ISSPPLWWWP PPSPLLLSIP 
SPTEDETTND DDDFDLEELL LKIDA---II TTLLLLQRRV ISSPPLWWWP PPSPLLLSIP SPTEDETTND DDDFDLEELL LKIDA---II TTLLLLQRRV AAAAAAGIVA AHFTTARRLL LFPPPPPPVV VTTTDAAEEE EAAFFFAPPP YYKFNQILAA SSEEPP--VA AYFTEANNLL LSPPPPPPAA ASSSSSSTTT TDDLLLAPPP YYKLNQILAA SSEEPP--VA AYFTEANNLL LSPPPPPPAA ASSSSSSTTT TDDLLLAPPP YYKLNQILAA AHGHVVIDDF FSMMGGQWQQ LLAALLRPPG PPF------- LRRITTGIII IPPPSPT--- AEKKIIVDDF FGVVGGQWQQ LLAATTRTTS KPTQQQQQQQ IRRVSSGIII IAPPSLGPPP AEKKIIVDDF FGVVGGQWQQ LLAATTRTTS KPTQQQQQQQ IRRVSSGIII IAPPSLGPPP --DDVVGLLL AALRRSRRRF FFRVAAANSS LLDDVVVRRW MLLIIIGGEA FFFSSVVVLL SSAATTGNNL RRFKKVDDNF FFIILLT-PP IIHHLLLNNS SFFVVVDDEV VVVFFMMMLL SSAATTGNNL RRFKKVDDNF FFIILLT-PP IIHHLLLNNS SFFVVVDDEV VVVFFMMMLL LLRLLGGDDP PAADDAIIIL LLVVVFTTVV IIEEQAADDK TGGFFLLRFE EEELYYYSSS LLKLL----- ---DDTVVVL LLAALVTTLL GGEEYVVSSR VGGFFAARVN NNNLFFYSSS LLKLL----- ---DDTVVVL LLAALVTTLL GGEEYVVSSR VGGFFAARVN NNNLFFYSSS AVVFDDSLDA AAASSSGGAA AMMAEYYYLR EICCDDDIIV GA-RRERHEE PSSSRDLRAG AVVFEESLEP PNNLRRSSEV VRRVELLLFR RISSGGGLLI PGIRRERMEE EEEEQVMNAG AVVFEESLEP PNNLRRSSEV VRRVELLLFR RISSGGGLLI PGIRRERMEE EEEEQVMNAG SSSSAAVVPN NRARMMMLVL FFSSSG---S VVVEEDTLGH HGPLLLFSSA WEEAGDDDDG EEEESSVVKY YSAKIIILLN YYNNNNLLYI VVVESPSLAN NDPLLLLTTL WRR------- EEEESSVVKY YSAKIIILLN YYNNNNLLYI VVVESPSLAN NDPLLLLTTL WRR------- GGDDDNNNSS NNVVSSGGNS SGSSSNKKSS SAGSCLLLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --MDDFPQQQ ----WWWPPM PASSLLAAGG t5g66770.1 MSSNLLMAIA VVIKKQQQQQ QEEQQQQQHQ DDHQQFGNNN NNNPWWWPP- TLGFLLGGSS t5g66770.2 MSSNLLMAIA VVIKKQQQQQ QEEQQQQQHQ DDHQQFGNNN NNNPWWWPP- TLGFLLGGSS FPPPAAAVV- ---------- -------PPP DDDDDGYY-- ---------- ---------- APPDQQQVVT GGGDSDDPGG PPNNLHHTTT TTTTTGFFRR LSDDFGGTGG EEFDMMMELL APPPQQQVVT GGGDSDDPGG PPNNLHHTTT TTTTTGFFRR LSDDFGGTGG EEFDMMMELL ---------- ---------- ---DPPPPA- -----AA--- ---------- DDAAAAAAPE LIISSGDVAD GPPPDCCTTT TWWDNPPPDY VIIIYPPTYS SLSVPSSLLN DDTSSSSSLP 
LIISSGDVAD GPPPDCCTTT TWWDNPPPDY VIIIYPPTYS SLSVPSSLLN DDTSSSSSLP EEEEAPPPPC CAPAAAAV-- --EEEEEEEV AAGIR----- ---------- ----LLLLLM PPPPLPPPPS SSPSIPPLHE EEDDDPPEET NNDSEDDDDD DDDDDDDDDF DLPPLLAAIY PPPPLPPPPS SSPSIPPLHE EEDDDPPEET NNDSEDDDDD DDDDDDDDDF DLPPLLAAIY MGAAAAGHAL AQQLAAADSH LLAAVSAASG RRVAAVHFFF TAAALLRRRL LLF--SVVAP Y-RRDDSPNE ATTLLLLQIR VVSELGDPT- RRVAAFYFFF EAAALLNNRL LLSNNSAATS Y-RRDDSPNE ATTLLLLQIR VVSELGDPT- RRVAAFYFFF EAAALLNNRL LLSNNSAATS PEHLL--YHH FFYYEAPPPY LNQQQQAILE EHHCCDHHVI DDFLQLPPAA LALRPPPPLR STEIILSYKT LLNNDAPPPY SNQQQQAILE EEESSNKHIV DDFIQIPPAA LATRTTTPIR STEIILSYKT LLNNDAPPPY SNQQQQAILE EEESSNKHIV DDFIQIPPAA LATRTTTPIR GGGIIGPPPP PPPSPTTTGG DDDE-DRLLL LAARVRGGVV VAAANSDVRR RRRRWLIAPP GGGIIPAAAP PPPSLGGGEE PPPESARLLF FAAKLNPPII ILLL-PHLNN NNNNSFVDPP GGGIIPAAAP PPPSLGGGEE PPPESARLLF FAAKLNPPII ILLL-PHLNN NNNNSFVDPP AVNSSVLLLL LRLLLGDDAA DDDQAAP--- -VVDVASSVR PPIIFFTIIE EEADNKTGLL VLNFFMLLLL LKLLL----- DDDETTPTTT IAARAKSSLN PPVVVVTGGE EEVSNRVGAA VLNFFMLLLL LKLLL----- DDDETTPTTT IAARAKSSLN PPVVVVTGGE EEVSNRVGAA RTLYYYSSSD SAAASAGGGN NAME-AYYLL RCDIIGAA-R RRREEEERHH HPLSSSSRDR RKLFYYSSSE SPNNLGDSER RVRERELLFF RSGLLKTTIH HHREEEERMM MEKEEEERVL RKLFYYSSSE SPNNLGDSER RVRERELLFF RSGLLKTTIH HHREEEERMM MEKEEEERVL LLLRAAASAA VVGSLLRRRR RMVGGGLFEG SVEEAALLTT TLGHHGGPPP LSSSASSAAE MMMNAAAESS VVSNVVSSSK KILWWWNYSN IVEEKKIISS SLANNDDPPP LTTTLSSSSR MMMNAAAESS VVSNVVSSSK KILWWWNYSN IVEEKKIISS SLANNDDPPP LTTTLSSSSR GGGGGGDDDN SNNSSSSGGD SNNSGGGGGK GRSSSSVVC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- ------MMMD TPQ----MMM t5g66770.1 AAYYMMMCTD DSGGNLLLMM AIQQVIKKKQ KKQEQQQQQQ QQHQDDHHHQ IGNSLLP--- t5g66770.2 AAYYMMMCTD DSGGNLLLMM AIQQVIKKKK KKQEQQQQQQ QQHQDDHHHQ IGNSLLP--- AAASSSSSGD AFLLVV---- ---------- -------PDG VVGGY----- ---------- SLLGGFFFGS GAFFVVDDSN DDDPFFPPPL LLLDHHHTTG GGGGFLLGGG GGGGGGEEEW SLLGGFFFGS GAFFVVDDSN DDDPFFPPPL LLLDHHHTTG 
GGGGFLLGGG GGGGGGEEEW ---------- ---------- ------DDDP PA-------- -GAD------ ---------- MELLSSSGGG GGSAADGPPD CCDTWWDDDP PDYYYVVIII IGPDPFDDTT YYLSSDLLRV MELLSSSGGG GGSAADGPPD CCDTWWDDDP PDYYYVVIII IGPDPFDDTT YYLSSDLLRV VVVDAAAEEE FAAFFPPCAP DDAAALLL-A EEEEEEAGGG GII------- --VHHHLLSC IIIDSSSPPP PLLWWPPSSP LLPPPTTTES EEDDDPNDDD DSSDDDFLEE EPLKKKAIDC IIIDSSSPPP PLLWWPPSSP LLPPPTTTES EEDDDPNDDD DSSDDDFLEE EPLKKKAIDC AAAAAEEEEA GDDLASAAAS HHAAALLLAA AVSAAASGII RVVAVHFTAA LSL-PVPTAE ARRRRSSSSD SDDEASKKLI RREESVVVSS ELGDDPT--- RVVAFYFEAA LSLNPASSST ARRRRSSSSD SDDEASKKLI RREESVVVSS ELGDDPT--- RVVAFYFEAA LSLNPASSST HAAFL-HHHA CYYKKFFFTA ANNQQAAAIL AAAFHHGGDH HHVDFFSSSS MMQGGLLAAL EDDLISKTTA CYYKKFFFTA ANNQQAAAIL AAATEEKKNK KKIDFFGGGG VVQGGIIAAL EDDLISKTTA CYYKKFFFTA ANNQQAAAIL AAATEEKKNK KKIDFFGGGG VVQGGIIAAL IQLLLALPGG GPP-IITTGG IGPSPTTTGR RE--LRDDLA AADDLLAASV RVSRGNSLEP LQLLLATTSS GPPQVVSSGG IPPSLGGGES SEPSLIAALR RRDDFFAAVL DLDIP-PILG LQLLLATTSS GPPQVVSSGG IPPSLGGGES SEPSLIAALR RRDDFFAAVL DLDIP-PILG MMQQIIAAEA VAFNSSLLQH RLGDDPAAPP --DAVLDCVA SSVRKITTTT VIEEEAHNKT SSRRVVDDEV LAVNFFLLQY KL------PP TIDTALRLAK SSLNRVTTTT LGEEEVLNRV SSRRVVDDEV LAVNFFLLQY KL------PP TIDTALRLAK SSLNRVTTTT LGEEEVLNRV GFLLDDFAFF YAFFFAAASS GGGGGAAE-Y YYQQRRRRRE CIIICGAAAA -RRRRERRHE GFAANNVAQQ YAFFFPPNLL DDDDSVVERL LLGGRRRRRR SLLLGPTTGG IHHRRERRME GFAANNVAQQ YAFFFPPNLL DDDDSVVERL LLGGRRRRRR SLLLGPTTGG IHHRRERRME EPPPLRWWRR RTTRAAAALS SPLNAALLRQ AAMLVLLLFS GGGEG--VEA AGGCLLGGHG EEEEKQWWRR LEENAAAAFE EKLYAAVVSQ AAILLNNNYN YYYSNYYVEK KGGFILAAND EEEEKQWWRR LEENAAAAFE EKLYAAVVSQ AAILLNNNYN YYYSNYYVEK KGGFILAAND PLFSSAAWAA GGDNNNNNNV VDDNGSSNNS SGGVVCLLL PLLTTSSW-- ---------- ---------- --------- PLLTTSSW-- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------M MPPPFFQ--- -----PDAAS t5g66770.1 MMAAYYMCDG NMAAAIIAQQ QVIIIQKQQQ QQHHHHDDDH HGGGIINPPP SLLNPPNSLG t5g66770.2 MMAAYYMCDG NMAAAIIAQQ QVIIIKKQQQ QQHHHHDDDH HGGGIINPPP SLLNPPNSLG SSSAAAAGFF LLPPPPAV-- 
---------- ----ADVVVG Y--------- ---------- GFFGGGGSAA FFPPPPFVDS SDGGGGPPND HHHHATGGGG FRLLLLSDDD GEEESMELII GFFGGGGSAA FFPPPPFVDS SDGGGGPPND HHHHATGGGG FRLLLLSDDD GEEESMELII ---------- --------DD DDP------- ---------- ---VDAPFFF AAAACCCPDD SSSSGADGGD DDCDDDWWDD DDNVYYYYPP DTTRSSSSSL NVVIDSLPPP PPTLSSSPLL SSSSGADGGD DDCDDDWWDD DDNVYYYYPP DTTRSSSSSL NVVIDSLPPP PPTLSSSPLL AVL--AAARE EVVAGIIIIR R--------- LVVHHLLLMM CCAGGIEGAL AAASSSSQQL PLTHESSSKE ETTNDSSSSE EDDDDLLEEP LLLKKAIIYY CCA--ISSNE AAASSSSTTL PLTHESSSKE ETTNDSSSSE EDDDDLLEEP LLLKKAIIYY CCA--ISSNE AAASSSSTTL AADSHLLAAA VSASSGIGRF FFFTTLSSRR RRLLLFPPSP VVVAPTDEA- --HHFYYYEY LLQIRVVSSE LGDTT--ERF FFFEELSSNN NRLLLSPPSP AAATSSSTDL LLKKLNNNDY LLQIRVVSSE LGDTT--ERF FFFEELSSNN NRLLLSPPSP AAATSSSTDL LLKKLNNNDY LKFFFFFANN QAIILEEEAF FGGGHVVVVV DFSLMQLWPP PALIIQQQLR RRGP--TGGP SKFFLLLANN QAIILEEEAT TKKKKIIIII DFGIVQIWPP PALLLQQQLR RRSPQQSGPA SKFFLLLANN QAIILEEEAT TKKKKIIIII DFGIVQIWPP PALLLQQQLR RRSPQQSGPA SSTTRRRRDD E-LRRDVLLL LLAADAARSS SSRVFSFFVV NNNEVPPPLQ IAVAFNNSQQ SSGGSSSSPP ESLIIATNNN NNRRDAAKVV VVDLFDFFII ---LLGGGFR VVLAVNNFQQ SSGGSSSSPP ESLIIATNNN NNRRDAAKVV VVDLFDFFII ---LLGGGFR VVLAVNNFQQ HHHGGDP--A VDDVAAAVRK KITVIIIEEQ QEEAAHNNNK KGLDTTLLFF YSLDAAAAGG YYY----TIT ARRAKKKLNR RVTLGGGEEY YEEVVLNNNR RGANKKLLQQ FSLEPNNGDS YYY----TIT ARRAKKKLNR RVTLGGGEEY YEEVVLNNNR RGANKKLLQQ FSLEPNNGDS AAGGMAEAEI IICIICCGAA AA-----RRR RELLSSWRDD LLTTTRRAGL AVVPPPLGSN EEEERVEERI IISLLGGKTG GGIIIIIRRR REKKEEWRVV MMEEENNAGF SVVKKKLSNY EEEERVEERI IISLLGGKTG GGIIIIIRRR REKKEEWRVV MMEEENNAGF SVVKKKLSNY AALRAAMLGG LLG-SSVEEA DGLLLTLGGR RPLLFFSSAA SSAWEAAAAA AGGGGGGGGG AAVSAAILWW NNNLIIVSSK PGIIISLAAL LPLLLLTTLL SSSWR----- ---------- AAVSAAILWW NNNLIIVSSK PGIIISLAAL LPLLLLTTLL SSSWR----- ---------- GGDNNSSNNN GSSSGSDSSG SSGSSSSSGG RDDDSVCCC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --MDDTTFQ- -----PMDPA AAASSSGGDA t5g66770.1 AAYMTNLLIA 
AAAQVQQQQQ QEEQQQQQHH HHHQQIIFNP LSNNPP-NTS LLLFFFGGSG t5g66770.2 AAYMTNLLIA AAAQVKKKQQ QEEQQQQQHH HHHQQIIFNP LSNNPP-NTS LLLFFFGGSG GPPPPAAA-- ---------- ---------A APPPPDDG-- ---------- ---------- SDPPPFQQGG GGGGDDSNFF FPNNNDDHHA ATTTTTTGRR RLSFFFGGGG GFFFEESWMM SPPPPFQQGG GGGGDDSNFF FPNNNDDHHA ATTTTTTGRR RLSFFFGGGG GFFFEESWMM ---------- DDDDPPA--- ---------- ---------- VVDAAAAAPE AAAAAFPPPP ETLGGVVGPW DDDDNPDYIY YPDPPLLSVQ SDDLNNNRVV IIDTTSSSLP PPPTTWPPPP ETLGGVVGPW DDDDNPDYIY YPDPPLLSVQ SDDLNNNRVV IIDTTSSSLP PPPTTWPPPP PPPPPAAAAD AAAAL-AMRE EEVIIRRR-- ---VVVHLLM MSCCAAIEEE DHAAQLLAAD PPPPPSSSSL SSIPTHSPTE EPTSSEEEDD LPPLLLKAAY YDCCAAISSS DPNNTLLLLQ PPPPPSSSSL SSIPTHSPTE EPTSSEEEDD LPPLLLKAAY YDCCAAISSS DPNNTLLLLQ SAAALLAVSA AASSGIGRRR VVVHTTTAAS SRRFP-PVAA AAPPPPPPDA EHHAFL--FF IESSVVELGP PPTT--ERRR VVFYEEEAAS SNRSPNPATT TTSSSSSSSS TEEDLILSLL IESSVVELGP PPTT--ERRR VVFYEEEAAS SNRSPNPATT TTSSSSSSSS TEEDLILSLL YECCPYYLLL FAHFFQAILA FGHHHIDDSQ GWWALAALLR GGPF-RIIIT GGGGGIGPPP NDCCPYYSSS FAHLLQAILA TKKHHVDDGQ GWWALAATTR SGPTQRVVVS GGGGGIPAPL NDCCPYYSSS FAHLLQAILA TKKHHVDDGQ GWWALAATTR SGPTQRVVVS GGGGGIPAPL TTG---LLRV RLRVVFSSSG GGGVAANLLD DEVVPWLLQQ IAGEAAFNSS VLQLLLLGDP GGEPSSLLIT RFKLLFDDDP PPPILT-IIH HLLLGSFFRR VDDEAAVNFF MLQLLLL--- GGEPSSLLIT RFKLLFDDDP PPPILT-IIH HLLLGSFFRR VDDEAAVNFF MLQLLLL--- AAD---DAAV VDDVVVVAAS RRIIIFTVVV IEEEAHHHKK TTTFFLLLDR RTTAAFAAVS --DTTIDTTA ARRAAAAKKS NNVVVVTLLL GEEEVLLLRR VVVFFAAANR RKKAAQAAVS --DTTIDTTA ARRAAAAKKS NNVVVVTLLL GEEEVLLLRR VVVFFAAANR RKKAAQAAVS SSLASSSSGG GAGGNAEAYL LQRRIICVVC CCEEEGAA-- REEREPPSRR WRDTRRGSSA SSLNLRRRSS SEEERVEELF FGRRIISIIG GGEEEKGGII REEREEEEQQ WRVENNGEES SSLNLRRRSS SEEERVEELF FGRRIISIIG GGEEEKGGII REEREEEEQQ WRVENNGEES ALGNNALRRQ AMMMMLVGLF SEEEG---SS EGGLLLGGHG PPPLLFFWEA AAAGDGGGGN SLSYYAVSSQ AIIIILLWNY NSSSNLYYII EGGIIIAAND PPPLLLLWR- ---------- SLSYYAVSSQ AIIIILLWNY NSSSNLYYII EGGIIIAAND PPPLLLLWR- ---------- SSSSNVGGGG SSSSGSSDDS NNSSSSKSSS SSAARDSVC ---------- ---------- ---------- --------- 
---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -----MDTFP QQQ-----MD DAAASSSGLG t5g66770.1 AAYYYMMCCC CTGGLLLMAQ QIKQKKQQQQ QQQQQHQIFG NNNLLSLP-N NSSLGGFGLS t5g66770.2 AAYYYMMCCC CTGGLLLMAQ QIKKKKQQQQ QQQQQHQIFG NNNLLSLP-N NSSLGGFGLS GFLPPPAVVV ---------- ---------A AADDGG---- ---------- ---------- SAFDDPFVVV TGGGGDNGFP FFPPNNDHHA AATTGGLSSD DDFGGGGGEE EFEEEESSSD SAFPPPFVVV TGGGGDNGFP FFPPNNDHHA AATTGGLSSD DDFGGGGGEE EFEEEESSSD ---------- ---------- --------P- --GGAD---- ---------- -DDAAAFFPP DDEEWTLLLI ISSSDDGGGG PPDDTTTTNV IYGGPDPPFD DTPRRLQQSD VDDSSSWWPP DDEEWTLLLI ISSSDDGGGG PPDDTTTTNV IYGGPDPPFD DTPRRLQQSD VDDSSSWWPP AAAAAAVVVL ----AAMMRR EEEEEGIR-- --------LL HHHLLMMSAA EAALSAAHAA SSSSIPLLLT HHEESSPPTK EPPPEDSEDF DDDLLEPPLL KKKAAYYDRR SDNESKLRES SSSSIPLLLT HHEESSPPTK EPPPEDSEDF DDDLLEPPLL KKKAAYYDRR SDNESKLRES AAASASSIIG GVVVFTTALS LLLFSPVVVA PPPPPTTTDH FLL---YFYA LKFFAAHHFF SSSGPTT--E EVFFFTEALS LLLSSPAAAT SSSSSSSSSE LIILLSYLNA SKFFAAHHLL SSSGPTT--E EVFFFTEALS LLLSSPAAAT SSSSSSSSSE LIILLSYLNA SKFFAAHHLL TNAIIEAAFF FHHGCHHHVI DSSLLMQQGP ALIILALPPG PPPLIGGGPP SSSGDDDEER TNAIIEAATT TEEKSHHHIV DGGIIVQQGP ALLLLATTTS KPPIVGGPAP SSSEPPPEEI TNAIIEAATT TEEKSHHHIV DGGIIVQQGP ALLLLATTTS KPPIVGGPAP SSSEPPPEEI RRDDVVVRRA RVRFFFFRGG GGVVAADEVR WMMAAPEEAA ANNNSVLLLL HHLDPADQP- IIAATTTRRR KLNFFFFIPP PPIILTHLLN SSSDDPEEVV ANNNFMLLLL YYL---DEPI IIAATTTRRR KLNFFFFIPP PPIILTHLLN SSSDDPEEVV ANNNFMLLLL YYL---DEPI -IDVVLLLLL DVRITTTVVI QQEEADDDHN KTTFLLRFTA AFFYYYFFSS DDAAAAASGG IVDAALLLLL RANVTTTLLG YYEEVSSSLN RVVFAARVKA AQQFFFFFSS EEPPPNGRDS IVDAALLLLL RANVTTTLLG YYEEVSSSLN RVVFAARVKA AQQFFFFFSS EEPPPNGRDS GGAGNNAAAM M---AYLQRR IVCEGGA-RR EEEEEPPPLL LSSSWRDRLL LTRAGLSAGS SSEERRVVVR RRRRELFGRR IIGEKKTIHH EEEEEEEEKK KEEEWRVLMM MENAGFESSN SSEERRVVVR RRRRELFGRR IIGEKKTIHH EEEEEEEEKK KEEEWRVLMM MENAGFESSN SSNAARRQAR MMMVVGFFGG ------HHVV AADDGLTLLG GGWHHRPLLF SAAAAWEAGG NNYAASSQAK IIILLWYYYY LLLLYYSSVV KKPPGISLLA AAWNNLPLLL TLSSSWR--- 
NNYAASSQAK IIILLWYYYY LLLLYYSSVV KKPPGISLLA AAWNNLPLLL TLSSSWR--- GNNNSNGGGS SSSDDDDSNN GSSSKSSDDG GGGSSSCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ----MDF--- ---WAAAALL LDDDAGFPPP t5g66770.1 YYTMIAAQVV VVKQEEEQQQ QQQQQHHHHH HHQQHQIPPL NPPWSSSLLL LSSSGSAPDP t5g66770.2 YYTMIAAQVV VVKQEEEQQQ QQQQQHHHHH HHQQHQIPPL NPPWSSSLLL LSSSGSAPPP AVV------- ---------- ---------A PDGVV----- ---------- ---------- QVVTGGGGGG SNNDDPPFFF PDDDHHHHHA TTGGGLLLSS FGGGGGGGEF FEESEETTTL QVVTGGGGGG SNNDDPPFFF PDDDHHHHHA TTGGGLLLSS FGGGGGGGEF FEESEETTTL ---------- ---------- ------YDDP A-DD------ ---------- ----VDDDAA IIIGGGGGGD DVADPPPDDC CDDWWWHDDP DIDDPYPPSR LSSVVSDLLN NNVVIDDDTS IIIGGGGGGD DVADPPPDDC CDDWWWHDDP DIDDPYPPSR LSSVVSDLLN NNVVIDDDTS LLPPEAAPPP PCAPDDAAAV VVL-AMMMEE EAGI------ ---------L VVHLLSCAGI PPLLPTLPPP PSSPLLSPPL LLTHSPPPED PNDSDDDDDF FDLEPPPPPL LLKAADCA-I PPLLPTLPPP PSSPLLSPPL LLTHSPPPED PNDSDDDDDF FDLEPPPPPL LLKAADCA-I EAGDDDHAAA AAADSHAALA AASSGGIIIG GRVVVVVVHF FTTTLSSRRL LLLFPSSPPP SDSDDDPNAK LLLQIRESVS EEGT-----E ERVVFFFFYF FTEELSSRRL LLLSPSSSSS SDSDDDPNAK LLLQIRESVS EEGT-----E ERVVFFFFYF FTEELSSRRL LLLSPSSSSS PTDAAAAEEH AFLL-YHHHH AACCKKFFTL EEEAHHDVVH VDFLLMMQQQ GLAALLLLAL SSSSSSSTTE DLIISYKKKT AACCKKLLTL EEEAEENIIH IDFIIVVQQQ GIAALLLLAT SSSSSSSTTE DLIISYKKKT AACCKKLLTL EEEAEENIIH IDFIIVVQQQ GIAALLLLAT RGGGGPRITT IIIIGPSPGR RDEEE-LRRD DVRALLLARV VRVFFSGGAA ANNNSSDDEV RSGGGKRVSS IIIIPASLES SPEEEPLIIA ATRRFFFAKL LDLFFDPPLT T---PPHHLL RSGGGKRVSS IIIIPASLES SPEEEPLIIA ATRRFFFAKL LDLFFDPPLT T---PPHHLL VVRPPWMMLL LQIASSVLLR RLGGDDPPP- --IIDAVLDV VAAAASVVVR PKTVVIIEEE LLNGGSSSFF FRVAFFMLLK KL------PT TTVVDTALRA AKKKKSLLLN PRTLLGGEEE LLNGGSSSFF FRVAFFMLLK KL------PT TTVVDTALRA AKKKKSLLLN PRTLLGGEEE QEAAADDDNK TFFFFLFTAL FFFYSAAFDD LASSASGGGG GNAAMMYYLD IVVGGAAAAR YEVVVSSSNR VFFFFAVKAL QQQFSAAFEE LPLLGRDDDS SRVVRRLLFG LIIKKTGGGR YEVVVSSSNR VFFFFAVKAL QQQFSAAFEE LPLLGRDDDS 
SRVVRRLLFG LIIKKTGGGR EPDRRTRRAL AAVVPPALRR QQQQLLVVFF FGG--HSEEA DLLLLLLLLG GGGRLSSSAW EEVLLENNAF SSVVKKAVSS QQQQLLLLYY YYNLLSIEEK PIIIILLLLA ADDLLTTSSW EEVLLENNAF SSVVKKAVSS QQQQLLLLYY YYNLLSIEEK PIIIILLLLA ADDLLTTSSW AAAGDGNNNS SNNNSSGGGS SSSNSGSGGS ARRDDGSLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------M MMDDTFQ--W MPAASSSSPP t5g66770.1 AAYYMMMTTS SGNMAAAIAQ VVVKKQQQQQ QQHHQQQQQH HHQQIFNSPW -TSLGFFFPP t5g66770.2 AAYYMMMTTS SGNMAAAIAQ VVVKKKKQQQ QQHHQQQQQH HHQQIFNSPW -TSLGFFFPP PPAAAVV--- ---------- ---------- --ADVGGYY- ---------- ---------- PPFQQVVTTG GGGSNDDPPG GFPFFFNNLD HHATGGGFFR SSFGGGGEEE FFEWMMMEEL PPFQQVVTTG GGGSNDDPPG GFPFFFNNLD HHATGGGFFR SSFGGGGEEE FFEWMMMEEL ---------- --------YP AA----DD-- ---------- ---VDPPEAA AAFFFPPPDD ISGGGGDDDD DDSADPCWHP DDYVIIDDFF TTYPRRLVVV VSVIDLLPTT LLWWWPPPLL ISGGGGDDDD DDSADPCWHP DDYVIIDDFF TTYPRRLVVV VSVIDLLPTT LLWWWPPPLL AAAVLLAARR EEAGIRRR-- ---------- -LVLMMAAGA IEEEAGGGGH ALASHALLLA SIPLTTSSKK DENDSEEEDD DDDFFFDDLL ELLAYYAA-R ISSSDSSSSP NELIREVVVS SIPLTTSSKK DENDSEEEDD DDDFFFDDLL ELLAYYAA-R ISSSDSSSSP NELIREVVVS AVAAAGIIGV HTTTTLSRRL LLFSVVVAPP TTTDDDHHAA FFLL---YYY HHFECCCCYY SLDDP---EF YTTTELSNRL LLSSAAATSS SSSSSSEEDD LLIILLLYYY KTLDCCCCYY SLDDP---EF YTTTELSNRL LLSSAAATSS SSSSSSEEDD LLIILLLYYY KTLDCCCCYY LKKKKAHFTI EHDDHHHHVH VDFSLQGGLL WPAAALLALL PPGGPPPP-- RGGGIIPPSP SKKKKAHLTI EENNKKKKIH IDFGIQGGII WPAAALLATT TTSGKPPPQQ RGGGIIAPSL SKKKKAHLTI EENNKKKKIH IDFGIQGGII WPAAALLATT TTSGKPPPQQ RGGGIIAPSL GGE---LRLD ARRSVFSRVA NNNLLDEEEV VVRRWMLLQQ APEVVFNNSV VLLLQLRLLL EEEPPPLIND AKKVLFDIIL ---IIHLLLL LLNNSSFFRR DPELLVNNFM MLLLQLKLLL EEEPPPLIND AKKVLFDIIL ---IIHLLLL LLNNSSFFRR DPELLVNNFM MLLLQLKLLL GADDQ--IAV LDVVSRIIII FTTVEEAADD DNKTGFFLDF FTTELLLYYF FFDDDDSLLD --DDETIVTA LRAASNVVVV VTTLEEVVSS SNRVGFFANV VKKNLLLFFF FFEEEESLLE --DDETIVTA LRAASNVVVV VTTLEEVVSS SNRVGFFANV VKKNLLLFFF FFEEEESLLE AASSAAGGGN AAMMEAYLLL 
QQQEEECDIV GEAAAAA--R RRHHEEPPLW WDDRRRRAGA PNLLGGSSSR VVRREELFFF GGGRRRSGLI PETTTTTIIH RRMMEEEEKW WVVLLLNAGS PNLLGGSSSR VVRREELFFF GGGRRRSGLI PETTTTTIIH RRMMEEEEKW WVVLLLNAGS AVPPPLLGSN NRRQRMMFFS GGEG--HSVE EAGCLTGGRP LFFSAAAAWA AGGGDGGGDD SVKKKLLSNY YSSQKIIYYN YYSNLYSIVS SKGFISAALP LLLTLLLSW- ---------- SVKKKLLSNY YSSQKIIYYN YYSNLYSIVS SKGFISAALP LLLTLLLSW- ---------- NNSSSNNSGG SSSDDSSNNS SNNNGGGGKK SGGSSCCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---MDDDTFP PFFF------ ---WMMPASS LDAAGGFFLL t5g66770.1 MAYCCCSSNL AAIIAVQKKQ EQDHQQQIFG GIIILLSNNP PPPW--TSFF LSGGSSAAFF t5g66770.2 MAYCCCSSNL AAIIAVKKKQ EQDHQQQIFG GIIILLSNNP PPPW--TSFF LSGGSSAAFF PPPPAAAVVV ---------- -----AAPVG Y--------- ---------- ---------- PPPPFQQVVV TTGGGDDDDP GFNLDAATGG FLSDDFTTGG GGGGGFEEDD DEEEEMEEES PPPPFQQVVV TTGGGDDDDP GFNLDAATGG FLSDDFTTGG GGGGGFEEDD DEEEEMEEES ---------- --YPPA--GA AD-------- ---------- -VVVAAALLA FCCAAPPPAA GGDVADGDDC CTHNPDYVGP PDPFDDPPPS LLSSVVSDDL NIIITSSPPL WSSSSPPPSP GGDVADGDDC CTHNPDYVGP PDPFDDPPPS LLSSVVSDDL NIIITSSPPL WSSSSPPPSP AVVAAMEEEE EEVGGGII-- -------LLL MSSSCAIEAA DDDDHASSAQ SHAAAVVAAS PLLSSPEDDP PETDDDSSDD DFFDEEPLLI YDDDCAISDD DDDDPNSSKT IRSSELLDPT PLLSSPEDDP PETDDDSSDD DFFDEEPLLI YDDDCAISDD DDDDPNSSKT IRSSELLDPT GIGGGAAVHH FFTASRRFSV VAPTDDAL-- -YFFECCLKK AFAAAANNQQ IIEEHGGGCD --EEEAAFYY FFTASNRSSA ATSSSSDILS SYLLDCCSKK ALAAAANNQQ IIEEEKKKSN --EEEAAFYY FFTASNRSSA ATSSSSDILS SYLLDCCSKK ALAAAANNQQ IIEEEKKKSN DDHVVIDFSS SSLLMMQGGG LAQQALALPG P---LIIGGP PPPPTTRREE --RVGLAAAA NNHIIVDFGG GGIIVVQGGG IAQQALATTS PQQQIVVGPA PLLLGGSSEE PSITGNRRRR NNHIIVDFGG GGIIVVQGGG IAQQALATTS PQQQIVVGPA PLLLGGSSEE PSITGNRRRR ADDDDAAVRF FFRGANNNSD DERRPLQQIA AAPPAVVAFF NQHHLLGDDD DAD---IIDA RDDDDAALNF FFIPL---PH HLNNGFRRVD DDPPVLLAVV NQYYLL---- --DTTIVVDT RDDDDAALNF FFIPL---PH HLNNGFRRVD DDPPVLLAVV NQYYLL---- --DTTIVVDT AVVVLDCCCV SVRPFTIIIQ DDHHKTTDRF TEFFYYYYAA DSSSSSAGGG NE--AYYYLR 
TAAALRLLLA SLNPVTGGGY SSLLRVVNRV KNQQFFFYAA ESSLLLGDSE RERRELLLFR TAAALRLLLA SLNPVTGGGY SSLLRVVNRV KNQQFFFYAA ESSLLLGDSE RERRELLLFR EEIICIVVGG GA--RRHEEP LLLRRRRRRD DRLTAAGSSA VVPLGSSALR QAARMVLG-- RRIISLIIPK KTIIRRMEEE KKKQQQRRRV VLMEAAGEES VVKLSNNAVS QAAKILNNLL RRIISLIIPK KTIIRRMEEE KKKQQQRRRV VLMEAAGEES VVKLSNNAVS QAAKILNNLL ----HHHHHS VVEADCLLTT TLGGWHHHHG GGPLLLSSAA WWWWEAAAGD DDDDGGGGGG LLYYSSSSSI VVEKPFIISS SLAAWNNNND DDPLLLTTSS WWWWR----- ---------- LLYYSSSSSI VVEKPFIISS SLAAWNNNND DDPLLLTTSS WWWWR----- ---------- GGNNNNNSSS SSSNSSGSSS DSSGGKSSGG DDDDSSSLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- -----MDDTF FFQQQ--PPA t5g66770.1 MMYYMCCCCC TTTGNLMMMA QIIKQQKKKQ QQQEEQQQQQ QQQDDHQQIF IINNNLNPTS t5g66770.2 MMYYMCCCCC TTTGNLMMMA QIIKKKKKKQ QQQEEQQQQQ QQQDDHQQIF IINNNLNPTS AAAASGDAAA GFLPPPPPAA A--------- ---------- PDDDGG---- ---------- SLLLFGSGGG SAFDDDDPFQ QTTGGGGDSD GGPNDDHHHH TTTTGGFGGT GGGEFEWELL SLLLFGSGGG SAFPPPPPFQ QTTGGGGDSD GGPNDDHHHH TTTTGGFGGT GGGEFEWELL ---------- ---------- -----YYPPP PPAA-----G AA-------- ---------- IGGGGDVVVA DDDDGPDDCD DTTTWHHNNN NPDDYIIYYG PPFFDDTTPP RLLQQQPLLL IGGGGDVVVA DDDDGPDDCD DTTTWHHNNN NPDDYIIYYG PPFFDDTTPP RLLQQQPLLL --VDDAAPPE AAAAAAAPAP AAAAVLLLL- --RREERR-- ----LLVHHL LLAAAGAEAA NNIDDSSLLP PTTTLLLPSP PPPPLTTTTH EETTEEEEDD DFEPLLLKKI IIAAA-RSDD NNIDDSSLLP PTTTLLLPSP PPPPLTTTTH EETTEEEEDD DFEPLLLKKI IIAAA-RSDD GHLLLLQQAA DSSHHALAAS SSAAASGGGR RVVVAVHFTA LLRLFFPPVV AATAAAAFF- SPEEEETTLL QIIRREVSEG GGDPPT---R RVVVAFYFEA LLRLSSPPAA TTSSDDDLLS SPEEEETTLL QIIRREVSEG GGDPPT---R RVVVAFYFEA LLRLSSPPAA TTSSDDDLLS --HHEEAACP YLAHFFTAAN NAAEGGHHHV VDFSSLLQPL QQALLLLGGP PPF--LRIIG SSKTDDAACP YSAHLLTAAN NAAEKKKKHI IDFGGIIQPL QQALLTTSGK PPTQQIRVVG SSKTDDAACP YSAHLLTAAN NAAEKKKKHI IDFGGIIQPL QQALLTTSGK PPTQQIRVVG GIIIPSSSPT RDDE---LDG GLRLAADLAR RSVRRRRFFS FFRGAANLLV PPPMLQIIAA GIIIPSSSLG SPPEPPSLAG GNRLRRDFAK KVLDDNNFFD 
FFIPLL-IIL GGGSFRVVDD GIIIPSSSLG SPPEPPSLAG GNRLRRDFAK KVLDDNNFFD FFIPLL-IIL GGGSFRVVDD PPGEFFSVRR LLLAAAADAP PP--IAVLDC VVVASVVRRI FVIIEEQQEE EEAAFLDRFE PPDEVVFMKK LLL----DTP PPTIVTALRL AAAKSLLNNV VLGGEEYYEE EEVVFANRVN PPDEVVFMKK LLL----DTP PPTIVTALRL AAAKSLLNNV VLGGEEYYEE EEVVFANRVN AAAYVVVFFD DSLDDAAASS SSSSGGGGGG AAAMAAYYQE ICCDDVGEEE GGGA-RRHEP AAAYVVVFFE ESLEEPPNLL LLLRDDDSSS EVVRVVLLGR ISSGGIPEEE KKKTIRRMEE AAAYVVVFFE ESLEEPPNLL LLLRDDDSSS EVVRVVLLGR ISSGGIPEEE KKKTIRRMEE PLLWRDRTTL LVPLSSNNLL LAMLSSGESS DCCLLLGWHR RPSAAWWWWE EEAAAADDGG EKKWRVLEEF FVKLNNYYVV VAINNNYSII PFFLLLAWNL LPTSSWWWWR RR-------- EKKWRVLEEF FVKLNNYYVV VAINNNYSII PFFLLLAWNL LPTSSWWWWR RR-------- GDNNNNSSSV GSGSNNNGSS NGKSSSSGGG DGGGSSSVL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --TFPFFQ-- -------WDA ASLDDAAGFL t5g66770.1 CTTSNLMMAQ VVQQQKQQQE EEEQHHHHHH QDIFGIINSL LLLLNPPWNS SFLSSGGSAF t5g66770.2 CTTSNLMMAQ VVKKKKQQQE EEEQHHHHHH QDIFGIINSL LLLLNPPWNS SFLSSGGSAF PAAA------ ---------- -APDDGY--- ---------- ---------- ---------- PFFQTGDDSD GGPPPPNNLH HATTTGFRRR LLSFFGGGEF SSEEWMMTTT TLIIISSGGG PFFQTGDDSD GGPPPPNNLH HATTTGFRRR LLSFFGGGEF SSEEWMMTTT TLIIISSGGG ---------- ---DPPPA-- --GGAADDD- ---------- -----VDDAL PPEFAAAFFP GDSSSVADPC DTTDPPPDYV VVGGPPDDDD DDYPSRLSVV PDDLNIDDTP LLPPPTLWWP GDSSSVADPC DTTDPPPDYV VVGGPPDDDD DDYPSRLSVV PDDLNIDDTP LLPPPTLWWP PPPPCAAAA- AAMMRREEEE EEIII----L HHLLMCGGEA DSSAAAAQQH HAAAAVSAAA PPPPSSPPPH SSPPKKEEDP EESSSFLEPL KKAAYC--SD DSSKKKKTTR RESSELGDDP PPPPSSPPPH SSPPKKEEDP EESSSFLEPL KKAAYC--SD DSSKKKKTTR RESSELGDDP GGIIRRRVVA AVHHHFLLLL FPPVPPTTAF -YYYHHHFEA CYYLLLKAAA HFFFTQQQQA ----RRRVVA AFYYYFLLLL SPPASSSSDL LYYYKKKLDA CYYSSSKAAA HLLLTQQQQA ----RRRVVA AFYYYFLLLL SPPASSSSDL LYYYKKKLDA CYYSSSKAAA HLLLTQQQQA AIFFHHHGGC DHHVDFFFSM GLWPAAAALL IIILLRPPPG PRTTGGGPPP PRDD---LLG AITTEEEKKS NKHIDFFFGV GIWPAAAALL LLLLTRTTTS PRSSPPPPPL LSPPSSSLLG AITTEEEKKS NKHIDFFFGV 
GIWPAAAALL LLLLTRTTTS PRSSPPPPPL LSPPSSSLLG RRLAAALLAA ARVRFSFFFF VNNNDDPPMM QAAAPEAAAV AAFFFFNVLQ LRLLGGPADQ RRLRRRFFAA AKLNFDFFFF I---HHGGSS RDDDPEVVVL AAVVVVNMLQ LKLL----DE RRLRRRFFAA AKLNFDFFFF I---HHGGSS RDDDPEVVVL AAVVVVNMLQ LKLL----DE QQQA---AAA LLDDCVVSSR KKIIIFTIEQ EEADHKTTTF DDTEALFYSS SSAAVFDDSL EEETTTITTT LLRRLAASSN RRVVVVTGEY EEVSLRVVVF NNKNALQYSS SSAAVFEESL EEETTTITTT LLRRLAASSN RRVVVVTGEY EEVSLRVVVF NNKNALQYSS SSAAVFEESL LDAAASSGGG NNAAMMMAAA AEEAAYYQRR RIICIVGGGA A-ERHHRWWR RRRLLTAAGG LEPNNRRSEE RRVVRRRVVV VEEEELLGRR RIISLIPPPT TIERMMQWWR RLLMMEAAGG LEPNNRRSEE RRVVRRRVVV VEEEELLGRR RIISLIPPPT TIERMMQWWR RLLMMEAAGG LSSSVPLGGS SAAAARMLLG SEE-HSSEEE EAGGCCCLGG WWHHRLLLLS ASSAAAWAAA FEEEVKLSSN NAAAAKILLW NSSLSIIESS SKGGFFFLAA WWNNLLLLLT LSSSSSW--- FEEEVKLSSN NAAAAKILLW NSSLSIIESS SKGGFFFLAA WWNNLLLLLT LSSSSSW--- AAGDGGGGDN NNNNNVSSGG SDSNNSSGKS GGRDGSVCC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --MMTFFPPF F--------- WMDPAAASSG t5g66770.1 MYMTDDMIAQ QIIIIKKKQQ QQQEQQQQHH HDHHIFFGGI ILLSLLNNPP W-NTSLLGGG t5g66770.2 MYMTDDMIAQ QIIIIKKKQQ QQQEQQQQHH HDHHIFFGGI ILLSLLNNPP W-NTSLLGGG DDDAAGPAAA ---------- ---------- APDGGGG--- ---------- ---------- SSSGGSPFQQ GGGGGGGNDD PGFPFPLHHH ATTGGGGRLF GGGTTGGGGG FFEWWMMMTT SSSGGSPFQQ GGGGGGGNDD PGFPFPLHHH ATTGGGGRLF GGGTTGGGGG FFEWWMMMTT ---------- -------YPA -----G---- ---------- -------VAA LPPPEFAAAA TTLIGGVVAD PPCCDTWHPD VVVVIGDDYP SSRLLLSVQQ QPPDNRRITS PLLLPPTTTL TTLIGGVVAD PPCCDTWHPD VVVVIGDDYP SSRLLLSVQQ QPPDNRRITS PLLLPPTTTL AAPPAPAAAA AAAA--MREE EEEEEVV--- ---------- LCCGGGIIEG DDHHHALLSA LLPPSPSSSI PPPPHHPTEE DDPPETTDDD DDFDLLEEPP ICC---IISS DDPPPNEESK LLPPSPSSSI PPPPHHPTEE DDPPETTDDD DDFDLLEEPP ICC---IISS DDPPPNEESK LLADDSSHHA AAAVSSAAAA ASGRRAAAHR PSPPTTTTDE AFLL--FYYE EEACCKKFAF LLLQQIIRRS SEELGGDDDP PTERRAAAYN PSPSSSSSST DLIISSLNND DDACCKKFAL LLLQQIIRRS SEELGGDDDP PTERRAAAYN PSPSSSSSST DLIISSLNND DDACCKKFAL 
FTNQLLEAFF FCDVVVHHVV VIFMQGLLLP PALQALAG-- -LRGIIGPPP SSPTTR---- LTNQLLEATT TSNIIIHHII IVFVQGIIIP PALQALAGQQ QIRGIIPAPP SSLGGSPPPP LTNQLLEATT TSNIIIHHII IVFVQGIIIP PALQALAGQQ QIRGIIPAPP SSLGGSPPPP LRDDDVVGRL DLSSSVVVVV RFFFSSSRRR GGANNNNDDD VPPWWMLQAA GGEAAAAFSS LIAAATTGRL DFVVVLLLLL NFFFDDDIII PPT----HHH LGGSSSFRDD DDEVVAAVFF LIAAATTGRL DFVVVLLLLL NFFFDDDIII PPT----HHH LGGSSSFRDD DDEVVAAVFF VVVLLHRRLD PPPAQAPPDD VVLLDDCVRR PPVVVVEDHN TDDRTTTAAF FAADDLAASA MMMLLYKKL- ----ETPPDD AALLRRLANN PPLLLLESLN VNNRKKKAAQ QAAEELNGRE MMMLLYKKL- ----ETPPDD AALLRRLANN PPLLLLESLN VNNRKKKAAQ QAAEELNGRE GGNAAEELQQ QEEICCIVCC GGGGGAA-RR ERHHHEPPLW RRDDRRRLTR GLLAVPPPGG EERVVEEFGG GRRISSLIGG PPPPKGGIHR ERMMMEEEKW RRVVLLLMEN GFFSVKKKSS EERVVEEFGG GRRISSLIGG PPPPKGGIHR ERMMMEEEKW RRVVLLLMEN GFFSVKKKSS SLLRRAARRR MMLVVGFGEE ---VEEEEEL TTLLLGHPPS AAAEEAAADG GGGGGGGGDN NVVSSAAKKK IILLLWYYSS YYYVEESSSI SSLLLANPPT LLLRR----- ---------- NVVSSAAKKK IILLLWYYSS YYYVEESSSI SSLLLANPPT LLLRR----- ---------- NNNSSNNNSN NNVGSSSGSN SSSSSNGSSS GARDDSSCC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------M DTTPQQ---- ---PPMDPAL t5g66770.1 MAAAAMMCCT SGGNNNLLAI IAAAQVVKKK QQQQQQQQQH QIIGNNSSLN NNNPP-NTLL t5g66770.2 MAAAAMMCCT SGGNNNLLAI IAAAQVVKKK QQQQQQQQQH QIIGNNSSLN NNNPP-NTLL LDAALLPPAA AA-------- ---------- AAADDDVVGG Y--------- ---------- LSGGFFPPFQ QQTTTGGSSN DGFNDHHHHH AAATTTGGGG FRSFGTGGGG GGEEDDDEWM LSGGFFPPFQ QQTTTGGSSN DGFNDHHHHH AAATTTGGGG FRSFGTGGGG GGEEDDDEWM ---------- ----YPP--- GD-------- ---------- -----DDAAP EEEFAAAPPP LISDSVAAAD GPDDHNNYVY GDPPFFFTTY YPPLVQPSDL LNNVVDDTSL PPPPPPTPPP LISDSVAAAD GPDDHNNYVY GDPPFFFTTY YPPLVQPSDL LNNVVDDTSL PPPPPPTPPP PAVVLL-AMM MREEEEAIII R--------- -LVLLSCGAA EEGDAASAAL AHAAAVSSAA PSLLTTHSPP PTEDPPNSSS EDDDFFFDDL LLLAADC-RR SSSDNASKKL LRSSSLGGDP PSLLTTHSPP PTEDPPNSSS EDDDFFFDDL LLLAADC-RR SSSDNASKKL LRSSSLGGDP GRVVAAVHHT TSRRRLLFPP PVPPTTTDAA EAFF--HHFE 
CPYLLKFAHH AQLGGCDHVV ERVVAAFYYT TSNNRLLSPP PASSSSSSSS TDLLLLKKLD CPYSSKFAHH AQLKKSNKII ERVVAAFYYT TSNNRLLSPP PASSSSSSSS TDLLLLKKLD CPYSSKFAHH AQLKKSNKII HIIFSQGLPP AIIAARGGPP FFLRIITIGP SSPPPPTTRR RDDE--LRVV GGRLLDDLLA HVVFGQGIPP ALLAARSGKP TTIRVVSIPA SSLLLLGGSS SPPEPSLITT GGRLLDDFFA HVVFGQGIPP ALLAARSGKP TTIRVVSIPA SSLLLLGGSS SPPEPSLITT GGRLLDDFFA RRRRRRRVVF FFVVANEERR PPMLLIIPGE AAAVFFFVLH HRGGDDADDD DQAP---IDD KDDDDDDLLF FFIIT-LLNN GGSFFVVPDE VVVLVVVMLY YK-----DDD DETPTIIVDD KDDDDDDLLF FFIIT-LLNN GGSFFVVPDE VVVLVVVMLY YK-----DDD DETPTIIVDD AALDCCVAAS VRRPKKIFTV QEDHHNNKTL DDDRFTTELF FYYSVVFSLL DAAASAAAGG TTLRLLAKKS LNNPRRVVTL YESLLNNRVA NNNRVKKNLQ QFYSVVFSLL EPNNLGGGDS TTLRLLAKKS LNNPRRVVTL YESLLNNRVA NNNRVKKNLQ QFYSVVFSLL EPNNLGGGDS GGAAGNNAAM AE--YLREEE IICEEAAARR RHEELLSSRR RRRDDDTTRR AGGLSVVPLG SSEEERRVVR VERRLFRRRR ILGEETGGHR RMEEKKEEQQ QRRVVVEENN AGGFEVVKLS SSEEERRVVR VERRLFRRRR ILGEETGGHR RMEEKKEEQQ QRRVVVEENN AGGFEVVKLS NRRQQRMLGL LFSGGEE--H HSEADDGCTL GGWHRFAAAW WEAAAGDDGG GNNNNNNSNS YSSQQKILWN NYNYYSSLLS SIEKPPGFSL AAWNLLLSSW WR-------- ---------- YSSQQKILWN NYNYYSSLLS SIEKPPGFSL AAWNLLLSSW WR-------- ---------- SNNNSSSSNN NNSSGGSGGS SGGAAGGSSS VVVVVCCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- --------DF -------WWP MDDPAGLLFL PPAAAV---- t5g66770.1 MAAAAYDSLL MMAAVIKKQQ EEQQQQDDQI PPLSLLPWWP -NNTLGLLAF PPQQQVTGGG t5g66770.2 MAAAAYDSLL MMAAVIKKQQ EEQQQQDDQI PPLSLLPWWP -NNTLGLLAF PPQQQVTGGG ---------- ----PPDDDG VGGG------ ---------- ---------- ---------- GGNGFFFFNN DHHHTTTTTG GGGGRLLLLL SDFGGTFEDE WMMETLISGG GSVVADDDGP GGNGFFFFNN DHHHTTTTTG GGGGRLLLLL SDFGGTFEDE WMMETLISGG GSVVADDDGP ---YP----- --AD------ ---------- --VDAFFAAF FFPCCCAALL L--MMREEEE CDWHNYYYVV VVPDPPPFTY YSSRLQSSDD NVIDSPPTLW WWPSSSSITT THHPPKEDPP CDWHNYYYVV VVPDPPPFTY YSSRLQSSDD NVIDSPPTLW WWPSSSSITT THHPPKEDPP AGGGRR---- ------VHHM SAAGAAGGGD HALLAASAAA ADDDSHAAAL LLAAVVAAGG NDDDEEDDDD FFDEPPLKKY 
DAA-RDSSSD PNEEAASKKK LQQQIREEEV VVEELLDP-- NDDDEEDDDD FFDEPPLKKY DAA-RDSSSD PNEEAASKKK LQQQIREEEV VVEELLDP-- GGRRAFTTAL LLSRLLFPSP TTHHAFLLLY HHHHEECYLL LKFFFAAAHH HFFFFTAANQ -ERRAFTEAL LLSNLLSPSP SSEEDLIIIY KKTTDDCYSS SKFFFAAAHH HLLLLTAANQ -ERRAFTEAL LLSNLLSPSP SSEEDLIIIY KKTTDDCYSS SKFFFAAAHH HLLLLTAANQ AILLEEAFHH CCCHVVHHDF SMMMQGLQPL QQRPPP---L RITIIIGPPP SSPGGRDDEE AILLEEATEE SSSKIIHHDF GVVVQGIQPL QQRTPPQQQI RVSIIIPAAP SSLEESPPEE AILLEEATEE SSSKIIHHDF GVVVQGIQPL QQRTPPQQQI RVSIIIPAAP SSLEESPPEE ----DGRAAD DAASSVRRRS VNNSSLLDDV PPWWLIPPGG EEVVAAAAAF FQLLRRDDPP PPPSAGRRRD DAAVVLDNND I--PPIIHHL GGSSFVPPDD EELLAAAAAV VQLLKK---- PPPSAGRRRD DAAVVLDNND I--PPIIHHL GGSSFVPPDD EELLAAAAAV VQLLKK---- AADDQQA--I ILLCAASVRI FTVIEQQEAD DHNNGLLLRR TELFFYYYYS DSDASAASGG --DDEETTIV VLLLKKSLNV VTLGEYYEVS SLNNGAAARR KNLQQFFFYS ESENLGGRDS --DDEETTIV VLLLKKSLNV VTLGEYYEVS SLNNGAAARR KNLQQFFFYS ESENLGGRDS GGNMMEE-YL QRRREIDDII VVVVCAAA-R REERRRHPLL LLSRRRDDRR RRRAAGGLLS EERRREERLF GRRRRIGGLL IIIIGTTGIH REERRRMEKK KKEQQRVVLL LLNAAGGFFN EERRREERLF GRRRRIGGLL IIIIGTTGIH REERRRMEKK KKEQQRVVLL LLNAAGGFFN NALRQQAMLL LVVGGLFGEE EEG--HVVAG GGCCTLGWHR LFSWEEEEAA GGGGGNNNNN YAVSQQAILL LLLWWNYYSS SSNYYSVVKG GGFFSLAWNL LLTWRRRR-- ---------- YAVSQQAILL LLLWWNYYSS SSNYYSVVKG GGFFSLAWNL LLTWRRRR-- ---------- SSVGSSSSSS SSGSSDNNGG NNGGKSRGGG GSSSSSVVV ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ------DDDT TPQ--PDPAA ASSLLAPAAA t5g66770.1 MAAYMTSSGN NNLLMMAAAQ QVQQQKQQQQ QQQQQHQQQI IGNLNPNTSS LFFLLGPFFF t5g66770.2 MAAYMTSSGN NNLLMMAAAQ QVKKKKQQQQ QQQQQHQQQI IGNLNPNTSS LFFLLGPFFF AV-------- -------AAD DDDVVY---- ---------- ---------- ---------- QVGGDDNGGF FPPFFPDAAT TTTGGFRLDG GGTGGGGEEE SSDDEWEETT LIIGGDVAAG QVGGDDNGGF FPPFFPDAAT TTTGGFRLDG GGTGGGGEEE SSDDEWEETT LIIGGDVAAG ------YPP- --GGA----- ---------- -----DDAAL AAFCCAADAA AAAAAVL-AE GDCTWWHPPV IYGGPPPFDY YPSRLLSSVQ QPDRVDDSSP PLWSSSSLSI IPPPPLTESE 
GDCTWWHPPV IYGGPPPFDY YPSRLLSSVQ QPDRVDDSSP PLWSSSSLSI IPPPPLTESE EEEAGGGII- ---------L LLLMMMMCCC CAGGAIIIAG LALAAHHLLA AVAAAAIIIG EEENDDDSSD DDFFDDDLEL AAIYYYYCCC CA--RIIIDS EKLLLRRVVS ELDDDP---E EEENDDDSSD DDFFDDDLEL AAIYYYYCCC CA--RIIIDS EKLLLRRVVS ELDDDP---E GRVVFTTTTT LSRRRLLFFP SPAAAAAPPP DDAEEHHHAL ----YFFECP YYKKFAAHHF ERVFFTEEEE LSNRRLLSSP SPTTTTTSSS SSSTTEEEDI LLLSYLLDCP YYKKFAAHHL ERVFFTEEEE LSNRRLLSSP SPTTTTTSSS SSSTTEEEDI LLLSYLLDCP YYKKFAAHHL AAQIILEAAA FHHCDHHHHV IIDDFFSMMQ GLQQAAAALI IIALLAGPRI GPTGGEE--L AAQIILEAAA TEESNKHHHI VVDDFFGVVQ GIQQAAAALL LLALLAGKRI PPGEEEEPSL AAQIILEAAA TEESNKHHHI VVDDFFGVVQ GIQQAAAALL LLALLAGKRI PPGEEEEPSL RVGRLLADAS VRSFRRRVVA AANSLLDEER RRPPLLLPPG GEAAAAFFNS VVLQQQQQLL ITGRLLRDAV LNDFIIIIIL TT-PIIHLLN NNGGFFFPPD DEVVAAVVNF MMLQQQQQLL ITGRLLRDAV LNDFIIIIIL TT-PIIHLLN NNGGFFFPPD DEVVAAVVNF MMLQQQQQLL HLLPADD-ID DDALDDCCCA AAAVRPPPPK KKKTVVVEEQ QEEAADDHHH NKKKFFLDDR YLL--DDIVD DDTLRRLLLK KKKLNPPPPR RRRTLLLEEY YEEVVSSLLL NRRRFFANNR YLL--DDIVD DDTLRRLLLK KKKLNPPPPR RRRTLLLEEY YEEVVSSLLL NRRRFFANNR FFELLYYYVD SSSLLLDASA GGGGGNNME- ---AYIDDDV VCGGEGA--R ERHHSRRWWR VVNLLFYYVE SSSLLLENLG DDDEERRRER RRRELIGGGI IGPPEKGIIR ERMMEQQWWR VVNLLFYYVE SSSLLLENLG DDDEERRRER RRRELIGGGI IGPPEKGIIR ERMMEQQWWR DRLTAGLLAV PPGGSALRAR RLFSG----- EAAGGCTLGW WWHHGGLSSA EAAAAGGGGD VLMEAGFFSV KKSSNAVSAK KNYNYLLLYY EKKGGFSLAW WWNNDDLTTS R--------- VLMEAGFFSV KKSSNAVSAK KNYNYLLLYY EKKGGFSLAW WWNNDDLTTS R--------- DDDNNNNSSS VVSSSDSNNS SGSGGKSSSS AARDDGVLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------F QQ------WM DPAASSGDAF FPPPAAAAV- t5g66770.1 MAAYYTTDSS GGNNLLAQVV KQQQQQQDDF NNPSSNNPW- NTSLGFGSGA APPDFFQQVT t5g66770.2 MAAYYTTDSS GGNNLLAQVV KQQQQQQDDF NNPSSNNPW- NTSLGFGSGA APPPFFQQVT ---------- ----DGGYY- ---------- ---------- ---------- -----YYDDA GGSSSSNDGP NLHHTGGFFD FGGGGTGEEE SSDEWWMETT ISGGGDSSVV GCCWWHHDDD GGSSSSNDGP NLHHTGGFFD FGGGGTGEEE SSDEWWMETT 
ISGGGDSSVV GCCWWHHDDD ---GA----- ---------- ------VVDD DAAAPEEEFF AAAPPPAPAA AVLLL--EEE YIYGPFFFDT SSLSSSSVQP PPSLNRIIDD DTSSLPPPPP PPLPPPSPSI PLTTTHEEDD YIYGPFFFDT SSLSSSSVQP PPSLNRIIDD DTSSLPPPPP PPLPPPSPSI PLTTTHEEDD EEGIR----- ---------- -LVHLLLMMS AAAAGGAAGA AAADDSHAAA AAAAAAAAAS PEDSEDDDDF DLLLEEPPPP PLLKAAIYYD AAAA--DDSA KLLQQIRESS EEDDDDDPPT PEDSEDDDDF DLLLEEPPPP PLLKAAIYYD AAAA--DDSA KLLQQIRESS EEDDDDDPPT GGGIIGRRVA VFTASSSRRL FFP--SSVAA AAPPPPPPDA EEEEHHF--- -HHYYACPPK -----ERRVA FFTASSSNRL SSPNNSSATT TTSSSSSSSS TTTTEELLLS SKTNNACPPK -----ERRVA FFTASSSNRL SSPNNSSATT TTSSSSSSSS TTTTEELLLS SKTNNACPPK KKFFAAANQQ QQAIIILLEA AAGCHHVVII DDDLLMQGQP AAIIQQALLP PF---RIITG KKFFAAANQQ QQAIIILLEA AAKSKHIIVV DDDIIVQGQP AALLQQATTT PTQQQRVVSG KKFFAAANQQ QQAIIILLEA AAKSKHIIVV DDDIIVQGQP AALLQQATTT PTQQQRVVSG GIGGPPPPPG GRDE---DDR LLRRRRRFFS SFGGAAASLD ERRPPWWMML IIIIAGAVNN GIPPAAAPPE ESPEPPPAAR FFKDDDDFFD DFPPLTTPIH LNNGGSSSSF VVVVDDVLNN GIPPAAAPPE ESPEPPPAAR FFKDDDDFFD DFPPLTTPIH LNNGGSSSSF VVVVDDVLNN SVLLQQHGGD PPAAQAP--I IVLCCVASSS SVVRRRPIFF TTVVVIADKK KTGFFRFFTE FMLLQQY--- ----ETPTTV VALLLAKSSS SLLNNNPVVV TTLLLGVSRR RVGFFRVVKN FMLLQQY--- ----ETPTTV VALLLAKSSS SLLNNNPVVV TTLLLGVSRR RVGFFRVVKN LYYSSSAVFF DDDAAAAAGG ANNAAMMAE- AAAYYLLQQR RRECDIIVVV CGEEGA-RRE LFYSSSAVFF EEENNNNGDD ERRVVRRVER EEELLFFGGR RRRSGLLIII GPEEKGIHRE LFYSSSAVFF EEENNNNGDD ERRVVRRVER EEELLFFGGR RRRSGLLIII GPEEKGIHRE EESSSRRRLT RALLAVVPGS SNNARRRRRL GGSVEEEADD GCLTLLGGGH GRRPFSSAAA EEEEEQQLME NAFFSVVKSN NYYASKKKKN YNIVSSSKPP GFISLLAAAN DLLPLTTLLL EEEEEQQLME NAFFSVVKSN NYYASKKKKN YNIVSSSKPP GFISLLAAAN DLLPLTTLLL WEEEAGDNNN NSSNSVVSGS SSGGSDDSSS NSGSDSSSC WRRR------ ---------- ---------- --------- WRRR------ ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ----MMFFFF Q--------W PDPAASSGGL DDDDAAAGGG t5g66770.1 AMSGNLLMMM AAQKKKEEQQ QQQQHHFIII NPLLLLNNPW PNTLLGGGGL SSSSGGGSSS t5g66770.2 AMSGNLLMMM AAQKKKEEQQ QQQQHHFIII NPLLLLNNPW PNTLLGGGGL SSSSGGGSSS FPPPAAAA-- ---------- 
---AAAPPPP DGVVY----- ---------- ---------- APDPFFQQTT GDDDGGFFNH HHHAAATTTT TGGGFRLLLS DDFGGGGGEE SSSDWWWMTL APPPFFQQTT GDDDGGFFNH HHHAAATTTT TGGGFRLLLS DDFGGGGGEE SSSDWWWMTL ---------- -------DDP PPA---GGGA DD-------- --------DA AAAAAAAALL ISGGGDVDDP PPCDTTWDDN NPDVVIGGGP DDPFFYRLLP PSSDLNVVDT TTTSSSSSPP ISGGGDVDDP PPCDTTWDDN NPDVVIGGGP DDPFFYRLLP PSSDLNVVDT TTTSSSSSPP EFACCAPAAA AAVL--AARE EEAGGR---- ----LLHLLS SCCAIIIGGH ASAAALLDSH PPTSSSPSII PPLTEESSKP EENDDEDFFF DLPPLLKAID DCCAIIISSP ASKKKLLQIR PPTSSSPSII PPLTEESSKP EENDDEDFFF DLPPLLKAID DCCAIIISSP ASKKKLLQIR AAAAAAASGI RVVVAAHTTS RRLLFP--VV PTTTTDDDDA EEAFFLLL-- -YYHHHHHHF EESSSPPT-- RVVVAAYTES RRLLSPNNAA SSSSSSSSSS TTDLLIIILS SYYKKTTTTL EESSSPPT-- RVVVAAYTES RRLLSPNNAA SSSSSSSSSS TTDLLIIILS SYYKKTTTTL FAAAYYLFAH TTNNAAILLL AFHCCDVVVI IDDFSSLLMM GGQWLIQAGG GPPPPIIITG LAAAYYSFAH TTNNAAILLL ATESSNIIIV VDDFGGIIVV GGQWLLQASS GPPPPVVVSG LAAAYYSFAH TTNNAAILLL ATESSNIIIV VDDFGGIIVV GGQWLLQASS GPPPPVVVSG GIPPPSSSSP PTRRD--LLR DDVLLLLLAS VRFFSSFRRV AANSLDDVRP PPWWMLLQQQ GIAPPSSSSL LGSSPPSLLI AATNNLLLRV LNFFDDFIII LT-PIHHLNG GGSSSFFRRR GIAPPSSSSL LGSSPPSLLI AATNNLLLRV LNFFDDFIII LT-PIHHLNG GGSSSFFRRR QIIPPPGGEA AANNVVVVLQ HHRGDQAA-I DAAAVLLVAR RPFTVEEEEE EAADHNKKKT RVVPPPDDEV AANNMMMMLQ YYK--ETTIV DTTTALLAKN NPVTLEEEEE EVVSLNRRRV RVVPPPDDEV AANNMMMMLQ YYK--ETTIV DTTTALLAKN NPVTLEEEEE EVVSLNRRRV TGFFFLRFFF FTTEAALLYS SAAVDDLLDD DAASAAGAAG NNMAEALLLE ICIGGEGAAA VGFFFARVVV VKKNAALLFS SAAVEELLEE ENNLGGSEEE RRRVEEFFFR ISLPPEKTTG VGFFFARVVV VKKNAALLFS SAAVEELLEE ENNLGGSEEE RRRVEEFFFR ISLPPEKTTG RHHEESSSRR RRALLAAAPL GSSNNLRRAV GE---HSVEE EAAADDGGGC CCLLTWWHPP RMMEEEEERR RNAFFSSSKL SNNYYVSSAL YSYYYSIVES SKKKPPGGGF FFIISWWNPP RMMEEEEERR RNAFFSSSKL SNNYYVSSAL YSYYYSIVES SKKKPPGGGF FFIISWWNPP PASWAADGGG DNNSSNNVSS SGSNNSGGSD GSSVCCCLL PLSW------ ---------- ---------- --------- PLSW------ ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ------MMTF PPFQ------ ---WMDDASS t5g66770.1 AAYYMMTSNL 
MMMAIAQQVV IIKQQQQEQQ QQQQHDHHIF GGINLLLSSN NNPW-NNSGF t5g66770.2 AAYYMMTSNL MMMAIAQQVV IIKQQQQEQQ QQQQHDHHIF GGINLLLSSN NNPW-NNSGF SLLLAPPPPA V--------- ----APGG-- ---------- ---------- ---------- FLLLGPDPPF VGGGFPPPPP DHHHATGGRL SDDGGGEFES SDEMETSSSG GGGGDDDDSA FLLLGPPPPF VGGGFPPPPP DHHHATGGRL SDDGGGEFES SDEMETSSSG GGGGDDDDSA -------YYP A--------G GAADDD---- ---------- ------VAAA LLEFAAPADA DDGGDDTHHN DYVVIIIYYG GPPDDDFDTY SRLQQQPPSL NNRRVVITTS PPPPTLPSLS DDGGDDTHHN DYVVIIIYYG GPPDDDFDTY SRLQQQPPSL NNRRVVITTS PPPPTLPSLS AAV----AAA MRRRREEEAA R-------LV HHLLLLMAGE GGGLSSSSAA ALLAAHHAAA IPLHHEESSS PTTTKPPENN EDDFFDLPLL KKAAAIYA-S SSSESSSSKK KLLLLRRESS IPLHHEESSS PTTTKPPENN EDDFFDLPLL KKAAAIYA-S SSSESSSSKK KLLLLRRESS AAVAASGIII AAVHHFFTTT TAASSRRRFP --VAPDAHHH HFL-HHHFFF YEEACPYLLK SELDDT---- AAFYYFFTTT EAASSNRRSP NNATSSSEEE ELILKTTLLL NDDACPYSSK SELDDT---- AAFYYFFTTT EAASSNRRSP NNATSSSEEE ELILKTTLLL NDDACPYSSK KAHHHFANNQ AEAAAHGGDD DHHVHIIIDD DFFSSQGGQP PAALLIIIII QARRPGGGGP KAHHHLANNQ AEAAAEKKNN NKKIHVVVDD DFFGGQGGQP PAALLLLLLL QARRTSSGGK KAHHHLANNQ AEAAAEKKNN NKKIHVVVDD DFFGGQGGQP PAALLLLLLL QARRTSSGGK PLRIPPPPSS PPTGGRRRE- -VGLLRLSSR SGGGVNNSLE EEWMLQQQAG GGGVVANSSV KIRIAPPPSS LLGEESSSEP STGNNRLVVN DPPPI--PIL LLSSFRRRDD DDDLLANFFM KIRIAPPPSS LLGEESSSEP STGNNRLVVN DPPPI--PIL LLSSFRRRDD DDDLLANFFM QRLLLLLDAD QPPIIDDVLL DCVVAAASVR PKKIFAADNT FFFLLDDFEE AALYYSAFSL QKLLLLL--D EPPVVDDALL RLAAKKKSLN PRRVVVVSNV FFFAANNVNN AALFYSAFSL QKLLLLL--D EPPVVDDALL RLAAKKKSLN PRRVVVVSNV FFFAANNVNN AALFYSAFSL SSSSSGGGNA EE-YYYLQQR IIICCDIIVG GA-RRRRHHE PSRRRWWRRR LTRAAAVLLG LLLRRDSSRV EERLLLFGGR IIISSGLLIP KGIHHRRMME EEQQQWWRLL MENAASVLLS LLLRRDSSRV EERLLLFGGR IIISSGLLIP KGIHHRRMME EEQQQWWRLL MENAASVLLS GGGNQQALLL LLLSSGEGGG G--HSVEEAA DCCTTGGWGG GPLLSWWWWE EAAADDDGGG SSSYQQALLL LNNNNYSNNN NLYSIVESKK PFFSSAAWDD DPLLTWWWWR R--------- SSSYQQALLL LNNNNYSNNN NLYSIVESKK PFFSSAAWDD DPLLTWWWWR R--------- DDDNNNNNNS SNVSGGDDSS NGSSGGSSSS GRRDGGVVV ---------- ---------- ---------- --------- 
---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -DDDTTFFPP PF------MP PPSGLDAFFL t5g66770.1 AYMDSGNMAQ QQKKQQQQEQ QQQQQQQQHH QQQQIIFFGG GIPLSLLP-T TTGGLSGAAF t5g66770.2 AYMDSGNMAQ QQKKQQQQEQ QQQQQQQQHH QQQQIIFFGG GIPLSLLP-T TTGGLSGAAF PPPAV----- --------PD GGVGY----- ---------- ---------- ---------- DPPQVTGGGG GNGFNNLHTT GGGGFRLSFF FGGGEFFESS SDEWMEELLS GDSSVAAGCT PPPQVTGGGG GNGFNNLHTT GGGGFRLSFF FGGGEFFESS SDEWMEELLS GDSSVAAGCT ---YYYYDDP PPA------- GGA------- -----VVDDA ALLPPEEFAA AAFFPPPCAP TWWHHHHDDN NPDVVVVIIY GGPPFPSLSV VSDNRIIDDT SPPLLPPPPP TLWWPPPSSP TWWHHHHDDN NPDVVVVIIY GGPPFPSLSV VSDNRIIDDT SPPLLPPPPP TLWWPPPSSP PPDAAAAREE EEVGIIRRR- -------VHM SAAAAAGAEA AASAAQQAAD DSHHAAAAAV PPLIIPPKED DETDSSEEED DDFDLPPLKY DAAAAA-RSD NASKKTTLLQ QIRREESSEL PPLIIPPKED DETDSSEEED DDFDLPPLKY DAAAAA-RSD NASKKTTLLQ QIRREESSEL VVAASSIRRR VAVVVHHTAA SRRLPP-SAP TAAHHAFL-- HHHYYYPLLK KFAFTANEFH LLDPTT-RRR VAFFFYYTAA SNRLPPNSTS SSSEEDLILL KTTNNNPSSK KFALTANETE LLDPTT-RRR VAFFFYYTAA SNRLPPNSTS SSSEEDLILL KTTNNNPSSK KFALTANETE HHHHHHHIID DDSMQQQGLQ QQQWPAAAQA LRPPGGPPFF ITGGIIGPST RRDE---LDD EEKKKKKVVD DDGVQQQGIQ QQQWPAAAQA TRTTSGPPTT VSGGIIPASG SSPEPPPLAA EEKKKKKVVD DDGVQQQGIQ QQQWPAAAQA TRTTSGPPTT VSGGIIPASG SSPEPPPLAA VGGGRADDLL ARVRRRVVVV SSFRGVNNLL EVRRPPWWLL QIIIAGGEEA AAVFNSSLLQ TGGGRRDDFF AKLDDDLLLL DDFIPI--II LLNNGGSSFF RVVVDDDEEV VVLVNFFLLQ TGGGRRDDFF AKLDDDLLLL DDFIPI--II LLNNGGSSFF RVVVDDDEEV VVLVNFFLLQ QLLLRGGGPP AADAP----I IIIDVVVLVV SSSSSVPKII TTVIQQQQQE HKKKFLDRRF QLLLK----- --DTPTTTIV VVVDAAALAA SSSSSLPRVV TTLGYYYYYE LRRRFANRRQ QLLLK----- --DTPTTTIV VVVDAAALAA SSSSSLPRVV TTLGYYYYYE LRRRFANRRQ SAAFDDLLLL DDASGAAAGG GNNAAAE-YL QQEEIICEER RHLLSRWWWR RTRAALSAVP SAAFEELLLL EENRSEEEEE ERRVVVERLF GGRRIIGEER RMKKEQWWWR LENAAFESVK SAAFEELLLL EENRSEEEEE ERRVVVERLF GGRRIIGEER RMKKEQWWWR LENAAFESVK LGSSNLRRRR LFFFGE-SVV VEEEADCLLG WGGRRRPPLF AAAAWAAADD GGGGGGGGDN LSNNYVSKKK NYYYYSLIVV VEESKPFILA WDDLLLPPLL LLLSW----- ---------- 
LSNNYVSKKK NYYYYSLIVV VEESKPFILA WDDLLLPPLL LLLSW----- ---------- NNNNSSNVVV GSSGGGSSDG SSNGSSGGAA RRDSSSCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ----MMDTFF FFQ---PPDD DDPPAAAAGG t5g66770.1 MDGNLAAQVV VVIIIKKQQQ QQQQQQQQQH QQQQHHQIFF FINPPPPPNN NNTTSLLLGG t5g66770.2 MDGNLAAQVV VVIIIKKQQQ QQQQQQQQQH QQQQHHQIFF FINPPPPPNN NNTTSLLLGG LAGGP----- ---------- --------AP PDDGVGY--- ---------- ---------- LGSSPGGGDS SDPGGPPFFF NNLDDHHHAT TTTGGGFRRS DGGGGGTGGG GEEDDDDEWM LGSSPGGGDS SDPGGPPFFF NNLDDHHHAT TTTGGGFRRS DGGGGGTGGG GEEDDDDEWM ---------- -DPP-ADD-- ---------- ------VVAA LLFFAAAAFP CAAADAAAAA LGGDGDDTTW WDNPVPDDPP FFSRRRRVQP SSDDRVIITS PPPPPPTTWP SSSSLSSSSI LGGDGDDTTW WDNPVPDDPP FFSRRRRVQP SSDDRVIITS PPPPPPTTWP SSSSLSSSSI VVLL-AMRRR EEAGGII--- -------VLL LCGGAAIEEA AAGDDAAAAA ASAAQADDSS LLTTESPTKK DPNDDSSDDD FDDDEEPLAA IC--RRISSD DDSDDNNNAA ASKKTLQQII LLTTESPTKK DPNDDSSDDD FDDDEEPLAA IC--RRISSD DDSDDNNNAA ASKKTLQQII HHAAAAAASS SGIIIGGGGR VVAAAFFTSR LF-SAPPPPP DDAAAAFFFL L--HHHHYEE RRSSEDDPTT T----EEEER VVAAAFFTSR LSNSTSSSSS SSSDDDLLLI ILSKTTTNDD RRSSEDDPTT T----EEEER VVAAAFFTSR LSNSTSSSSS SSSDDDLLLI ILSKTTTNDD AAPPPLLKAA HHFFNNNQAI ILAFFHGGDV HVIFSLMQQQ GGWPAALLLQ AALAAPGF-L AAPPPSSKAA HHLLNNNQAI ILATTEKKNI HIVFGIVQQQ GGWPAALLLQ AALAATSTQI AAPPPSSKAA HHLLNNNQAI ILATTEKKNI HIVFGIVQQQ GGWPAALLLQ AALAATSTQI RIITIGGPPS SPPRD---LL RDDLRLAADL LLAARSSSVV VRFSFRVVAA NSLEVPWMML RVVSIPPAPS SLLSPPSSLL IAANRLRRDF FFAAKVVVLL LNFDFIIILT -PILLGSSSF RVVSIPPAPS SLLSPPSSLL IAANRLRRDF FFAAKVVVLL LNFDFIIILT -PILLGSSSF QQAAPPPPGA AFSSVLHLLL GPDDAAPPVL LDCCVSSSVV RPPFEQEAAD DHHHNNKKTG RRDDPPPPDV AVFFMLYLLL --DDTTPPAL LRLLASSSLL NPPVEYEVVS SLLLNNRRVG RRDDPPPPDV AVFFMLYLLL --DDTTPPAL LRLLASSSLL NPPVEYEVVS SLLLNNRRVG FFLRRTAAFY FLAAASGGGA AAGNAMAYLQ RREEEIDIGG EGAA--EEER HHEELSSSSW FFARRKAAQF FLPGGRSSSE EEERVRVLFG RRRRRIGLPP EKTTIIEEER MMEEKEEEEW FFARRKAAQF FLPGGRSSSE EEERVRVLFG RRRRRIGLPP 
EKTTIIEEER MMEEKEEEEW WWRTRAALPP LLLAAARRAR MMVVVSEG-- HHHEEEEAGC CLLLTLLLHR RLLLLFFSSA WWLENAAFKK LLLAAASSAK IILLLNSNLL SSSEESSKGF FIIISLLLNL LLLLLLLTTL WWLENAAFKK LLLAAASSAK IILLLNSNLL SSSEESSKGF FIIISLLLNL LLLLLLLTTL SAAAAAGDNN NNNSSVSSSS NNSSSSNNGK KGRRGSSSL SSSSS----- ---------- ---------- --------- SSSSS----- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- --MD------ --WPPMDPPA t5g66770.1 MAAAYMCCTD SGLMMMIIIA QQQQQVIIQQ QQQQHHQQQQ QDHQLSSNNN PPWPP-NTTS t5g66770.2 MAAAYMCCTD SGLMMMIIIA QQQQQVIIQQ QQQQHHQQQQ QDHQLSSNNN PPWPP-NTTS AASSGDGFFF PPPAA----- ---------- --APDDDVVG ---------- ---------- SLFFGSSAAA PDDQQTTGGG NDPGGFPLDD DHATTTTGGG RRFGGGTGGG GEDDEWMMMM SLFFGSSAAA PPPQQTTGGG NDPGGFPLDD DHATTTTGGG RRFGGGTGGG GEDDEWMMMM ---------- ---------- ------YDPP P---GAA--- ---------- -----VAAAA ETSGGGGGDS SSVADDPPPD DTTTWWHDNP PYIIGPPPPF DTTTTPSQQQ PSDNRITSSS ETSGGGGGDS SSVADDPPPD DTTTWWHDNP PYIIGPPPPF DTTTTPSQQQ PSDNRITSSS PEFFPDAAAA AAAAVL--AM MRREEEEEGG IIR------- ----LLVVHL SCGGGAAAII LPPPPLSIPP PPPPLTEESP PTTDDPPPDD SSEDDFDEEE PPPPLLLLKA DC---RRRII LPPPPLSIPP PPPPLTEESP PTTDDPPPDD SSEDDFDEEE PPPPLLLLKA DC---RRRII HHHHALASSA QLLASHHALA VASIVFFTTA LLSF--SSVA PTTAAEHAAL ----YYHHYY PPPPNEASSK TLLLIRRSVE LDT-VFFTEA LLSSNNSSAT SSSSSTEDDI LLLSYYKTNN PPPPNEASSK TLLLIRRSVE LDT-VFFTEA LLSSNNSSAT SSSSSTEDDI LLLSYYKTNN EAAPPLKKKK AATTANQLLE HHCDDDHHVV ISSLQWPAII QAALLRGPPF LTGGGIGGGG DAAPPSKKKK AATTANQLLE EESNNNKKII VGGIQWPALL QAALLRSKKT ISGGGIPPPP DAAPPSKKKK AATTANQLLE EESNNNKKII VGGIQWPALL QAALLRSKKT ISGGGIPPPP PSSPTTRDDE E-RDDLLRLA DRVRRRRFFF RGGVVAAADE ERPWMMLQQQ EEEVVAAFFS PSSLGGSPPE EPIAANNRLR DKLDDDNFFF IPPIILLTHL LNGSSSFRRR EEELLAAVVF PSSLGGSPPE EPIAANNRLR DKLDDDNFFF IPPIILLTHL LNGSSSFRRR EEELLAAVVF LHLLLGDDAA AADQQQQAAA P---IVDVAV VRPPKIIIFF TTTVIIIIIE EEADDHTTGF LYLLL----- --DEEEETTT PTIIVARAKL LNPPRVVVVV TTTLGGGGGE EEVSSLVVGF LYLLL----- --DEEEETTT PTIIVARAKL LNPPRVVVVV TTTLGGGGGE EEVSSLVVGF DDDRFEALLF YYADSLLAAS 
SAGAAGNNNE -AQQQRRECC CCVVVCCRRH EPSSRWDRRL NNNRVNALLQ FFAESLLNNL LGSEEERRRE REGGGRRRSS SSIIIGGRRM EEEEQWVLLM NNNRVNALLQ FFAESLLNNL LGSEEERRRE REGGGRRRSS SSIIIGGRRM EEEEQWVLLM LGLLLPPSSN QARVGGLLSE EGGHVEEEAA ADGGCLGGGH GGRRSSSAAA SAAWAADGGD MGFFFKKNNY QAKLWWNNNS SNNSVEESKK KPGGFIAAAN DDLLTTTLLL SSSW------ MGFFFKKNNY QAKLWWNNNS SNNSVEESKK KPGGFIAAAN DDLLTTTLLL SSSW------ NNNNNNNNNS SSSDDSSNNN SSGGSNGKGG ARRDSVCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------M TPP-----MD PAAASSSDAL t5g66770.1 MMAMCCCTDN NLLLAIQKQQ QKKQEEQQQQ QQQQQQQQQH IGGPPLNN-N TSLLGGFSGF t5g66770.2 MMAMCCCTDN NLLLAIQKKK KKKQEEQQQQ QQQQQQQQQH IGGPPLNN-N TSLLGGFSGF LLPPAAVV-- ---------P PGY------- ---------- ---------- ---------- FFPPFFVVGG GDPFFFLLDT TGFRRLLLLS SSSDFGGGGG GEEFFFSSDD MLISGDDDDV FFPPFFVVGG GDPFFFLLDT TGFRRLLLLS SSSDFGGGGG GEEFFFSSDD MLISGDDDDV ---------- --YYDP---- AA-------- --VVDAAALP FAAAAAFFPC PPDDAAAAAA VVDDDGGPDD DDHHDPVVIY PPTYYRLPDD DLIIDSSSPL PTLLLLWWPS PPLLSSSSPP VVDDDGGPDD DDHHDPVVIY PPTYYRLPDD DLIIDSSSPL PTLLLLWWPS PPLLSSSSPP AAVL-AMMRR RREGIRRR-- ----LLHHLL LMSSCIEEEE EAAAASADAA AAAAVVVAAG PPLTESPPTT KKPDSEEEDF FLLELLKKAA IYDDCISSSS SDDNASLQEE ESSELLLDP- PPLTESPPTT KKPDSEEEDF FLLELLKKAA IYDDCISSSS SDDNASLQEE ESSELLLDP- GRVVHHTTAF PPSVVAPPPP PTTDDDDAEH HHFL--HHFY YYEEAAYLKF FAFTANQAAI -RFFYYTEAS PPSAATSSSS SSSSSSSSTE EELILSKKLN NNDDAAYSKF FALTANQAAI -RFFYYTEAS PPSAATSSSS SSSSSSSSTE EELILSKKLN NNDDAAYSKF FALTANQAAI AAAFHGGGDD DDDVHHVIDD DFFSLMQQGG LQWWALIQQA AALAAAARR- -LITGGGPPP AAATEKKKNN NNNIHHIVDD DFFGIVQQGG IQWWALLQQA AALAAAARRQ QIVSPPPAAA AAATEKKKNN NNNIHHIVDD DFFGIVQQGG IQWWALLQQA AALAAAARRQ QIVSPPPAAA PTGRRRRDE- --LRDVGRLA DDDARSSVFS SRRRAANNSL LLEEEVRRPW WQQQIAAPGA LGESSSSPEP SSLIATGRLR DDDAKVVLFD DIIILL--PI IILLLLNNGS SRRRVDDPDA LGESSSSPEP SSLIATGRLR DDDAKVVLFD DIIILL--PI IILLLLNNGS SRRRVDDPDA AFFNVVQLRR LLLGDPADDQ QAPPP---ID DDAACCVASV RRPVVVVIIE EAADHNNKTG 
AVVNMMQLKK LLL----DDE ETPPPTTIVD DDTTLLAKSL NNPLLLLGGE EVVSLNNRVG AVVNMMQLKK LLL----DDE ETPPPTTIVD DDTTLLAKSL NNPLLLLGGE EVVSLNNRVG DRRRRFFFLL LFYYYSAVFD LAAAGGAAGG GAAAE-AYYQ QQRREDDIVC GGG-RERRRH NRRRRVVVLL LQFYYSAVFE LNNGDDEEEE EVVVERELLG GGRRRGGLIG PPKIRERRRM NRRRRVVVLL LQFYYSAVFE LNNGDDEEEE EVVVERELLG GGRRRGGLIG PPKIRERRRM HEPWWRRRRL LLSNLLLLLG E-HVEAADDD GCLTTTTTLL GGWHGRPFFA ASWWWWEAAA MEEWWRRRLL LLNYVLLNNY SYSVSKKPPP GFISSSSSLL AAWNDLPLLL LSWWWWR--- MEEWWRRRLL LLNYVLLNNY SYSVSKKPPP GFISSSSSLL AAWNDLPLLL LSWWWWR--- GDDNNNNNNN SSSSSSGNSG GGSGSSGGRD DGSSSSCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --------MT FPFQ------ -------AAS t5g66770.1 MMMYMCCCTT DSGLMMAAAA IQQVVVKQQK QQQQQQQHHI FGINPLLSSS LLLNNPPSSG t5g66770.2 MMMYMCCCTT DSGLMMAAAA IQQVVVKKKK QQQQQQQHHI FGINPLLSSS LLLNNPPSSG SSGGLDAGGG PVV------- ---------- -------DGG VY-------- ---------- FFGGLSGSSS PVVTTGGGGD SDDFFPFFFN NNDHHHHTGG GFLDFFGGTG GGGGGFEDDE FFGGLSGSSS PVVTTGGGGD SDDFFPFFFN NNDHHHHTGG GFLDFFGGTG GGGGGFEDDE ---------- ---------Y PAA---GGDD DD-------- ---------- AAAAEFAAAF WEELSGDDSS AADCDDTTTH PDDVIYGGDD DDPPFFTYYP PSSRLVQQSL SSSSPPPPLW WEELSGDDSS AADCDDTTTH PDDVIYGGDD DDPPFFTYYP PSSRLVQQSL SSSSPPPPLW FAAAAVL--- AARRRREEEE EEEEVAIIIR -------LVV HLLMCCAAAA AAAADHHHAS WSSSPLTEEE SSTTKKEEDP PEEETNSSSE DDFDDDELLL KIIYCCRDDD DDDDDPPPNS WSSSPLTEEE SSTTKKEEDP PEEETNSSSE DDFDDDELLL KIIYCCRDDD DDDDDPPPNS SAQLAADALA AVAASGGGRR RVAVVVHFTT LSRRRLFFPP PPPTDDEEFL L-YFYYEPYK SKTLLLQEVS ELDPT--ERR RVAFFFYFEE LSNNRLSSPP SSSSSSTTLI ISYLNNDPYK SKTLLLQEVS ELDPT--ERR RVAFFFYFEE LSNNRLSSPP SSSSSSTTLI ISYLNNDPYK FAFAANNQLL EEEAAFFGGD DHHHVVVDDD FLMMQGQQPP ALLIQALLLG GGPPF--LIT FALAANNQLL EEEAATTKKN NKKKIIIDDD FIVVQGQQPP ALLLQALLTG GGKPTQQIVS FALAANNQLL EEEAATTKKN NKKKIIIDDD FIVVQGQQPP ALLLQALLTG GGKPTQQIVS GIIGGPPPPG DDEE--LRRD VVLLLLRRRR RLDLRFFSFV SSSEEVLLQA AAPGGEAAAS GIIPPAPLLE PPEESSLIIA TTNNNNRRRR RLDFDFFDFI 
PPPLLLFFRD DDPDDEVAAF GIIPPAPLLE PPEESSLIIA TTNNNNRRRR RLDFDFFDFI PPPLLLFFRD DDPDDEVAAF VVLLLLLRLL LGPDAP---- -IAAAVLDCC VSVVRRPIIF TEEADTGLDD DRTTALYYSA MMLLLLLKLL L--DTPTTTI IVTTTALRLL ASLLNNPVVV TEEVSVGANN NRKKALYYSA MMLLLLLKLL L--DTPTTTI IVTTTALRLL ASLLNNPVVV TEEVSVGANN NRKKALYYSA AVFFSLLDAA ASAASSGGGA ANMAAAE-AY LLRREEEDIV GAAAA-ERRE PLRRDLLTTR AVFFSLLEPP NLGGRRDDSE ERRVVVEREL FFRRRRRGLI KTTTGIERRE EKQQVMMEEN AVFFSLLEPP NLGGRRDDSE ERRVVVEREL FFRRRRRGLI KTTTGIERRE EKQQVMMEEN LSSSPLSNAL LRALLGGLFF FSSSGGEEGG ---EADCTTL LGWGPPLLFS SSASAADDGG FEEEKLNYAV VSALLWWNYY YNNNYYSSNN YYYSKPFSSL LAWDPPLLLT TTLS------ FEEEKLNYAV VSALLWWNYY YNNNYYSSNN YYYSKPFSSL LAWDPPLLLT TTLS------ GGDNNSNSSS NSSSSSNNNN GSSGKKKGAA GGSSVCCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ------MDDD TTPFFF---- --WWMPASSS t5g66770.1 MMAMMCTTTT DDGNNLMMAA QQQEQQQQQQ HHHHQQHQQQ IIGIIIPPLL LNWW-TSGFF t5g66770.2 MMAMMCTTTT DDGNNLMMAA QKQEQQQQQQ HHHHQQHQQQ IIGIIIPPLL LNWW-TSGFF SGAGFLPPPP AAV------- ---------- --------PP GVV------- ---------- FGGSAFPPPP QQVGGGGGGG DDSSSSDDDF FNNLLHHHTT GGGSFFGTTT GGGGGFFESD FGGSAFPPPP QQVGGGGGGG DDSSSSDDDF FNNLLHHHTT GGGSFFGTTT GGGGGFFESD ---------- -------YYD DPPA--GGAA DD-------- ---------- -DAAPAAAAF WTTLISGSVP PCDDTWWHHD DNNDYIGGPP DDPFFFTTYS RLSSSVSLLL VDSSLTTTLW WTTLISGSVP PCDDTWWHHD DNNDYIGGPP DDPFFFTTYS RLSSSVSLLL VDSSLTTTLW PCCCCAPPPD DAAAA-ARRE EEEEEEEVGI IIRR------ -VLLLLLMMM SCAAAIIEAH PSSSSSPPPL LPPPPHSKKE EEDDEEETDS SSEEDFDLLE PLAIIIIYYY DCARRIISDP PSSSSSPPPL LPPPPHSKKE EEDDEEETDS SSEEDFDLLE PLAIIIIYYY DCARRIISDP AALAAQSSHH AAAVSSASGR VAFALSRRRR SSSSPPPPPP PPTTTDDHAF --YHHHYCPP NNEAKTIIRR SSSLGGDTER VAFALSNNNR SSSSPPSSSS SSSSSSSEDL LSYKTTNCPP NNEAKTIIRR SSSLGGDTER VAFALSNNNR SSSSPPSSSS SSSSSSSEDL LSYKTTNCPP PPYYLLLAAF ANLEFDHVVH VVIDDFSSSL LMMGGLLQQW PLLLIRPGGG GPP--LLRRI PPYYSSSAAL ANLETNKIIH IIVDDFGGGI IVVGGIIQQW PLLLLRTSGG GKPQQIIRRV PPYYSSSAAL ANLETNKIIH 
IIVDDFGGGI IVVGGIIQQW PLLLLRTSGG GKPQQIIRRV TGGPPTGGRR EELDDVLLLR DDDASVVVRV VFFFSSRRGA ANRPWWMMML LIAAAPAVVA SGGALGEESS EELAATNNNR DDDAVLLLDL LFFFDDIIPL L-NGSSSSSF FVDDDPVLLA SGGALGEESS EELAATNNNR DDDAVLLLDL LFFFDDIIPL L-NGSSSSSF FVDDDPVLLA ANSSSSSVVL QLHRLLLDDQ AAAP--IIVL CVAAASSPKI EEEEADDNTT GLDRREAFYY ANFFFFFMML QLYKLLL-DE TTTPTIVVAL LAKKKSSPRG EEEEVSSNVV GANRRNAQFY ANFFFFFMML QLYKLLL-DE TTTPTIVVAL LAKKKSSPRG EEEEVSSNVV GANRRNAQFY SVVDDSSSDA ASSSGAAMMA AYLQQRDICE AARREERRRR HEPPPSRWWR DDDLTRGLSS SVVEESSSEP GRRRSVVRRV ELFGGRGLGE TTRREERRRR MEEEEEQWWR VVVMENGFEE SVVEESSSEP GRRRSVVRRV ELFGGRGLGE TTRREERRRR MEEEEEQWWR VVVMENGFEE AVLGNNRRRQ ARRRMLLLLG GLLFGGE--H HSECLLTLLG WWHGRPFFFA SWEEAAGDGG SVLSYYSSSQ AKKKILLLLW WNNYYYSLYS SIEFIISLLA WWNDLPLLLL SWRR------ SVLSYYSSSQ AKKKILLLLW WNNYYYSLYS SIEFIISLLA WWNDLPLLLL SWRR------ DDNNNSNNSS VSSSSGGGGS SDSSSGSSSN GGKKSRDDG ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ------MMMT PPFF-----W PPMDAASGGD t5g66770.1 MAMCTTTGGG GNLAIAAAQQ QQKKQQEQQQ QHHHQDHHHI GGIIPLLPPW PP-NSLGGGS t5g66770.2 MAMCTTTGGG GNLAIAAAQQ QQKKKQEQQQ QHHHQDHHHI GGIIPLLPPW PP-NSLGGGS AAGFFPPPPA V--------- ---------- ------DDDG GGGGY----- ---------- GGSAAPPPPF VTTTGGGDDS PPPPPGFFFP PNDHHHTTTG GGGGFRLDGG GTTGGESDEW GGSAAPPPPF VTTTGGGDDS PPPPPGFFFP PNDHHHTTTG GGGGFRLDGG GTTGGESDEW ---------- -------DPP AA-----D-- ---------- ---------- --AAAPPEFF WLISSSGGDS SVDDPDWDNP DDYYVIIDDP PSRRRLLSVV VQQQQPPPSL NRTTTLLPPP WLISSSGGDS SVDDPDWDNP DDYYVIIDDP PSRRRLLSVV VQQQQPPPSL NRTTTLLPPP AAFPCAPAAV VL-AARRRRE EEEEVAGRRR ---------- -LHLLLLLMC CCGGAADAAL PPWPSSPSSL LTESSTKKKE PPPETNDEEE DDDDDFLEPP PLKAIIIIYC CC--RDDNNE PPWPSSPSSL LTESSTKKKE PPPETNDEEE DDDDDFLEPP PLKAIIIIYC CC--RDDNNE SLASSHHAAA LLAASSAASI VAAVVFLSSL LFPPVAPPTT TDDAHAFFL- -HYEAAAPPY SLLIIRREEE VVSEGGDPT- VAAFFFLSSL LSPPATSSSS SSSSEDLLIL LTNDAAAPPY SLLIIRREEE VVSEGGDPT- VAAFFFLSSL LSPPATSSSS SSSSEDLLIL LTNDAAAPPY 
KFFAHHFTQA AIILHGCCDH HVVSLMQGLW WPLLIQLLRR PGGGPPPLTT GIIGPPPSDD KFFAHHLTQA AIILEKSSNK KIIGIVQGIW WPLLLQTTRR TSGGKKKISS GIIPAAPSPP KFFAHHLTQA AIILEKSSNK KIIGIVQGIW WPLLLQTTRR TSGGKKKISS GIIPAAPSPP EEEE---LVG LRDLLAAARR RRVVVRRVFS FFRVAAANDE VPMLLGGAAN NNNSSVHHRL EEEEPPPLTG NRDFFAAAKK KKLLLDDLFD FFIITTT-HL LGSFFDDVAN NNNFFMYYKL EEEEPPPLTG NRDFFAAAKK KKLLLDDLFD FFIITTT-HL LGSFFDDVAN NNNFFMYYKL LLLLGGGDPA DAIDAAAAAA RPKIVVVVII EQEDDDHNKK KKTTTTTGLD RFFFEAAYYY LLLL------ DTVDTKKKKK NPRVLLLLGG EYESSSLNRR RRVVVVVGAN RVVVNAAFYY LLLL------ DTVDTKKKKK NPRVLLLLGG EYESSSLNRR RRVVVVVGAN RVVVNAAFYY AVDDSDDASS GGGAGNNAAM MMAE----QI ICCICGGGGA AA--RERRHE PLRRRLLLTR AVEESEEPLR DDDEERRVVR RRVERRRRGI ISSLGPPKKT GGIIRERRME EKQRRMMMEN AVEESEEPLR DDDEERRVVR RRVERRRRGI ISSLGPPKKT GGIIRERRME EKQRRMMMEN RGAVLLLLGN AALLLAARRM MMLGGGGFGH HSEGGCCLLW HGRPLFFSWW WWWAGGGDGG NGSVLLLLSY AAVVVAAKKI IILWWWWYNS SISGGFFILW NDLPLLLTWW WWW------- NGSVLLLLSY AAVVVAAKKI IILWWWWYNS SISGGFFILW NDLPLLLTWW WWW------- GDDNNNSSSN VSSSDDSSSS NSGGSSSGKS ARDDGSCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- -MTFFFPPQQ ---WPPASSL t5g66770.1 MMYDSSGGNL LMMAAAQQQQ QQKQQQKQQQ QQQEQQQQQH QHIFFFGGNN LLPWTTLGFL t5g66770.2 MMYDSSGGNL LMMAAAQQQQ QQKKKKKQQQ QQQEQQQQQH QHIFFFGGNN LLPWTTLGFL LAGGFLLPAA V--------- ---------A ADGGG----- ---------- ---------- LGSSAFFPFQ VTTTTGGDSD GGFPFPDDHA ATGGGRLSFF GTTGFFFSEE LISGGDSAAG LGSSAFFPFQ VTTTTGGDSD GGFPFPDDHA ATGGGRLSFF GTTGFFFSEE LISGGDSAAG --DPPPP--- --GAA----- ---------- -----VDDDA AAALLPFFAA AAAAFFPPPC PTDNNPPVVI YYGPPPFDTY YSSSLLLSVQ LLLNRIDDDT SSSPPLPPPT TTLLWWPPPS PTDNNPPVVI YYGPPPFDTY YSSSLLLSVQ LLLNRIDDDT SSSPPLPPPT TTLLWWPPPS CAAPPDDAAV -REEEEAAAG R--------- --LLHHHLLL LCCAGAGGDH HALSAQLASS SSSPPLLSSL HKEDPENNND EDDDDDLLEE EPLLKKKAII ICCA-RSSDP PNESKTLLII SSSPPLLSSL HKEDPENNND EDDDDDLLEE EPLLKKKAII ICCA-RSSDP PNESKTLLII HAAAAALLAV VSSAIIIGGR RVAVVTTAAL RRRRRLFFP- 
PPPADAAL-Y YYHHACPPYF REEESSVVSL LGGP---EER RVAFFTEAAL NRRRRLSSPN PPPTSSDILY YYTTACPPYF REEESSVVSL LGGP---EER RVAFFTEAAL NRRRRLSSPN PPPTSSDILY YYTTACPPYF FHHTAANQAI EEFHHHHGCV HFLLLQQLLQ PLIIALALLP GGPPFFFRRI GGIGPPSSSG FHHTAANQAI EETEEEEKSI HFIIIQQIIQ PLLLALATTT SSKPTTTRRV GGIPAASSSE FHHTAANQAI EETEEEEKSI HFIIIQQIIQ PLLLALATTT SSKPTTTRRV GGIPAASSSE GGRDD----L RRDGLLRRRL ADDAARSVRR RFSSRGVAAN EVMLIAPGGG EEVVVLQQHL EESPPPPPPL IIAGNNRRRL RDDAAKVLDN NFDDIPILT- LLSFVDPDDD EELLMLQQYL EESPPPPPPL IIAGNNRRRL RDDAAKVLDN NFDDIPILT- LLSFVDPDDD EELLMLQQYL GGDPADQQQA AAP-AAVVCC VAVVPPPPIT VVEAAAAADH HNNKKTGGFD DDRFFALFYS -----DEEET TTPTTTAALL AKLLPPPPVT LLEVVVVVSL LNNRRVGGFN NNRVVALQYS -----DEEET TTPTTTAALL AKLLPPPPVT LLEVVVVVSL LNNRRVGGFN NNRVVALQYS AAASGGGNAA -AAYLRRRRC CCDVVVEEGG GAAA---RRR REEERHEEPL LRWWRLTTTA PPNRDEERVV REELFRRRRS SSGIIIEEKK KTTGIIIHHH REEERMEEEK KQWWLMEEEA PPNRDEERVV REELFRRRRS SSGIIIEEKK KTTGIIIHHH REEERMEEEK KQWWLMEEEA AGAAAPGGGA AALLLLRRQR VE---HHSSE EAAALTGGGW WHHHGRPPSS SASSSWAGGG AGSSSKSSSA AAVVVVSSQK LSLLYSSIIE SKKKISAAAW WNNNDLPPTT TLSSSW---- AGSSSKSSSA AAVVVVSSQK LSLLYSSIIE SKKKISAAAW WNNNDLPPTT TLSSSW---- DDDNNSNSSV GGSSSGGGGS SSSSNNNSGG SNGKGAGSV ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -------DDQ Q-------WW WPPMASSSSG t5g66770.1 MAYMMCCCTD SNMAIAAQVI KKKKQQQQQQ QEQQQDDQQN NPPPSLNNWW WPP-SGGFFG t5g66770.2 MAYMMCCCTD SNMAIAAQVI KKKKQQQQQQ QEQQQDDQQN NPPPSLNNWW WPP-SGGFFG LAALLLLPPP V--------- ---------- ---------- DVVGY----- ---------- LGGFFFFPDP VGGGGGGDDS NNNDPPPPPN LLDHHHHHHH TGGGFRLSDG GGGGGEFEES LGGFFFFPPP VGGGGGGDDS NNNDPPPPPN LLDHHHHHHH TGGGFRLSDG GGGGGEFEES ---------- ---------- --------YD DPPPA--GD- ---------- --AFFAAAFP DWMEETIIGG GDDSSVVVAA ADDGGTWWHD DPPPDVVGDF DTPPLVPSSS DNSPPPPTWP DWMEETIIGG GDDSSVVVAA ADDGGTWWHD DPPPDVVGDF DTPPLVPSSS DNSPPPPTWP PPPCCCCCAD AAALL--RRR EEEEEEVVVA AAGR------ -------HLL MSSCAGIEEE PPPSSSSSSL SIPTTHETTT 
DPPEEETTTN NNDEDDDDDF FDDDDDLKAI YDDCA-ISSS PPPSSSSSSL SIPTTHETTT DPPEEETTTN NNDEDDDDDF FDDDDDLKAI YDDCA-ISSS GHAASAQQAD DSSSHAAVAA SSSIIRVVAF TTLLSRRRRL LFFPP-SVVV APPPPTEHAF SPAASKTTLQ QIIIRSELDP TTT--RVVAF TELLSNRRRL LSSPPNSAAA TSSSSSTEDL SPAASKTTLQ QIIIRSELDP TTT--RVVAF TELLSNRRRL LSSPPNSAAA TSSSSSTEDL L-YYACYLKK AAHFFTNNQA FHHGGGCCHH VIDSMQQQWP IQQLRRRRPP GGPPPLLRII ILYYACYSKK AAHLLTNNQA TEEKKKSSKK IVDGVQQQWP LQQTRRRRTT SGKKKIIRVV ILYYACYSKK AAHLLTNNQA TEEKKKSSKK IVDGVQQQWP LQQTRRRRTT SGKKKIIRVV TTGGPSSPGD DE--LLLVVG GLLLLLLAAR SVRVFFSSRR GGASDEEVRP WLQQQQIIAA SSGPASSLEP PESSLLLTTG GNNLLFFAAK VLDLFFDDII PPTPHLLLNG SFRRRRVVDV SSGPASSLEP PESSLLLTTG GNNLLFFAAK VLDLFFDDII PPTPHLLLNG SFRRRRVVDV VVAFNSLLQR RLGDPAQQPP -IIDVVVLCV APKFTEEEAN KTTTLLFFYY VVFDSSSSLD LLAVNFLLQK KL----EEPP TVVDAAALLA KPRVTEEEVN RVVVLLQQFY VVFESSSSLE LLAVNFLLQK KL----EEPP TVVDAAALLA KPRVTEEEVN RVVVLLQQFY VVFESSSSLE AAAGAAANNA AAAMEE-LLL QQREICDIVV CEGGAAA--- RRERELSSSR DLTRRAGLSS PNNDEEERRV VVVREERFFF GGRRISGLII GEKKTGGIII HREREKEEEQ VMENNAGFEE PNNDEEERRV VVVREERFFF GGRRISGLII GEKKTGGIII HREREKEEEQ VMENNAGFEE AAAVVPSNNR QMVGFSEEG- HHHGCCTTLL LWHHGGGRRR PFSSASSWEE EGDGGGNNNN SSSVVKNYYS QILWYNSSNL SSSGFFSSLL LWNNDDDLLL PLTTLSSWRR R--------- SSSVVKNYYS QILWYNSSNL SSSGFFSSLL LWNNDDDLLL PLTTLSSWRR R--------- NNNNNVGGGG GGGSDDDSSN NSGSNGSSSA RDGSSCCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- ---DPP---- ----WMDDPA t5g66770.1 MMMAAAMMDS SGLMMAAAQQ QQIIIIQKQQ QEEEQQQQQH HHHQGGPPSS LLLPW-NNTL t5g66770.2 MMMAAAMMDS SGLMMAAAQQ QQIIIIKKQQ QEEEQQQQQH HHHQGGPPSS LLLPW-NNTL AASGGLLDDA GFLLPPPPPA AVV------- ---------- ----AAPDVG Y--------- LLGGGLLSSG SAFFPPPPPF QVVTGGGDSS SPPFFNNLDD DHHHAATTGG FDFGGGGGFS LLGGGLLSSG SAFFPPPPPF QVVTGGGDSS SPPFFNNLDD DHHHAATTGG FDFGGGGGFS ---------- ------YPA- -----GGAD- ---------- ---------- -VVDAAAAAL SEELIGGSVD GDTTWWHPDY YYVVIGGPDP PDTYYSSRRL LSVQQPDNRV VIIDTTSSSP 
SEELIGGSVD GDTTWWHPDY YYVVIGGPDP PDTYYSSRRL LSVQQPDNRV VIIDTTSSSP LLFAAADDAA AA-AMRRREE EEEEEEG--- ---LLVVHHL LSCCAAAEEG DASQLAADDH PPPPTLLLPP PPHSPTTKEE EDDEEEDDDD FDELLLLKKA IDCCRRRSSS DNSTLLLQQR PPPPTLLLPP PPHSPTTKEE EDDEEEDDDD FDELLLLKKA IDCCRRRSSS DNSTLLLQQR HHAAVSAAAI GVVAVHHTTS RLPP---PPP TTTDDDAEF- YYHHHFFFYY PPKKHTTNQI RRESLGDDP- EVVAFYYTTS NLPPNNNPPS SSSSSSSTLL YYTTTLLLNN PPKKHTTNQI RRESLGDDP- EVVAFYYTTS NLPPNNNPPS SSSSSSSTLL YYTTTLLLNN PPKKHTTNQI LLEECHHHHV VDFSSMQLLL QQPLIQAARP GGGGPPLRRR IIIIIGGGPP PPTGGRE-DR LLEESKKKHI IDFGGVQIII QQPLLQAART GGGGPPIRRR VVVIIPPPPL LLGEESESAR LLEESKKKHI IDFGGVQIII QQPLLQAART GGGGPPIRRR VVVIIPPPPL LLGEESESAR LADLAASRRR RRFSSRRRGG AAAASSLERR RRWQQAPGGE AAVVAAFNSS SVHHHRLLGG LRDFAAVDNN NNFDDIIIPP LLLLPPILNN NNSRRDPDDE VVLLAAVNFF FMYYYKLL-- LRDFAAVDNN NNFDDIIIPP LLLLPPILNN NNSRRDPDDE VVLLAAVNFF FMYYYKLL-- GDDQAPP-ID ALLLVVVSSR PPPPKITTQN NNNTGLDDDR ALFYAVVVFS LDASAAAAAE -DDETPPTVD TLLLAAASSN PPPPRVTTYN NNNVGANNNR ALQYAVVVFS LENLEVVVVE -DDETPPTVD TLLLAAASSN PPPPRVTTYN NNNVGANNNR ALQYAVVVFS LENLEVVVVE --AAAAYYLR EECDICCEAA EEERREPSSS RRDDDRRRLL TGGGLLVPLL LGSSSNLQQQ RREEEELLFR RRSGLGGEGG EEERREEEEE RRVVVLLLMM EGGGFFVKLL LSNNNYVQQQ RREEEELLFR RRSGLGGEGG EEERREEEEE RRVVVLLLMM EGGGFFVKLL LSNNNYVQQQ AARRRRMLVG GLLFGGGGGE G--SVVVEEE EAAAAGGGLT GGHGGGPPSS ASSAAAAAAG AAKKKKILLW WNNYYYYYYS NLYIVVVESS SKKKKGGGIS AANDDDPPTT LSSSS----- AAKKKKILLW WNNYYYYYYS NLYIVVVESS SKKKKGGGIS AANDDDPPTT LSSSS----- GGGGDDNNNS NSVGSGGSSS SNSSSSNGKK ARRDGSVCC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -MDDDFFQQQ ---WPMDDAS SGLLGFLPPP t5g66770.1 MAYYYYMCTS LLLAAIQIKQ QQEQQQHQQQ DHQQQFINNN LNPWP-NNLF FGLLSAFPDD t5g66770.2 MAYYYYMCTS LLLAAIQIKQ QQEQQQHQQQ DHQQQFINNN LNPWP-NNLF FGLLSAFPPP PPAV------ ---------- ---AADDDDV V--------- ---------- ---------- PPQVTGGGGS SDDFPFFFHH HHHAATTTTG GRLLSDDDFG TTTGGGEFFE EESDEEEISS PPQVTGGGGS SDDFPFFFHH HHHAATTTTG GRLLSDDDFG 
TTTGGGEFFE EESDEEEISS ---------- ---------Y DDDDPPAA-- ---GGGD--- ---------- ----DDPPAA GGGSSVVAAD DGGGPPDWWH DDDDPPDDYV IIIGGGDPFF DTTPLSSVQQ RVVVDDLLPP GGGSSVVAAD DGGGPPDWWH DDDDPPDDYV IIIGGGDPFF DTTPLSSVQQ RVVVDDLLPP AAFPPPPCAD DVL--AMMMR EEEEEI---- ---LLHLLSS CCAAAGIEAG GDHHALAAQL TLWPPPPSSL LLTHHSPPPT EDPPESDFDL EEPLLKAIDD CCAAA-ISDS SDPPNEKKTL TLWPPPPSSL LLTHHSPPPT EDPPESDFDL EEPLLKAIDD CCAAA-ISDS SDPPNEKKTL DDSAALVVVS AAAAASSIGG RVAAFFTTTL LLRRRLLLLF PPPPVVAAPT DAEEF----Y QQIEEVLLLG DDPPPTT-EE RVAAFFEEEL LLNRRLLLLS PPPPAATTSS SSTTLLLSSY QQIEEVLLLG DDPPPTT-EE RVAAFFEEEL LLNRRLLLLS PPPPAATTSS SSTTLLLSSY HECCCPYYLF HFFFTTTAAN NILAAAFHCD VHHVDDFLLQ LLWWLIIIQQ QAALLALRPG TDCCCPYYSF HLLLTTTAAN NILAAATESN IHHIDDFIIQ IIWWLLLLQQ QAALLATRTS TDCCCPYYSF HLLLTTTAAN NILAAATESN IHHIDDFIIQ IIWWLLLLQQ QAALLATRTS PPPFFFFLRI GIGGGPPPPP TDRDDVLADD SSVRVVRFFG GGANNSSSLD DEVRRRLLQA KPPTTTTIRV GIPPPPPPPL GPIAATLRDD VVLDLLNFFP PPL--PPPIH HLLNNNFFRD KPPTTTTIRV GIPPPPPPPL GPIAATLRDD VVLDLLNFFP PPL--PPPIH HLLNNNFFRD AGGEAAAFNS VLHLLLDPAA DQAA---IIV LDDCVVAVVP KIIVEEADHH NKTFFDDFTA DDDEVAAVNF MLYLLL---- DETTTIIVVA LRRLAAKLLP RVVLEEVSLL NRVFFNNVKA DDDEVAAVNF MLYLLL---- DETTTIIVVA LRRLAAKLLP RVVLEEVSLL NRVFFNNVKA LLLLFYYDDA AAGAGNAM-- -AAAQQQQRR EICCCGEEAA RRHLLSSSRR RRRDRLLGGL LLLLQYYEEN NGSEERVRRR REEEGGGGRR RISGGPEETT RRMKKEEEQQ RRRVLMMGGF LLLLQYYEEN NGSEERVRRR REEEGGGGRR RISGGPEETT RRMKKEEEQQ RRRVLMMGGF LLSSAVVVVP LLLNLLAARM MLLVGLGG-H HEADGLTTGW GGGPPLLFSA SSSAEGDDGN FFEESVVVVK LLLYVVAAKI ILLLWNYNLS SEKPGISSAW DDDPPLLLTL SSSSR----- FFEESVVVVK LLLYVVAAKI ILLLWNYNLS SEKPGISSAW DDDPPLLLTL SSSSR----- NNSSSNNSNN VGSGSDDSNS SSSGGKSSSR RDDDDSVVC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -------MDF PFQQ------ ---WWPDDDP t5g66770.1 MMMTSSGGNL LLLMMMAQQQ QVVKKQQQQQ QQHHHQDHQF GINNPPLSNN NPPWWPNNNT t5g66770.2 MMMTSSGGNL LLLMMMAQQQ QVVKKKQQQQ QQHHHQDHQF GINNPPLSNN NPPWWPNNNT AAAASSGGLL PAAV------ 
---------- -----AAPDG GGV------- ---------- SLLLGFSSFF PQQVGDSSSN NGFPPFFFPD HHHHHAATTG GGGRLSSFFG GGGGGEESSE SLLLGFSSFF PQQVGDSSSN NGFPPFFFPD HHHHHAATTG GGGRLSSFFG GGGGGEESSE ---------- ---------- ------YYYP A----AAD-- ---------- ---------- MMEEETTTTL LGGSSVADGG DCDDTTHHHP DVVVYPPDPP PFFFTTSSSR RSQPSSDDLR MMEEETTTTL LGGSSVADGG DCDDTTHHHP DVVVYPPDPP PFFFTTSSSR RSQPSSDDLR VDAALLEEEP PCAADAAAAV LLL---AREE EEEEVAGIIR R----LVHHL LSSAAAGIEE IDTSPPPPPP PSSSLIIPPL TTTHEESKED DPEETNDSSE EDDEPLLKKA IDDAAA-ISS IDTSPPPPPP PSSSLIIPPL TTTHEESKED DPEETNDSSE EDDEPLLKKA IDDAAA-ISS AAAGGHSSAL DDAAVAAASG IGGVAAAVVV HHHTTALSRR RLLF---SSS VAAPPTTTTD DDDSSPSSKL QQEELDPPT- -EEVAAAFFF YYYTEALSNR RLLSNNNSSS ATTSSSSSSS DDDSSPSSKL QQEELDPPT- -EEVAAAFFF YYYTEALSNR RLLSNNNSSS ATTSSSSSSS DAAAHHAL-Y HEEACCYLLL KFAFFAAQAA AIIEFHGDDH HVHVVIIDFM MQWPALLIIA SSSSEEDILY TDDACCYSSS KFALLAAQAA AIIETEKNNK KIHIIVVDFV VQWPALLLLA SSSSEEDILY TDDACCYSSS KFALLAAQAA AIIETEKNNK KIHIIVVDFV VQWPALLLLA AALPPPPPF- LRRIIITGII GGPPSSSTTE EEEDDVVVGR LAAAASSVRF SSSFFAAASL AATTTKPPTQ IRRVVVSGII PPAPSSSGGE EEEAATTTGR LAAAAVVLNF DDDFFLLTPI AATTTKPPTQ IRRVVVSGII PPAPSSSGGE EEEAATTTGR LAAAAVVLNF DDDFFLLTPI DEEEVVRRPW WGEAVANSSS VVQQHRLLLL LLDPQQQQQP -IVVVVVLLD DVSVRPPKKF HLLLLLNNGS SDEVLANFFF MMQQYKLLLL LL--EEEEEP TVAAAAALLR RASLNPPRRV HLLLLLNNGS SDEVLANFFF MMQQYKLLLL LL--EEEEEP TVAAAAALLR RASLNPPRRV TVVVEQADKT FLREALYYSS AAFSLLDDDA AANAAMAEEE AYYYYICIIC GGEGAA---R TLLLEYVSRV FARNALFFSS AAFSLLEEEP NERVVRVEEE ELLLLISLLG PPEKTGIIIH TLLLEYVSRV FARNALFFSS AAFSLLEEEP NERVVRVEEE ELLLLISLLG PPEKTGIIIH EERHLRWDTA GSSVVPGSAL RRRRAALVFS GGGEGGG--- --HHSEEEAD GGCCLWWHRL EERMKQWVEA GEEVVKSNAV SSSSAALLYN YYYSNNNLLL YYSSISSSKP GGFFLWWNLL EERMKQWVEA GEEVVKSNAV SSSSAALLYN YYYSNNNLLL YYSSISSSKP GGFFLWWNLL AAWEAAGGGG GDNNSVSSSS DDNGSNGKSS GDGSCCLLL LLWR------ ---------- ---------- --------- LLWR------ ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- ------MDTT Q----PDASG t5g66770.1 YCTGGGNNNN 
LMAAAIIAAQ QQQQQVVVKQ QQQQQQEQQQ QQHHQDHQII NPLLNPNSFG t5g66770.2 YCTGGGNNNN LMAAAIIAAQ QQQQQVVVKK QQQQQQEQQQ QQHHQDHQII NPLLNPNSFG DDGGFFLPPP PPPVV----- ---------- ----APDDDV Y--------- ---------- SSSSAAFPPP DDPVVTTTGG GGGNFPPFFF PLDHATTTTG FRLLLFFFGG GGTTGEFELL SSSSAAFPPP PPPVVTTTGG GGGNFPPFFF PLDHATTTTG FRLLLFFFGG GGTTGEFELL ---------- ---------- -YDDD----G GDDD------ ---------- ---------- IIIIISGGDD VAADGPDDDT WHDDDVVIYG GDDDPPPPSS RRRLLLVQQP PSSDDDLLNR IIIIISGGDD VAADGPDDDT WHDDDVVIYG GDDDPPPPSS RRRLLLVQQP PSSDDDLLNR DDDAALFAAA AAAFPPPPCC CAAPPAAAVV VLLL----AM REEEEEEVVG GGII------ DDDTSPPPPP TLLWPPPPSS SSSPPSIILL LTTTHHEESP KEEDDPETTD DDSSDDFFDD DDDTSPPPPP TLLWPPPPSS SSSPPSIILL LTTTHHEESP KEEDDPETTD DDSSDDFFDD -----LVLLM CCCAAAEDAA AASAALADHA AALAVVSSAA AAGGGRVAAA LSSRLFPPP- LEEEPLLAIY CCCAAASDNN NNSKKLLQRE ESVELLGGDD DP--ERFAAA LSSRLSPPPN LEEEPLLAIY CCCAAASDNN NNSKKLLQRE ESVELLGGDD DP--ERFAAA LSSRLSPPPN -SSSAATEAA FFFL--YHHH HHHHEAPLKF FAFAQILEEH CCDVVHDFFS LLLGGGLLQQ NSSSTTSTDD LLLILLYTTT TTTTDAPSKF FALAQILEEE SSNIIHDFFG IIIGGGIIQQ NSSSTTSTDD LLLILLYTTT TTTTDAPSKF FALAQILEEE SSNIIHDFFG IIIGGGIIQQ QPPAAAQALA ALPGGPPF-L RRIGIGTRDE ERGLLAARSS SVRVRSSFVA ASLLLDDEVP QPPAAAQALA ATTSGKKTQI RRVGIPGSPE EIGLLAAKVV VLDLNDDFIL TPIIIHHLLG QPPAAAQALA ATTSGKKTQI RRVGIPGSPE EIGLLAAKVV VLDLNDDFIL TPIIIHHLLG WWMLAPGGEA AFFVLLLQLH HRLLDDDDDP AADDP--IAV LCAVVRPTVV IQQQEEEAAH SSSFDPDDEV AVVMLLLQLY YKLL------ --DDPTIVTA LLKLLNPTLL GYYYEEEVVL SSSFDPDDEV AVVMLLLQLY YKLL------ --DDPTIVTA LLKLLNPTLL GYYYEEEVVL HNTTGDDDDR RFTTEAFYYA FFDDSSDAAG GAGGGAEAAY QRIIIIIICG AARRREEHHE LNVVGNNNNR RVKKNAQFYA FFEESSEPPS SEEEEVEEEL GRIIIILLGK TGHRREEMME LNVVGNNNNR RVKKNAQFYA FFEESSEPPS SEEEEVEEEL GRIIIILLGK TGHRREEMME PLWRRRDRRL AGLAVVLGGA AQARRLGLFS SEEGG---SS VEEEAAADDL LLLWGLFSSE EKWRRRVLLM AGFSVVLSSA AQAKKLWNYN NSSNNLLYII VESSKKKPPI LLLWDLLTSR EKWRRRVLLM AGFSVVLSSA AQAKKLWNYN NSSNNLLYII VESSKKKPPI LLLWDLLTSR AADGGGDNNN NNSGDNNNSG KSSSSARRRD DDGGGSSVL ---------- ---------- ---------- --------- 
---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- ----MDPFFQ QQ-----DDA t5g66770.1 MMMYMMMCCT TSSNNLMIIA QQQVVIIIKQ KKKQQEEQQQ HHHHHQGIIN NNPLLNPNNS t5g66770.2 MMMYMMMCCT TSSNNLMIIA QQQVVIIIKK KKKQQEEQQQ HHHHHQGIIN NNPLLNPNNS AAGLAGGFFL PVVV------ ---------- ----APDDDV GYY------- ---------- SLGLGSSAAF PVVVTGGGGG SSSDDDPFFF HHHHATTTTG GFFRLLSSGG TTTGGGEEMT SLGLGSSAAF PVVVTGGGGG SSSDDDPFFF HHHHATTTTG GFFRLLSSGG TTTGGGEEMT ---------- ---------Y PPAAA----- GGAD------ ---------- ------VAAA SGGGGSSSVV VVDDDTTWWH PPDDDVIYYY GGPDPPFFFD DTYYPRLLSV VSDLLVITTT SGGGGSSSVV VVDDDTTWWH PPDDDVIYYY GGPDPPFFFD DTYYPRLLSV VSDLLVITTT ALLFAAAAPP PPPPCCCADA AAAVVAREEE EEEEGGGR-- ---------- --LLLLHHSA SPPPPTLLPP PPPPSSSSLI PPPLLSTEEE DDPPDDDEDD DDDFFFDLLL PPLLLLKKDA SPPPPTLLPP PPPPSSSSLI PPPLLSTEEE DDPPDDDEDD DDDFFFDLLL PPLLLLKKDA AGAIAAGDLS QLSHAALAAV VGGRRAHFTL SPP--PAAPP PPPPTTAHFF FFF----YYH A-RIDDSDES TLIREEVSSL L-ERRAYFTL SPPNNPTTSS SSSSSSSELL LLLLLSSYYT A-RIDDSDES TLIREEVSSL L-ERRAYFTL SPPNNPTTSS SSSSSSSELL LLLLLSSYYT HFFFYYECCY YKFHNNNQIL LAHHGGGGGC DDHHHHHHVV IFSSMMLQQP LIQAALLALL TLLLNNDCCY YKFHNNNQIL LAEEKKKKKS NNKKKKKHII VFGGVVIQQP LLQAALLATT TLLLNNDCCY YKFHNNNQIL LAEEKKKKKS NNKKKKKHII VFGGVVIQQP LLQAALLATT LLPGPPFF-L RRIITGIIII PPPSTRDE-- ---RRDVVVG LLLLLRRRLA LAAARRRVVV TTTGKPTTQI RRVVSGIIII AAPSGSPEPP PSSIIATTTG NNNNNRRRLR FAAAKKKLLL TTTGKPTTQI RRVVSGIIII AAPSGSPEPP PSSIIATTTG NNNNNRRRLR FAAAKKKLLL VRFFFRRRAL LLPWLIAAPE EEEAVNNNVQ RRPPPDDQPA VLDAAVRIIF IEEQEEDDHN LNFFFIIILI IIGSFVDDPE EEEVLNNNMQ KK---DDEPT ALRKKLNVVV GEEYEESSLN LNFFFIIILI IIGSFVDDPE EEEVLNNNMQ KK---DDEPT ALRKKLNVVV GEEYEESSLN TGLLLLDDRF TEEALFYYYA AVVFLDAAAA ASGGGAGGAA MAAEE-ALLL QRRREVCEEG VGAAAANNRV KNNALQFFYA AVVFLEPNGG GRSSSEEEVV RVVEEREFFF GRRRRIGEEK VGAAAANNRV KNNALQFFYA AVVFLEPNGG GRSSSEEEVV RVVEEREFFF GRRRRIGEEK ARERHHHELW RRDRLRAAAL PPPLNAQAML VVVLFSG--- -VEEEEDDDG LLLLLLHGSA TRERMMMEKW RRVLMNAAAF KKKLYAQAIL LLLNYNYLLY YVEESSPPPG IIIILLNDTL 
TRERMMMEKW RRVLMNAAAF KKKLYAQAIL LLLNYNYLLY YVEESSPPPG IIIILLNDTL GDNNNNNNVV SSSSGSDSSN NSGGGNNKSS GADGSSVCC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- DTFF------ ------WWPP MDASSGLLLD t5g66770.1 YMCCTTTNNL MMAAQQQKQQ QQQQQHHDDD QIFIPPPLLL LSSLNNWWPP -NLGGGLLLS t5g66770.2 YMCCTTTNNL MMAAQQQKQQ QQQQQHHDDD QIFIPPPLLL LSSLNNWWPP -NLGGGLLLS AAAGLPPPAA VV-------- ---------A ADDGGGY--- ---------- ---------- GGGSFPPDFQ VVGGDSSSPG PPPFFPLHHA ATTGGGFRSF FFGGGGGTGG EFSDEWWMME GGGSFPPPFQ VVGGDSSSPG PPPFFPLHHA ATTGGGFRSF FFGGGGGTGG EFSDEWWMME ---------- -----PPA-- ----D----- ---------- ---------D DAAALLLLPF ELLGGDDDGD CDDDTNNDYY YVYYDPFFDD DTTYYYPPSS SVQQPDLNRD DTTSPPPPLP ELLGGDDDGD CDDDTNNDYY YVYYDPFFDD DTTYYYPPSS SVQQPDLNRD DTTSPPPPLP AAFFAAPDDA AAAAAV-EEE EEEEVVGG-- -------VLL LLLMMAAGAA IAAAALLSAA PTWWSSPLLS SSPPPLEEEP PEEETTDDDD DDFDDLPLAA IIIYYAA-RR IDDNNEESKK PTWWSSPLLS SSPPPLEEEP PEEETTDDDD DDFDDLPLAA IIIYYAA-RR IDDNNEESKK AQQQAAAAAA ALAVVVSSSA SGGIIGAAFF TTTTTALSSS RRLPSVVPPT THHFHHHYEA KTTTLLLESS SVELLLGGGP T----EAAFF TTEEEALSSS NRLPSAASSS SEELKKTNDA KTTTLLLESS SVELLLGGGP T----EAAFF TTEEEALSSS NRLPSAASSS SEELKKTNDA ACYLAAAHFA AANNNNQIIL EECDHVIDFM QLLQWPPIIA LLAAARPPPP PGGGPPPFFL ACYSAAAHLA AANNNNQIIL EESNHIVDFV QIIQWPPLLA LLAAARTTTT TSSGKPPTTI ACYSAAAHLA AANNNNQIIL EESNHIVDFV QIIQWPPLLA LLAAARTTTT TSSGKPPTTI ITGSSPGDDE -LRVVVLLRD DLAAASSRFF FFRGGVVAAA ANSSSERPLQ IIPEANVLLQ VSGSSLEPPE PLITTTNNRD DFAAAVVDFF FFIPPIILLT T-PPPLNGFR VVPEANMLLQ VSGSSLEPPE PLITTTNNRD DFAAAVVDFF FFIPPIILLT T-PPPLNGFR VVPEANMLLQ QQLLHRRRLL LGDDPP--II IAALDDCVAA SSSPPKIFVI IIIEEQEDDD DHNNKFLDDD QQLLYKKKLL L-DDPPTIVV VTTLRRLAKK SSSPPRVVLG GGGEEYESSS SLNNRFANNN QQLLYKKKLL L-DDPPTIVV VTTLRRLAKK SSSPPRVVLG GGGEEYESSS SLNNRFANNN DTTYYSAVVF DSSLLDDDDA ASSAGGGGGG NMAAE--LRR EEEIICIIGE AA-REEEEHH NKKFYSAVVF ESSLLEEEEP NLLGDSSSEE RRVVERRFRR RRRIISLLPE TTIHEEEEMM NKKFYSAVVF ESSLLEEEEP NLLGDSSSEE RRVVERRFRR 
RRRIISLLPE TTIHEEEEMM ELSRRRDDDD LLTRASAGNL RRMMLVGLSG EG-SVEEDDD GGCLLTTWHH HRPLFFSAWW EKEQRRVVVV MMENAESSYV SKIILLWNNY SNLIVESPPP GGFIISSWNN NLPLLLTSWW EKEQRRVVVV MMENAESSYV SKIILLWNNY SNLIVESPPP GGFIISSWNN NLPLLLTSWW EAAAGDDDGG GDDDNNNNNS NSNVSGGSDN SNGSGAGLL R--------- ---------- ---------- --------- R--------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- -----MMFFF QQ-------- t5g66770.1 AAMMTDDDDS SGGGNNLLII AAQQVKQQEE QQQQQQQQQQ HHHHDHHIII NNLLSSLLNN t5g66770.2 AAMMTDDDDS SGGGNNLLII AAQQVKQQEE QQQQQQQQQQ HHHHDHHIII NNLLSSLLNN -WMMDDAAAL DAAAFFFLPA V--------- ---------A DDDGGV---- ---------- PW--NNSLLL SGGGAAAFPQ VGSNPPGGFF PPNNNLHHHA TTTGGGRRSS DGGGGETLII PW--NNSLLL SGGGAAAFPQ VGSNPPGGFF PPNNNLHHHA TTTGGGRRSS DGGGGETLII ---------- --DPPPPP-- ----A----- ---------- ---------V ALLPEFFFAA SSGAAGGGGP CTDNNNPPYY VVIYPPFYYP SSSLLLVVVV VQPPDLRRRI SPPLPPPPPP SSGAAGGGGP CTDNNNPPYY VVIYPPFYYP SSSLLLVVVV VQPPDLRRRI SPPLPPPPPP APPPPPPAPA AAAAAAAAL- REEEAAII-- ----LLSCAA IEAGGHHHHA ALLSSSLSHA PPPPPPPSPS SPPPPPPPTE KEEENNSSDD DPPPAIDCAR ISDSSPPPPN NEESSSLIRE PPPPPPPSPS SPPPPPPPTE KEEENNSSDD DPPPAIDCAR ISDSSPPPPN NEESSSLIRE AAAASAASSG IGGGRVVVVV HHFTTLLSPS SPVVVAAPPT TTAAHFFL-- -----YYHHH SSSEGDPTT- -EEERVFFFF YYFTTLLSPS SPAAATTSSS SSSSELLILL LSSSSYYKTT SSSEGDPTT- -EEERVFFFF YYFTTLLSPS SPAAATTSSS SSSSELLILL LSSSSYYKTT YEEPYLKKKA ATQEFFCCCD HVIDDFSSLL LMMMQGPPPA LQLLAAALPG PP--RTTIGG NDDPYSKKKA ATQETTSSSN KIVDDFGGII IVVVQGPPPA LQLLAAATTG KPQQRSSIPP NDDPYSKKKA ATQETTSSSN KIVDDFGGII IVVVQGPPPA LQLLAAATTG KPQQRSSIPP GSPTTGRR-- LLVVVVGGLR RDAVVVVRRF FRRGGGVAAA NLLVVRRPPP PWMMQPANSL PSLGGESSPS LLTTTTGGNR RDALLLLNNF FIIPPPILTT -IILLNNGGG GSSSRPVNFL PSLGGESSPS LLTTTTGGNR RDALLLLNNF FIIPPPILTT -IILLNNGGG GSSSRPVNFL LQQQLHRRRR RRLLGDPAQA IAAVLCCVVS SVRKKIITVI IIEEQADHHH HTFLLDDFTE LQQQLYKKKK KKLL----ET VTTALLLAAS SLNRRVVTLG GGEEYVSLLL LVFAANNVKN LQQQLYKKKK KKLL----ET VTTALLLAAS SLNRRVVTLG GGEEYVSLLL LVFAANNVKN FFYSAVDDDS AASSAAAGME 
-AQQRCCCIV GERERRRHPP RWDDDRTAGL LSSAVPLLLG QQFSAVEEES PNLLGEEERE REGGRSSSLI PEHERRRMEE QWVVVLEAGF FEESVKLLLS QQFSAVEEES PNLLGEEERE REGGRSSSLI PEHERRRMEE QWVVVLEAGF FEESVKLLLS SSSAALLLLR RRRRQARMVL LLGG----HS SEEECLTLGH HGRRFSAWEA ADDDDGNSNS NNNAAVVVVS SSSSQAKILN NNYYLLYYSI IEESFISLAN NDLLLTLWR- ---------- NNNAAVVVVS SSSSQAKILN NNYYLLYYSI IEESFISLAN NDLLLTLWR- ---------- NVSSSGGSSS SDSSSNNGSS SGGGSADDGG SSSSVCLLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- MMMDDFFFQQ ---------- t5g66770.1 MAAYYCTDDS GGNMAIQIKK QQQQQEEQQQ QQQQQHHHQD HHHQQFFINN LLSLLNNPPP t5g66770.2 MAAYYCTDDS GGNMAIQIKK KKQQQEEQQQ QQQQQHHHQD HHHQQFFINN LLSLLNNPPP WPMMASSSGG FLLPPPAAAV ---------- ---------- APPPDDDVVV GY-------- WP--SGGFGS AFFPDPFFQV TGGGGGNFPF FPPPNDHHHH ATTTTTTGGG GFRDDFGGGT WP--SGGFGS AFFPPPFFQV TGGGGGNFPF FPPPNDHHHH ATTTTTTGGG GFRDDFGGGT ---------- ---------- ---------D PPPPA---G- ---------- ---------- TTTGFEEEWE ELIISSGGDA ADDGGPDDTD NNNPDYIIGP PFDTPSRRQP SSSSSDLRRV TTTGFEEEWE ELIISSGGDA ADDGGPDDTD NNNPDYIIGP PFDTPSRRQP SSSSSDLRRV DDAALLPPEF AFPCAPAAV- -REEEVVVAG RR-------- -----LHHLL LLMSCAAGGI DDSSPPLLPP TWPSSPSPLH HTDPETTTND EEDDDDDFDL LEPPPLKKAA IIYDCAA--I DDSSPPLLPP TWPSSPSPLH HTDPETTTND EEDDDDDFDL LEPPPLKKAA IIYDCAA--I DHHAAQAALA AVAAAASGRR VAAVVFFTLL RRRRRPPPP- PPVVVVAPPP PPPPTAAEFL DPPNKTEEVS ELDPPPT-RR VAAFFFFTLL NNNNNPPPPN PPAAAATSSS SSSSSSSTLI DPPNKTEEVS ELDPPPT-RR VAAFFFFTLL NNNNNPPPPN PPAAAATSSS SSSSSSSTLI ---YHFFEAC CYLAAFTTNQ AILEAFFFGD VHHVIDSSLL LMMQWPLIAA LGPPF-TGGI LLLYKLLDAC CYSAALTTNQ AILEATTTKN IHHIVDGGII IVVQWPLLAA TGKPTQSGGI LLLYKLLDAC CYSAALTTNQ AILEATTTKN IHHIVDGGII IVVQWPLLAA TGKPTQSGGI GGPPPPTEEE ELRDDDGRRD DRRSSSVVRR SVVASLLDRP WMLLQIVAFF FNNNSVVVVL PPAPLLGEEE ELIAAAGRRD DKKVVVLLDN DIILPIIHNG SSFFRVLAVV VNNNFMMMML PPAPLLGEEE ELIAAAGRRD DKKVVVLLDN DIILPIIHNG SSFFRVLAVV VNNNFMMMML HRLLLLGGDP AADDAAPP-- IIDAVLDVVS SRPPKIIIFF TTQQEAKTFF FLDRTTTTTE 
YKLLLL---- --DDTTPPTI VVDTALRAAS SNPPRVVVVV TTYYEVRVFF FANRKKKKKN YKLLLL---- --DDTTPPTI VVDTALRAAS SNPPRVVVVV TTYYEVRVFF FANRKKKKKN AAAYYSSSFF DSAASSSAAA SSGAMMMAYY QRICCIVVVC GAARRREHPP LLWRRDLRAA AAAFFSSSFF ESPNLLLGGG RREVRRRVLL GRISSLIIIG PTGHHREMEE KKWRRVMNAA AAAFFSSSFF ESPNLLLGGG RREVRRRVLL GRISSLIIIG PTGHHREMEE KKWRRVMNAA LLAVPLSSAA LRRQQAAMML LGLFSSSGGG G--VVEAACC CLTLWWRRPL FFAWWAAAGD FFSVKLNNAA VSSQQAAIIL LWNYNNNNNN NYYVVSKKFF FISLWWLLPL LLLWW----- FFSVKLNNAA VSSQQAAIIL LWNYNNNNNN NYYVVSKKFF FISLWWLLPL LLLWW----- DGGGNNNSSV VVGSGGDDDS SNGSNKKGAR RRDSSSVVC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- MTTFFFFQQ- -----MAASS t5g66770.1 AAAAMCTTSL LLMAQQQQQQ QVIQQQQQQQ QEQQQHHQDD HIIFFFINNL LLLNP-SSGG t5g66770.2 AAAAMCTTSL LLMAQQQQQQ QVIKQQQQQQ QEQQQHHQDD HIIFFFINNL LLLNP-SSGG GDAFFLLPPP AAAV------ ---------- --AAAPDVGY YY-------- ---------- GSGAAFFDDD FQQVSNNDDP PPGGGPPFLL DHAAATTGGF FFLSDFFGGG GFEWEETTGD GSGAAFFPPP FQQVSNNDDP PPGGGPPFLL DHAAATTGGF FFLSDFFGGG GFEWEETTGD ---------- --------DP PPAA------ AAAD------ ---------- VVVDAAAAFP SVVAAAADDD DGPPCDTTDN NPDDYVVVIY PPPDPFTTYY SRRRSSSSDR IIIDTTSTWP SVVAAAADDD DGPPCDTTDN NPDDYVVVIY PPPDPFTTYY SRRRSSSSDR IIIDTTSTWP PPCCCCADAA V-----MRRR REEEEEEEVR ---------- ----VHMMSC CAAAIEEEAH PPSSSSSLSI LHHHEEPTKK KEDDPPEETE DDDDFFDDLL LPPPLKYYDC CAARISSSDP PPSSSSSLSI LHHHEEPTKK KEDDPPEETE DDDDFFDDLL LPPPLKYYDC CAARISSSDP HALASSSAQL SAALLAAASS AAAAGRVVHT TALLSSRLF- SSPAAPTDHL -----YHHHF PNEASSSKTL IESVVSSEGG DDDPERVVYT TALLSSNLSN SSPTTSSSEI SSSSSYKTTL PNEASSSKTL IESVVSSEGG DDDPERVVYT TALLSSNLSN SSPTTSSSEI SSSSSYKTTL YEEAACCPYL LKFANLEAFF HGGGGDHHVI DFLMMMMGLQ QWPAIQQAAL LAPPF----R NDDAACCPYS SKLANLEATT EKKKKNKHIV DFIVVVVGIQ QWPALQQAAL LAPPTQQQQR NDDAACCPYS SKLANLEATT EKKKKNKHIV DFIVVVVGIQ QWPALQQAAL LAPPTQQQQR RIITTGPSTG GGGR-LDVGL AALLARSRRV SRGGVAPWLL LLQQIIAAPG AAFSSLQLRR RVVSSPPSGE EEESPLATGN RRFFAKVDDL DIPPILGSFF 
FFRRVVDDPD AAVFFLQLKK RVVSSPPSGE EEESPLATGN RRFFAKVDDL DIPPILGSFF FFRRVVDDPD AAVFFLQLKK LLDPDDQA-I AADCCCVVVA SSVPKTIEEE EEAHHNNTTT GFLDDFEEAL LYYAFFSLDD LL--DDETIV TTRLLLAAAK SSLPRTGEEE EEVLLNNVVV GFANNVNNAL LFYAFFSLEE LL--DDETIV TTRLLLAAAK SSLPRTGEEE EEVLLNNVVV GFANNVNNAL LFYAFFSLEE ASSAAGGGGG NAAMAEYYYR RREIICIVEA AAA--RRRRR ERHHEELRRW DRRRLTRALA PLLGGDSSEE RVVRVELLLR RRRIISLIET TTGIIHHRRR ERMMEEKQQW VLLLMENAFS PLLGGDSSEE RVVRVELLLR RRRIISLIET TTGIIHHRRR ERMMEEKQQW VLLLMENAFS VLLLLGGALQ QAARRMMLVV GGGLSGG--S VEAACCLLLL LGHHGGGLFS SAEAAGDDDG VLLLLSSAVQ QAAKKIILLL WWWNNYYYYI VSKKFFLLLL LANNDDDLLT TLR------- VLLLLSSAVQ QAAKKIILLL WWWNNYYYYI VSKKFFLLLL LANNDDDLLT TLR------- GGNNNNSSSS NNNNSVGSSN NNNGSSSSSS NGSSGDSVV ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- ----MDTPQ- ----WWMMDD t5g66770.1 MMMYMMMMTT TSMMMAIIAQ QQQIKKQQQQ QEEQQQQQQQ QHQQHQIGNP SLNPWW--NN t5g66770.2 MMMYMMMMTT TSMMMAIIAQ QQQIKKKQQQ QEEQQQQQQQ QHQQHQIGNP SLNPWW--NN PASSSDAGPP PVV------- ---------- --AAAPDDV- ---------- ---------- TSGFFSGSPD PVVTGGDDSS NNDDDPFFPN LDAAATTTGR RLSFGGGTGG GGESSSDDEE TSGFFSGSPP PVVTGGDDSS NNDDDPFFPN LDAAATTTGR RLSFGGGTGG GGESSSDDEE ---------- ---------- ---YDDPPA- ---A------ ---------- ---------- WMMELLISSG DSVAGPPPDD DWWHDDNPDY VVYPPPPFFD DTYPRVVPPS SDDLLLNNRV WMMELLISSG DSVAGPPPDD DWWHDDNPDY VVYPPPPFFD DTYPRVVPPS SDDLLLNNRV VDDAAALLLP FAAAAAPPPC PDAA----MR EEEEEAGGIR -----LVLCC AGGGIEAAAA IDDTSSPPPL PPPPLLPPPS PLPPHHHEPT EDPPENDDSE DDDPPLLACC A---ISDDDD IDDTSSPPPL PPPPLLPPPS PLPPHHHEPT EDPPENDDSE DDDPPLLACC A---ISDDDD DHHALLASAQ ADDSAAAAAV AAGIIGRVVV ATTTASSLFF FF--VVVPTT DAAAHHHFFF DPPNEEASKT LQQIESSSSL DP---ERVVV ATTTASSLSS SSNNAAASSS SSSSEEELLL DPPNEEASKT LQQIESSSSL DP---ERVVV ATTTASSLSS SSNNAAASSS SSSSEEELLL LL-HHHAACY KFTANNAIEE EAAHGGCCHH HVIIDFMMQG WPAIQQALAA LPPFLRTGII IILKKTAACY KFTANNAIEE EAAEKKSSKH HIVVDFVVQG WPALQQALAA TPPTIRSGII IILKKTAACY KFTANNAIEE 
EAAEKKSSKH HIVVDFVVQG WPALQQALAA TPPTIRSGII GPPSSPGRRR E-LRDDDDVV VGLRLLAAAL AARRVVVRRF FFGSSSLDDP PWMMLQQAGG PAASSLESSS EPLIAAAATT TGNRLLRRRF AAKKLLLNNF FFPPPPIHHG GSSSFRRDDD PAASSLESSS EPLIAAAATT TGNRLLRRRF AAKKLLLNNF FFPPPPIHHG GSSSFRRDDD EANSQRLLLG DPAADDDAAP P--ADCVVVA SSKFTIIEEQ QDDDNNKTTL DFTTEAAALF EVNFQKLLL- ----DDDTTP PTITRLAAAK SSRVTGGEEY YSSSNNRVVA NVKKNAAALQ EVNFQKLLL- ----DDDTTP PTITRLAAAK SSRVTGGEEY YSSSNNRVVA NVKKNAAALQ YSAVLDAASA AAMMMAEEEA AYRRRIIDII VVVGEGA-RR EEHHHPRRWL AAAAGGLLLA FSAVLEPPRE EVRRRVEEEE ELRRRIIGLL IIIPEKGIHR EEMMMEQQWM AAAAGGFFFS FSAVLEPPRE EVRRRVEEEE ELRRRIIGLL IIIPEKGIHR EEMMMEQQWM AAAAGGFFFS APLLLSAARQ AAVGLLFFSG GEGG--HSSV EECCCTLWPF SSSAWWAAAA GDGGNNNNNN SKLLLNAASQ AALWNNYYNY YSNNLYSIIV SSFFFSLWPL TTSSWW---- ---------- SKLLLNAASQ AALWNNYYNY YSNNLYSIIV SSFFFSLWPL TTSSWW---- ---------- SSSSNSNVVV VSSGSDSSSN NNNGGSNNGK SSSGAGGSC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---FPF---- PPSDAAGGFP PAAV------ t5g66770.1 MMMAAYMCDD MMAAAAAQQQ IQQQEEEQQQ HQQFGILLSN PTFSGGSSAP DQQVTGGGDS t5g66770.2 MMMAAYMCDD MMAAAAAQQQ IQQQEEEQQQ HQQFGILLSN PTFSGGSSAP PQQVTGGGDS ---------A DDDGVG---- ---------- ---------- ---------- ---YDDA--D NNGPPFHHHA TTTGGGRRDF GGGTTTGGGG GSSSDWWWSV VGGPPDDCDD WWWHDDDYYD NNGPPFHHHA TTTGGGRRDF GGGTTTGGGG GSSSDWWWSV VGGPPDDCDD WWWHDDDYYD ---------- -------DDA AEFFAAAAAF PPPCAPPDAA AAVL---MME EEVVVAGG-- DPSSRRLSVP PPDLLNNDDS SPPPPPTTLW PPPSSPPLSI PPLTHHEPPE EPTTTNDDDD DPSSRRLSVP PPDLLNNDDS SPPPPPTTLW PPPSSPPLSI PPLTHHEPPE EPTTTNDDDD --------LL HHLLMCGAII EEGHALAQLA DDHAALAAVA SSSRVAVHTT TTTTTALSRR DFEEPPPPLL KKAAYC-RII SSSPNEATLL QQRSSVSELP TTTRVAFYTT EEEEEALSRR DFEEPPPPLL KKAAYC-RII SSSPNEATLL QQRSSVSELP TTTRVAFYTT EEEEEALSRR FPSSSSSPVA AAPPTTTTDE HHAAFFL--H HHHHYEEEAC PLLLKKKFFF AAHHFTAQAI SPSSSSSPAT TTSSSSSSST EEDDLLILLK KKTTNDDDAC PSSSKKKFFF AAHHLTAQAI SPSSSSSPAT TTSSSSSSST EEDDLLILLK KKTTNDDDAC PSSSKKKFFF AAHHLTAQAI 
LAFDHHVVVH HHIILLMMWL IIQALLRPGG GGPF-LLGGI IGPPSPTTTG RRDEE----R LATNKKIIIH HHVVIIVVWL LLQATTRTSG GGPTQIIGGI IPAPSLGGGE SSPEEPPSSI LATNKKIIIH HHVVIIVVWL LLQATTRTSG GGPTQIIGGI IPAPSLGGGE SSPEEPPSSI RRVVLLLALL ASSRRSFRGG VVAANNSSSL DDRRPWLQQQ AAPGGAAFNL LLQHRLLLLG IITTNNLRFF AVVNNDFIPP IILL--PPPI HHNNGSFRRR DDPDDVAVNL LLQYKLLLL- IITTNNLRFF AVVNNDFIPP IILL--PPPI HHNNGSFRRR DDPDDVAVNL LLQYKLLLL- DAAAAPP--- IDVVDDDRRT VQEEAHNTGG FFDRRFFTTT AAAYAAFFFD DLLDAAAAAA -----PPTTI VDAARRRNNT LYEEVLNVGG FFNRRVVKKK AAAYAAFFFE ELLEPNNNGG -----PPTTI VDAARRRNNT LYEEVLNVGG FFNRRVVKKK AAAYAAFFFE ELLEPNNNGG SSGGGNAAEE -AYYLQREIC CDDIVCCGGA A-RRRRREPR DRLLLLLRAG GLLSSAVVSN RRDSSRVVEE RELLFGRRIS SGGLIGGPKT GIHHRRREEQ VLMMMMMNAG GFFEESVVNY RRDSSRVVEE RELLFGRRIS SGGLIGGPKT GIHHRRREEQ VLMMMMMNAG GFFEESVVNY NARRMVGLFS GEG-HHSVVV VEEEACLLLT GGRRPPSASS AWWAAAAAGG GGGGNNNSSS YASKILWNYN YSNLSSIVVV VEESKFIIIS DDLLPPTLSS SWW------- ---------- YASKILWNYN YSNLSSIVVV VEESKFIIIS DDLLPPTLSS SWW------- ---------- NSSSSNNNVV SSSSDNNNNS SSSSSKSSSS GSSSVCCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -----DFFFF P-----WMDD AAADAGGGFF t5g66770.1 AYYYCCCTSM AAAAAIAAQQ VKQQEEQQQQ QHHQQQFFFF GPSLNNW-NN SLLSGSSSAA t5g66770.2 AYYYCCCTSM AAAAAIAAQQ VKQQEEQQQQ QHHQQQFFFF GPSLNNW-NN SLLSGSSSAA LLPPPPAAA- ---------- ---------A PDGGY----- ---------- ---------- FFPPDPFFQG GGGGDDDSND DPPFFFPLHA TTGGFRLFFT GGGGGGGGEE ESDWMMLISG FFPPPPFFQG GGGGDDDSND DPPFFFPLHA TTGGFRLFFT GGGGGGGGEE ESDWMMLISG ---------- ----YAAA-- ----DDD--- ---------- ---------- -VDAAAPEFF VVVAADDGGD CCTWHDDDYY VVVIDDDFFT TYSRRRVQQP DDLNRRRRRV VIDSSSLPPP VVVAADDGGD CCTWHDDDYY VVVIDDDFFT TYSRRRVQQP DDLNRRRRRV VIDSSSLPPP FFFAAFFPAA PAAAVV-AAR RRRREEEVGI I--------- LHLLLMCAES AAAAALLLLA PPPLLWWPSS PIPPLLESST KKKKEDETDS SDDFDLLEPP LKAAAYCASS KKKKKLLLLL PPPLLWWPSS PIPPLLESST KKKKEDETDS SDDFDLLEPP LKAAAYCASS KKKKKLLLLL ASAASAAGGG GRVVVVVHTL LSRRRRRRLL FPSVAAPPPT 
TDAEHAFFF- -YHFFEACPP LIESGDP--E ERVVFFFYTL LSNNRRRRLL SPSATTSSSS SSSTEDLLLS SYKLLDACPP LIESGDP--E ERVVFFFYTL LSNNRRRRLL SPSATTSSSS SSSTEDLLLS SYKLLDACPP PYYLKFFTAQ AILLAAFGDD VVIIFLLMMG WPPALLLIIQ AAAAPGGPPP F-IIIITIGG PYYSKFLTAQ AILLAATKNN IIVVFIIVVG WPPALLLLLQ AAAATSGPPP TQVVVVSIPP PYYSKFLTAQ AILLAATKNN IIVVFIIVVG WPPALLLLLQ AAAATSGPPP TQVVVVSIPP PGGGRDE-DG GLLAAADLAR RVSFRGGVAA ANNSLVVVRR RPPWMMLQII PGGGEAAFSV AEEESPEPAG GNLRRRDFAD DLDFIPPILL T--PILLLNN NGGSSSFRVV PDDDEVVVFM AEEESPEPAG GNLRRRDFAD DLDFIPPILL T--PILLLNN NGGSSSFRVV PDDDEVVVFM LQLLHRLGDP PDQQQQQPP- AVVLCCVVAS VKIFTTVVII QEDDDHHNNN KKTTGGFFFL LQLLYKL--- -DEEEEEPPT TAALLLAAKS LRVVTTLLGG YESSSLLNNN RRVVGGFFFA LQLLYKL--- -DEEEEEPPT TAALLLAAKS LRVVTTLLGG YESSSLLNNN RRVVGGFFFA DDDRFFTLYA AAVFSLLASS SSGGAGAMAE E--ALLLQRE CIIVVVGGEG ARRRHEPPPP NNNRVVKLYA AAVFSLLPLL LRDDEEVRVE ERREFFFGRR SLLIIIPPEK THHRMEEEEE NNNRVVKLYA AAVFSLLPLL LRDDEEVRVE ERREFFFGRR SLLIIIPPEK THHRMEEEEE SWDRLTAALS SAAGSSNRQQ QRLVVFSEEG G-SSVEEEEA DDTTTLGGGS SAAAAAWWAA EWVLMEAAFE ESSSNNYSQQ QKLLLYNSSN NLIIVEESSK PPSSSLAADT TLLLLSWW-- EWVLMEAAFE ESSSNNYSQQ QKLLLYNSSN NLIIVEESSK PPSSSLAADT TLLLLSWW-- GDGGGNNNNS SNSSGDDDSN SSSSSSSSGA ARDGSSSVV ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---MMMDDTF PQ--WDDPAS SSSSGLLDDD t5g66770.1 AAYYMMTNNL LIIAAQQQQQ QQQQQQQQQH QQDHHHQQIF GNNPWNNTSG GGFFGLLSSS t5g66770.2 AAYYMMTNNL LIIAAQQQQQ QQQQQQQQQH QQDHHHQQIF GNNPWNNTSG GGFFGLLSSS DAAGGFPPPP AAAVV----- ---------- ------AADD DGVVGGY--- ---------- SGGSSAPPDP FQQVVGDDDS SDPPGGFPPL DDHHHHAATT TGGGGGFRRR RRLLSFGGGE SGGSSAPPPP FQQVVGDDDS SDPPGGFPPL DDHHHHAATT TGGGGGFRRR RRLLSFGGGE ---------- ---------- ---DDDA--- -DDD------ ---------- ---VDAAAAA ESDWWWMMME TLLIISSGDD AGWDDDDVVV YDDDPPFDYR LLSVQQPPSL LLVIDSSSSS ESDWWWMMME TLLIISSGDD AGWDDDDVVV YDDDPPFDYR LLSVQQPPSL LLVIDSSSSS LLPPAAAFPA AAPPPDAAAV VL---MRREE EEEEVI---- ------LVLM MGEADDHHAL PPLLLLLWPS SSPPPLSIPL 
LTHEEPTKEE DDPETSDDDD DFDLEPLLIY Y-SDDDPPAL PPLLLLLWPS SSPPPLSIPL LTHEEPTKEE DDPETSDDDD DFDLEPLLIY Y-SDDDPPAL SHLLAAVAAA AASSGGGGVV VAAVVHTTTL SRRRRLLFSV AAPPTDDDDD AAALL--HHE IRVVSSLDDD PPTT--EEVV VAAFFYTEEL SNNRRLLSSA TTSSSSSSSS DDDIISSTTD IRVVSSLDDD PPTT--EEVV VAAFFYTEEL SNNRRLLSSA TTSSSSSSSS DDDIISSTTD EACPPKFHHH TTAQQQAILL AAFFFFFGDD DHHVVISSLM MMGLWWPPPI IQLLLARPGP DACPPKFHHH TTAQQQAILL AATTTTTKNN NKHIIVGGIV VVGIWWPPPL LQLLLARTGP DACPPKFHHH TTAQQQAILL AATTTTTKNN NKHIIVGGIV VVGIWWPPPL LQLLLARTGP FF-LTIGSTG GGRELRRRVG LLRAADLLLA RRRVRFSGVV VAAANNNSLD ERPWMQQQQI TTQISIPSGE EESELIIITG NNRRRDFFFA KDDLNFDPII ILLT---PIH LNGSSRRRRV TTQISIPSGE EESELIIITG NNRRRDFFFA KDDLNFDPII ILLT---PIH LNGSSRRRRV AAPPAAFSVL QLLRLGGGDP QA-IAAVLDA ASRFFTTVIE AHKTGGLLLF TTFFFYYYYY DDPPVAVFML QLLKL----- ETTVTTALRK KSNVVTTLGE VLRVGGAAAV KKQQQFFFYY DDPPVAVFML QLLKL----- ETTVTTALRK KSNVVTTLGE VLRVGGAAAV KKQQQFFFYY SSSAVFFDSS LLLDAAAGGG NAAAAA---A YLLLQQEECD GGGRRREHPP SSRDDDDRLT SSSAVFFESS LLLEPPGDSS RVVVVVRRRE LFFFGGRRSG PKKHRREMEE EEQVVVVLME SSSAVFFESS LLLEPPGDSS RVVVVVRRRE LFFFGGRRSG PKKHRREMEE EEQVVVVLME AGGSVPPPLL LSSSNNNALL RQAARRMLVG LLGGG---HV EAADGCCLLL WGPFFSAWWE AGGEVKKKLL LNNNYYYAVV SQAAKKILLW NNYNNLYYSV EKKPGFFILL WDPLLSSWWR AGGEVKKKLL LNNNYYYAVV SQAAKKILLW NNYNNLYYSV EKKPGFFILL WDPLLSSWWR GDGDNNNSSN NNNVSSSDDD SSSNNNSGSS NKSRDSVCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --FFP----- -WWPMDPAAA ASSLDDAAFL t5g66770.1 MAYTGNLMMM AIIAQVIQQQ KEQQQQHHHQ DDFFGPLSSL NWWP-NTSSS LGFLSSGGAF t5g66770.2 MAYTGNLMMM AIIAQVIKKK KEQQQQHHHQ DDFFGPLSSL NWWP-NTSSS LGFLSSGGAF LPAV------ ---------- -------AGG GVGG------ ---------- ---------- FPFVTGGGGD SDPGFFFFFP PPDDDHHAGG GGGGRSDDFG GGGGGSEMME ELLSGGGGDV FPFVTGGGGD SDPGFFFFFP PPDDDHHAGG GGGGRSDDFG GGGGGSEMME ELLSGGGGDV ---DP----- GAAD------ ---------- ---VDDAAAP EEAAAAFPPC CAPPAAAAAV DDTDPYYVIY GPPDPPFFDD TYYRVQPLLN RRVIDDSSSL PPPPTLWPPS SSPPSSIIPL 
DDTDPYYVIY GPPDPPFFDD TYYRVQPLLN RRVIDDSSSL PPPPTLWPPS SSPPSSIIPL VV--AMMMRR RRRRVVVG-- -----LLVLS CCGEEGHLSS SSQQLADDSS HHHLASGIGG LLEESPPPTT KKKKTTTDDD DEPPPLLLID CC-SSSPESS SSTTLLQQII RRRVET--EE LLEESPPPTT KKKKTTTDDD DEPPPLLLID CC-SSSPESS SSTTLLQQII RRRVET--EE GRVVVATTTT TTAALRLFFP PP----PAAP PPDAAEA--Y YYEECYLHFF FTAANNQAII ERVVVATTTE EEAALRLSSP PPNNNNPTTS SSSSSTDLLN NNDDCYSHLL LTAANNQAII ERVVVATTTE EEAALRLSSP PPNNNNPTTS SSSSSTDLLN NNDDCYSHLL LTAANNQAII LLLEAAFFHH GGVHVDDQGG WAIIIQLLLL APGGGGPPPL RRRITTTGPT TGRRE-RRRV LLLEAATTEE KKIHIDDQGG WALLLQLLLL ATSSSSKKPI RRRVSSSGLG GESSEPIIIT LLLEAATTEE KKIHIDDQGG WALLLQLLLL ATSSSSKKPI RRRVSSSGLG GESSEPIIIT GADLAARRSV RRVFFFFRRV VASLEVVRWM LLLQIPPPPE AVAANNSVLQ RLLLLPPQAA GRDFAAKKVL DDLFFFFIII ITPILLLNSS FFFRVPPPPE VLAANNFMLQ KLLLL--ETT GRDFAAKKVL DDLFFFFIII ITPILLLNSS FFFRVPPPPE VLAANNFMLQ KLLLL--ETT PIDDAVDCVV SSVRPPIIFV VIIIEDHHNK KTTFFRRTEA LYYSSSASGG GAAAAAMMMA PVDDTARLAA SSLNPPVVVL LGGGESLLNR RVVFFRRKNA LFYSLLGRDD SEEEVVRRRV PVDDTARLAA SSLNPPVVVL LGGGESLLNR RVVFFRRKNA LFYSLLGRDD SEEEVVRRRV AEE-AAAQQI CVVCGGGGEA ARRRRHEEPL LSRRWWWRRR DDRRRRRRRA GGGLLAPLGG VEEREEEGGI SIIGPPPPET GHHRRMEEEK KEQQWWWRRR VVLLLNNNNA GGGFFSKLSS VEEREEEGGI SIIGPPPPET GHHRRMEEEK KEQQWWWRRR VVLLLNNNNA GGGFFSKLSS SNAALRRARL VVVLFFSSE- HSEAADDDCC TTGGWGRRPP PPLLSAASSS EAAAGGGDNN NYAAVSSAKL LLLNYYNNSL SISKKPPPFF SSAAWDLLPP PPLLTLLSSS R--------- NYAAVSSAKL LLLNYYNNSL SISKKPPPFF SSAAWDLLPP PPLLTLLSSS R--------- NNNNNSSSGS SSSSGGDSSN NNSSSKSSGG GARRDSSCC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -------MTT TPFQQ----P PPDDDPSGLA t5g66770.1 MMYMMMMCSS SMAAIAAQVI KKEQQQQQQH HHQQQDDHII IGINNPPLPP PPNNNTGGLG t5g66770.2 MMYMMMMCSS SMAAIAAQVI KKEQQQQQQH HHQQQDDHII IGINNPPLPP PPNNNTGGLG AGPAAAAA-- ---------- ---------- AADDV----- ---------- ---------- GSPFFQQQGG GSNNNNPGGG PFFFPPPHHH AATTGRRLLF GGGGGGEESW EEIISGDDSD GSPFFQQQGG GSNNNNPGGG PFFFPPPHHH AATTGRRLLF 
GGGGGGEESW EEIISGDDSD ---------- YDDPPP---- -GD------- ---------- -----DAAAL LPEEFFFAAA GPDDCDDDDD HDDNPPVIIY YGDPPTTYPP SSLSSQPSSS DLLRVDSSSP PLPPPPPTTL GPDDCDDDDD HDDNPPVIIY YGDPPTTYPP SSLSSQPSSS DLLRVDSSSP PLPPPPPTTL FPPCAAAAAA VVL--AAAMM MMRREEEEAI R--------- -LLVHHLLLL MAAEAAGALA WPPSSSSIIP LLTEESSSPP PPKKEDPENS EDDDDDDDFD LLLLKKAAII YRRSDDSNEA WPPSSSSIIP LLTEESSSPP PPKKEDPENS EDDDDDDDFD LLLLKKAAII YRRSDDSNEA AAQLDDSSAA LLLLAAAAVV VSASGRVAVH LLLSRLFPPS SAAPPTTTDD AHHHHF-YHH AKTLQQIIES VVVVSSSELL LGDT-RVAFY LLLSNLSPPS STTSSSSSSS SEEEELLYKK AKTLQQIIES VVVVSSSELL LGDT-RVAFY LLLSNLSPPS STTSSSSSSS SEEEELLYKK YEEEAPLKFA FAANQEAAGG DHHHVVIIDF SLLMQLQLLL LLARRPGGGG F-LIIIPPSP NDDDAPSKFA LAANQEAAKK NKKHIIVVDF GIIVQIQLLL LLARRTSSGG TQIVIIAPSL NDDDAPSKFA LAANQEAAKK NKKHIIVVDF GIIVQIQLLL LLARRTSSGG TQIVIIAPSL TTGGRE-LRD DVVVGGGRLL ADDDAARRRR RSSSSRRVFF FRGGGVSSSL EVRPPPPWMQ GGEESEPLIA ATTTGGGRLL RDDDAAKKKK KVVVVDDLFF FIPPPIPPPI LLNGGGGSSR GGEESEPLIA ATTTGGGRLL RDDDAAKKKK KVVVVDDLFF FIPPPIPPPI LLNGGGGSSR QQQIAPAAVA VLLQLLHHRR LLGGGGDDDP PQAAP--DDA AAAVLLVASV PITIEDNKTG RRRVDPVVLA MLLQLLYYKK LL-------- -ETTPTTDDT TTTALLAKSL PVTGESNRVG RRRVDPVVLA MLLQLLYYKK LL-------- -ETTPTTDDT TTTALLAKSL PVTGESNRVG FFDDFTEEAA YYSSAVDSLL DDASSAGGGA GNNAMA--AY REIICDDDII VCCGGAAA-E FFNNVKNNAA FFSSAVESLL EEPLLGDSSE ERRVRVRREL RRIISGGGLL IGGPPGGGIE FFNNVKNNAA FFSSAVESLL EEPLLGDSSE ERRVRVRREL RRIISGGGLL IGGPPGGGIE RHEEEPPLLL SSRRWGLSAV GLQAMVLSSG EG-HHSSVVV EEEAAADGGT TLAEAAAGGG RMEEEEEKKK EEQQWGFESV SVQAILNNNY SNYSSIIVVV ESSKKKPGGS SLLR------ RMEEEEEKKK EEQQWGFESV SVQAILNNNY SNYSSIIVVV ESSKKKPGGS SLLR------ GGDDDDDNNN NNSNSNSSSS SGGGDNNSSN NKSDDDGGS ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- -MTPFQQ--- --PDDAAAAS t5g66770.1 MMAYYMMMCC TSSGNNNLLM MIAQVKQQQE QQQQQQQHHQ DHIGINNPPL PPPNNSLLLG t5g66770.2 MMAYYMMMCC TSSGNNNLLM MIAQVKQQQE QQQQQQQHHQ DHIGINNPPL PPPNNSLLLG SSGLLDGFLP AAV------- 
--------AP PDDGGG---- ---------- ---------- FFGLLSSAFP FQVTGDDSDG FPPPLHHHAT TTTGGGRRLL LLSSDDGTGE EFFFFSELLI FFGLLSSAFP FQVTGDDSDG FPPPLHHHAT TTTGGGRRLL LLSSDDGTGE EFFFFSELLI ---------- -----YDDP- ------D--- ---------- ---------D AAAAAAPEFA ISGDVDDDDG GGTTWHDDPY VVIIIYDPPT TYPSRRLLSV VPSLLLNRVD TTSSSSLPPT ISGDVDDDDG GGTTWHDDPY VVIIIYDPPT TYPSRRLLSV VPSLLLNRVD TTSSSSLPPT AAFPPPPDAA AVVLL-RRRR EEEEEEVVAG GIR------- --LLLHLSCA AAGAIIIIIE LLWPPPPLSP PLLTTETTKK EDPEEETTND DSEDDDDDDE PPLLLKADCA AA-RIIIIIS LLWPPPPLSP PLLTTETTKK EDPEEETTND DSEDDDDDDE PPLLLKADCA AA-RIIIIIS EADDDHHALL LSAAAQQQLL LDSSHAALAA AVVSAIGGRR VVHFTTARLL FSSVVAAPDA SDDDDPPNEE ESKKKTTTLL LQIIREEVSS ELLGP-EERR VFYFEEARLL SSSAATTSSS SDDDDPPNEE ESKKKTTTLL LQIIREEVSS ELLGP-EERR VFYFEEARLL SSSAATTSSS EEEHAAL-YY YYHHYYCPLF AAAHFFQIII AHGGCHVVID SMQWWPLIQQ QQAALLAAGG TTTEDDILYY YYTTNNCPSF AAAHLLQIII AEKKSKIIVD GVQWWPLLQQ QQAALLAASS TTTEDDILYY YYTTNNCPSF AAAHLLQIII AEKKSKIIVD GVQWWPLLQQ QQAALLAASS GGFFLLRIGG GGPSTTGDDE -LRRRRDDGR RDLRRSSRVF SSFFANNEEV RRWWWMMMQI SSTTIIRVGG PPPSGGEPPE PLIIIIAAGR RDFKKVVDLF DDFFT--LLL NNSSSSSSRV SSTTIIRVGG PPPSGGEPPE PLIIIIAAGR RDFKKVVDLF DDFFT--LLL NNSSSSSSRV IPGEAASLLG DDAAAQPP-D AVLLLCCVAS VRPPIVEQAA ADHNTTTFRR RRRFTTEYYY VPDEVAFLL- -----EPPID TALLLLLAKS LNPPVLEYVV VSLNVVVFRR RRRVKKNFYY VPDEVAFLL- -----EPPID TALLLLLAKS LNPPVLEYVV VSLNVVVFRR RRRVKKNFYY SSVFFSSLLD AAAGGGNAAA AE----AAYQ REEIICIIIV EGAAAAA--R EREPSRDDRL SSVFFSSLLE PNNSSERVVV VERRRREELG RRRIISLLLI EKTTGGGIIH EREEEQVVLM SSVFFSSLLE PNNSSERVVV VERRRREELG RRRIISLLLI EKTTGGGIIH EREEEQVVLM LLLTTTAAGS LGALLRRAAA RMVGLSGGG- HHSVGGLTTL HRRFFFAWWW WAAAGDDGGN MMMEEEAAGE LSAVVSSAAA KILWNNYNNY SSIVGGISSL NLLLLLLWWW W--------- MMMEEEAAGE LSAVVSSAAA KILWNNYNNY SSIVGGISSL NLLLLLLWWW W--------- NNNNSNGSGG SSDNNNSSSS NNKKKSSSGA RRDDSSSLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ------MMMM TTTFPPPFQ- ---------- PMMDAAAASS t5g66770.1 MCTTGGLLAI 
IIQQQQIQQE QQQQHDHHHH IIIFGGGINP LSLLLNNNNP P--NSSLLGG t5g66770.2 MCTTGGLLAI IIQQQQIQQE QQQQHDHHHH IIIFGGGINP LSLLLNNNNP P--NSSLLGG SGLDAFPPPP ---------- ------PDG- ---------- ---------- ---------- GGLSGAPDDP GGGSNNPPGF FFPDHHTTGL SDDDFGGTGG EEEDMLLISG GGVVVAGPCC GGLSGAPPPP GGGSNNPPGF FFPDHHTTGL SDDDFGGTGG EEEDMLLISG GGVVVAGPCC ----DDPA-- -----GD--- ---------- ------DDAA AAAAPAAAPC CDAAAAVL-- CDTWDDNDYY IYYYYGDPFF FTRRLLSQSS DDNNRVDDTT SSSSLTLLPS SLSSIPLTEE CDTWDDNDYY IYYYYGDPFF FTRRLLSQSS DDNNRVDDTT SSSSLTLLPS SLSSIPLTEE MRRREEEVI- ---------- VVHLLMMCAG GAAAAEAAGG GDALASSQHA AAAAAAAASS PTKKEPETSD LEEPPPPPPP LLKAAYYCA- -RRRRSDDSS SDNEASSTRS SSSEEDPPTT PTKKEPETSD LEEPPPPPPP LLKAAYYCA- -RRRRSDDSS SDNEASSTRS SSSEEDPPTT SGGIIGRATA RRLLFPPSPP VAPTTTTAAF L-YHHHFYYE EEACPPYFFF FAHTTAANNQ T----ERATA NRLLSPPSPP ATSSSSSSDL ILYKKTLNND DDACPPYFFF FAHTTAANNQ T----ERATA NRLLSPPSPP ATSSSSSSDL ILYKKTLNND DDACPPYFFF FAHTTAANNQ QAIFFFHHVH HHHVIDFFFL MQGLQPPAAA LLLRRRGGGP PPFF-LRTTG IGPSSSSDEE QAITTTEKIH HHHIVDFFFI VQGIQPPAAA LLTRRRSSGK KPTTQIRSSG IPASSSSPEE QAITTTEKIH HHHIVDFFFI VQGIQPPAAA LLTRRRSSGK KPTTQIRSSG IPASSSSPEE EEE--GRRAA ADSRRRRRSF RGAANSDEEE VPMMMLLIIA AGAAAAAAAF FNVVVQLHRL EEEPSGRRRR RDVDDDNNDF IPLT-PHLLL LGSSSFFVVD DDVVVVVAAV VNMMMQLYKL EEEPSGRRRR RDVDDDNNDF IPLT-PHLLL LGSSSFFVVD DDVVVVVAAV VNMMMQLYKL LGPAADPPIA AVLCSRRPPI ITTVEEEQQA DHHTTGLDRR RRFFTEAAFY YYSAAAFDSL L----DPPVT TALLSNNPPV VTTLEEEYYV SLLVVGANRR RRVVKNAAQF FFSAAAFESL L----DPPVT TALLSNNPPV VTTLEEEYYV SLLVVGANRR RRVVKNAAQF FFSAAAFESL DASSASSGGG AGGNNNAAME LQQRREEICC DDIVCGGGEE GAAAAARRER REPSRRWRLT EPLLGRRDDS EEERRRVVRE FGGRRRRISS GGLIGPPPEE KTTTGGHRER REEEQQWRME EPLLGRRDDS EEERRRVVRE FGGRRRRISS GGLIGPPPEE KTTTGGHRER REEEQQWRME RRAAAGGAVP GGGSNARQAR MVVLLFFEG- -SSEEDDTTT LLGWHGRRPP PLLFSAAWEE NNAAAGGSVK SSSNYASQAK ILLNNYYSNL YIIEEPPSSS LLAWNDLLPP PLLLTLSWRR NNAAAGGSVK SSSNYASQAK ILLNNYYSNL YIIEEPPSSS LLAWNDLLPP PLLLTLSWRR GGDGGDDDNN VVSSSGSDSS SGSSSKSSSS GGARRRSCL ---------- ---------- ---------- --------- 
---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- -----DFQ-- --DSSLDGGG t5g66770.1 MYYYMMCCCD DDDSGNLLMM AQQQQQVIIK KQKKQQQQQQ QHHQDQINLS SNNGFLSSSS t5g66770.2 MYYYMMCCCD DDDSGNLLMM AQQQQQVIIK KKKKQQQQQQ QHHQDQINLS SNNGFLSSSS FFLLA----- ---------- -----AGVVY Y--------- ---------- ---------- AAFFQTGGGD SSSNNNPPPF FPHHHAGGGF FLSSFGGGTT GGGFFEESDE EWEEEETIIS AAFFQTGGGD SSSNNNPPPF FPHHHAGGGF FLSSFGGGTT GGGFFEESDE EWEEEETIIS ---------- -------YP- ---GADDDDD ---------- ---------- ---VDDDAAA SGGSVDGGPP PDDCCDTHNY YYYGPDDDDD FDDDYPPSSS VVQQSDDNNR RVVIDDDTTS SGGSVDGGPP PDDCCDTHNY YYYGPDDDDD FDDDYPPSSS VVQQSDDNNR RVVIDDDTTS APEFFAAAAF FFPAAPAAAV VVVV--MRRE EVGIR----- -----HHLSS AGAAEEEAAA SLPPPPPTLW WWPSSPSSIL LLLLHHPTKD ETDSEDDFFD LPPPPKKIDD A-RRSSSDDD SLPPPPPTLW WWPSSPSSIL LLLLHHPTKD ETDSEDDFFD LPPPPKKIDD A-RRSSSDDD DHALSSLAHH AALLLLAVSA IRHFFFFFTT ARFFFP-SAP PPTDAAHHHA FLLYYHFYYE DPNESSLLRR EEVVVVELGP -RYFFFFFTE ARSSSPNSTS SSSSSSEEED LIIYYKLNND DPNESSLLRR EEVVVVELGP -RYFFFFFTE ARSSSPNSTS SSSSSSEEED LIIYYKLNND ECPPPYYLLL FFAFFTNNNQ ILLEACCCDH HVVIIISSSS MQGLWWWAAA LAAALALLLR DCPPPYYSSS FFALLTNNNQ ILLEASSSNK HIIVVVGGGG VQGIWWWAAA LAAALATTTR DCPPPYYSSS FFALLTNNNQ ILLEASSSNK HIIVVVGGGG VQGIWWWAAA LAAALATTTR RPPGGGGGP- --LRRIIGGG GGPPPGGRLD DVGLRLLADD DDARRRRRVR RFFFRRGGGN RTTSSSSGKQ QQIRRVVGPP PPALLEESLA ATGNRLLRDD DDAKKKKKLN NFFFIIPPP- RTTSSSSGKQ QQIRRVVGPP PPALLEESLA ATGNRLLRDD DDAKKKKKLN NFFFIIPPP- SDVVWMMQIA AAGGEAVAAF LLHRLGPPAA AP----VLCV AKIFTTVQQQ EADDDDHNNK PHLLSSSRVD DDDDEVLAAV LLYKL----T TPTTTIALLA KRVVTTLYYY EVSSSSLNNR PHLLSSSRVD DDDDEVLAAV LLYKL----T TPTTTIALLA KRVVTTLYYY EVSSSSLNNR TGFFFLDDRF FTEEAFFYYY SVFDDDSLDA SAGGAGNMAA ---AYYLIIV VVCGEEEEER VGFFFANNRV VKNNAQQFFF SVFEEESLEN LGDDEERRVV RRRELLFLLI IIGPEEEEER VGFFFANNRV VKNNAQQFFF SVFEEESLEN LGDDEERRVV RRRELLFLLI IIGPEEEEER RERHEEEPPS SSWRDRLRGL SSVSNNALRA ARVFSGG-SS TLGWHHPAAA AAASSSSAAA RERMEEEEEE EEWRVLMNGF EEVNYYAVSA AKLYNYNLII SLAWNNPLLL LLLSSSSSS- 
RERMEEEEEE EEWRVLMNGF EEVNYYAVSA AKLYNYNLII SLAWNNPLLL LLLSSSSSS- GGDDGGGGNN NSSSSSDSNN NSNGGKGGRD GGSSSCCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --------MM D------WWM SSLLLLDDAG t5g66770.1 YCDDSGNLLI IQQVVVIIKK KQQQQQQEQQ QQQQQHHHHH QPLLLNPWW- GGLLLLSSGS t5g66770.2 YCDDSGNLLI IQQVVVIIKK KQQQQQQEQQ QQQQQHHHHH QPLLLNPWW- GGLLLLSSGS GGGFFLPAA- ---------- ---------- AADDVGYY-- ---------- ---------- SSSAAFDFQG GGGGDNNNDG GFPNLLHHHH AATTGGFFGG GGGGGEFEED WMMTLLSGGD SSSAAFPFQG GGGGDNNNDG GFPNLLHHHH AATTGGFFGG GGGGGEFEED WMMTLLSGGD --------AA AA-----GGD D--------- ---------- --DAALLPEF FAAFPPCDAV DSSAPCDTDD DDVIIIIGGD DFFYYPPSRR RRLSVSDDDL LVDTSPPLPP PTLWPPSLIL DSSAPCDTDD DDVIIIIGGD DFFYYPPSRR RRLSVSDDDL LVDTSPPLPP PTLWPPSLIL L----MEEEE EEEEIR---- ---LLLLLLL MSCAAAGGAA IAHSAAAASH HHALLLAVVV THEEEPEDDD PPPESEDDDD LEPLLAAIII YDCAAA--RR IDPSKLLLIR RRSVVVSLLL THEEEPEDDD PPPESEDDDD LEPLLAAIII YDCAAA--RR IDPSKLLLIR RRSVVVSLLL GIGGAAVVVH TTTALSSRRL P-PAAAPTDA AFF-YHFEAC CCPYLLLKFF AAAFTNQQLE --EEAAFFFY EEEALSSNRL PNPTTTSSSS DLLSYKLDAC CCPYSSSKFF AAALTNQQLE --EEAAFFFY EEEALSSNRL PNPTTTSSSS DLLSYKLDAC CCPYSSSKFF AAALTNQQLE EAFFHCCCDV VVVDSSSLLL MMMQGQWWPP AALLQLLLLL LRPPPPGGPP PF-LLRRRGI EATTESSSNI IIIDGGGIII VVVQGQWWPP AALLQLLLTT TRTTTTSGKP PTQIIRRRGI EATTESSSNI IIIDGGGIII VVVQGQWWPP AALLQLLLTT TRTTTTSGKP PTQIIRRRGI GGPPPPPPSS SPPPTRDD-- RVVLRLLADD DLLSRRRVVV FSFGVNNSSL DDEVRRPWLL PPAAPPPPSS SLLLGSPPPS ITTNRLLRDD DFFVDDDLLL FDFPI--PPI HHLLNNGSFF PPAAPPPPSS SLLLGSPPPS ITTNRLLRDD DFFVDDDLLL FDFPI--PPI HHLLNNGSFF LQPGSLLLRR RGQQQPPP-- DDAAVLCVVA VVRPPIFFTV IIQDDTGGFL TELFFFYYAV FRPDFLLLKK K-EEEPPPII DDTTALLAAK LLNPPVVVTL GGYSSVGGFA KNLQQQFYAV FRPDFLLLKK K-EEEPPPII DDTTALLAAK LLNPPVVVTL GGYSSVGGFA KNLQQQFYAV VFFDDSSLLD AAAAAAAASG GNNNNMAAEE LQEECDCGGG AEERHHLSSS WRDRRRRTRA VFFEESSLLE PPNNNGGGRS ERRRRRVVEE FGRRSGGPPK GEERMMKEEE WRVLLLLENA VFFEESSLLE PPNNNGGGRS ERRRRRVVEE FGRRSGGPPK 
GEERMMKEEE WRVLLLLENA SAAVVGGALL AMLLLVVGFS SGGE--HSEE EGGCTLLGGW WHPPLFFFSA AWWEAAAAGD ESSVVSSAVV AILLLLLWYN NYYSYYSIEE EGGFSLLAAW WNPPLLLLTS SWWR------ ESSVVSSAVV AILLLLLWYN NYYSYYSIEE EGGFSLLAAW WNPPLLLLTS SWWR------ GGGGGGDNNN SSNSVVSSSS GGSSSNNSGG GKSRRSVVL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- -------MMD DFFPPFQ--- ----WWWWMA SSGDDPPAAV t5g66770.1 MAYMMMCSGL QQQVKQQQQQ QQQQHHDHHQ QFFGGINPPS LLNPWWWW-S GGGSSPDFQV t5g66770.2 MAYMMMCSGL QQQVKKQQQQ QQQQHHDHHQ QFFGGINPPS LLNPWWWW-S GGGSSPPFQV ---------- -------APP DDGGGYY--- ---------- ---------- ---------- TTDSSNNDDG PPPPLHHATT TTGGGFFDDG GGGGESSDEE EMTTLISGGV VDGGPPPDCC TTDSSNNDDG PPPPLHHATT TTGGGFFDDG GGGGESSDEE EMTTLISGGV VDGGPPPDCC --YYDPPAA- ------GGGA DDD------- ---------- --DAALLFAP PPPDDDAAAA CDHHDNPDDY YYIIYYGGGP DDDPPFDDYS RLSVQQSSSL LRDTSPPPTP PPPLLLIIPP CDHHDNPDDY YYIIYYGGGP DDDPPFDDYS RLSVQQSSSL LRDTSPPPTP PPPLLLIIPP AL-ARREEEV GGGGII---- ----VVVHHL LSCGGAIEAG GGGGHAAQLA AASHAAAVSS PTESTKDDPT DDDDSSDDDD DFEPLLLKKI IDC--RISDS SSSSPKKTLL LLIREEELGG PTESTKDDPT DDDDSSDDDD DFEPLLLKKI IDC--RISDS SSSSPKKTLL LLIREEELGG AAAAIIGGGG RVVHTAALLL LRLFP-SVVA PPDAAHHLL- ---HHYCPLK KFFFAAFFTA DPPP--EEEE RVFYEAALLL LRLSPNSAAT SSSSSEEIIL LSSKTNCPSK KFFFAALLTA DPPP--EEEE RVFYEAALLL LRLSPNSAAT SSSSSEEIIL LSSKTNCPSK KFFFAALLTA NNILEFHHGD VIDDFSQALQ AALAARRPGG GPPPF-LLRI GGIGPPPTGR RD---LRRGL NNILETEEKN IVDDFGQALQ AALAARRTGG GKPPTQIIRV GGIPAPLGES SPPPPLIIGN NNILETEEKN IVDDFGQALQ AALAARRTGG GKPPTQIIRV GGIPAPLGES SPPPPLIIGN LLDLAARSVR VFFFSFFGVN SSSSLVVRLI AAPAAVVAAF FFSVLLHLDP DDA--IDDDA LLDFAAKVLD LFFFDFFPI- PPPPILLNFV DDPVVLLAAV VVFMLLYL-- DDTTIVDDDT LLDFAAKVLD LFFFDFFPI- PPPPILLNFV DDPVVLLAAV VVFMLLYL-- DDTTIVDDDT AVVLLDDDCA VPKFTVQQAD DDKKKTGGFF FLLFFFTTTE ELLLFYYYYY SSAVVVFDDS TAALLRRRLK LPRVTLYYVS SSRRRVGGFF FAAVVVKKKN NLLLQFFFYY SSAVVVFEES TAALLRRRLK LPRVTLYYVS SSRRRVGGFF FAAVVVKKKN NLLLQFFFYY SSAVVVFEES SDSSGGAANN NAAEEAYLEI 
CICGEEGGGA AAAA--REER HHEEPSRRRR WWWDRRRRRT SELLSSEERR RVVEEELFRI SLGPEEKKKT TGGGIIHEER MMEEEEQQQQ WWWVLLLLLE SELLSSEERR RVVEEELFRI SLGPEEKKKT TGGGIIHEER MMEEEEQQQQ WWWVLLLLLE TRALLLLLLS VPLNLQARMM LVVVGGLLFS GEG--SSVVV VEAGGTLLLL GGGRPLLASA ENAFFFFFFE VKLYVQAKII LLLLWWNNYN YSNLYIIVVV VEKGGSLLLL AAALPLLLSS ENAFFFFFFE VKLYVQAKII LLLLWWNNYN YSNLYIIVVV VEKGGSLLLL AAALPLLLSS EGGGGDGDNN NNNNSVVVVS SSGSNGSKKS SGGARRRSC R--------- ---------- ---------- --------- R--------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- -----MDFFF ---WWPPMDD t5g66770.1 MMMAAYYYYM MTDGNLAIIA QQQKQQQKQQ QQQQEQQQQQ QHHHHHQFII LLPWWPP-NN t5g66770.2 MMMAAYYYYM MTDGNLAIIA QQQKKKKKQQ QQQQEQQQQQ QHHHHHQFII LLPWWPP-NN DAAASSGLLD AFLPPAAAV- ---------- ------PPDD G--------- ---------- NSSSGGGLLS GAFDPFQQVG GGDSPFPNNH HHHHHHTTTT GRRDTGGEFE DEEWMETGGD NSSSGGGLLS GAFPPFQQVG GGDSPFPNNH HHHHHHTTTT GRRDTGGEFE DEEWMETGGD ---------- ---YYDPPAA ----GADD-- -------DAL LLEFFFFPPP CAAPAA---- VADDGGGGCC CDTHHDPPDD YIIIGPDDPP TSSVQDLDSP PPPPPPWPPP SSSPIPEEEE VADDGGGGCC CDTHHDPPDD YIIIGPDDPP TSSVQDLDSP PPPPPPWPPP SSSPIPEEEE AAARREEEEE VVAGIRR--- ----LVHLLM MSCCAGGGIG DLASSQQLAA ADAAASSSAA SSSTTEEEDE TTNDSEEDDF LEEPLLKIIY YDCCA---IS DEASSTTLLL LQEEEGGGDP SSSTTEEEDE TTNDSEEDDF LEEPLLKIIY YDCCA---IS DEASSTTLLL LQEEEGGGDP ASGGRRRVVF FTTASSRLLL F-SSPPPADD AAEEEHAFL- -YHHFYYACP PPLLLHHTAE PT--RRRVFF FTEASSNLLL SNSSPPPTSS SSTTTEDLIL LYKTLNNACP PPSSSHHTAE PT--RRRVFF FTEASSNLLL SNSSPPPTSS SSTTTEDLIL LYKTLNNACP PPSSSHHTAE AGGDDHHHVV FSSGQQQQWW WPPALQLAAL RPPP-RIIII PPSPPPTRDD DDEEE-RDDV AKKNNKKKII FGGGQQQQWW WPPALQLAAT RKPPQRVVII AASLLLGSPP PPEEESIAAT AKKNNKKKII FGGGQQQQWW WPPALQLAAT RKPPQRVVII AASLLLGSPP PPEEESIAAT GRLLLALLLL ASSRFSSVVA AANSSDVVVR RPWWMLQQAP AVAFFNNSVL RLLLGGDDPA GRLLLRFFFF AVVDFDDIIL LT-PPHLLLN NGSSSFRRDP VLAVVNNFML KLLL------ GRLLLRFFFF AVVDFDDIIL LT-PPHLLLN NGSSSFRRDP VLAVVNNFML KLLL------ ADQQAPP--- DDAAAVCVAA SSSVVVRRFF FTTIEEDDDD DNNKTTGDDR RRFTEEAAFF 
-DEETPPTTI DDTTTALAKK SSSLLLNNVV VTTGEESSSS SNNRVVGNNR RRVKNNAAQQ -DEETPPTTI DDTTTALAKK SSSLLLNNVV VTTGEESSSS SNNRVVGNNR RRVKNNAAQQ AAVVFLDDDA AAASAAGGAA A-AAYLLQRE IIIIVGAAA- PPLLSRWWRD DDRGLLAAVP AAVVFLEEEP PPNLGGEEVV VREELFFGRR ILLLIKTGGI EEKKEQWWRV VVLGFFSSVK AAVVFLEEEP PPNLGGEEVV VREELFFGRR ILLLIKTGGI EEKKEQWWRV VVLGFFSSVK SNNNARRQLL LFFSGGG--H HSVVEAADGG CLLGGWHHGG RLLAASWEEA AGDDGGNNNN NYYYASSQLL NYYNYYNLYS SIVVEKKPGG FILAAWNNDD LLLLLSWRR- ---------- NYYYASSQLL NYYNYYNLYS SIVVEKKPGG FILAAWNNDD LLLLLSWRR- ---------- NNSSNNSNVV SGSSGSDSNN NSNGGKSGAA ARSSSVVCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- -DFFFPP--- ---------P PAAAAASSSG t5g66770.1 AAYDNLLLAA IAQQQQQVVV IQQQQQQHHQ DQFFFGGPLS LLLLNNNNPP TSSSLLGFFG t5g66770.2 AAYDNLLLAA IAQQQQQVVV IKKQQQQHHQ DQFFFGGPLS LLLLNNNNPP TSSSLLGFFG LAGGPPPAV- ---------- ---------- ---APDDVYY ---------- ---------- LGSSPPPFVG DDSSSNNNND DDPPPGGGGF PFHATTTGFF RRLFGGGGGG GEEFESWWME LGSSPPPFVG DDSSSNNNND DDPPPGGGGF PFHATTTGFF RRLFGGGGGG GEEFESWWME ---------- ----PPPAA- --GA------ ---------V VDDALEEEFA AAAAAFFPAA EETTDVAGPD CDDTNNPDDV YYGPPFSSLL SVPSSLLVVI IDDSPPPPPP PPTLLWWPSS EETTDVAGPD CDDTNNPDDV YYGPPFSSLL SVPSSLLVVI IDDSPPPPPP PPTLLWWPSS PDAAV--MMM RRREEEVVAA GR-------- -----LLMMS CCGGGAEEAG GGDHLLSAAL PLPPLHEPPP TTKEPPTTNN DEDDFLLLEE EPPPPLAYYD CC---RSSDS SSDPEESKKL PLPPLHEPPP TTKEPPTTNN DEDDFLLLEE EPPPPLAYYD CC---RSSDS SSDPEESKKL AADSSHAAAA AASSAIGGRR RRVVVHFTTT ASSRRRFFPS PVVVAPPTTT TDAAAAAL-Y LLQIIRESSS EEGGD-EERR RRVVVYFTTT ASSNRRSSPS PAAATSSSSS SSSSDDDILY LLQIIRESSS EEGGD-EERR RRVVVYFTTT ASSNRRSSPS PAAATSSSSS SSSSDDDILY HHHYACCPYF NNNNQQAFFH HGCCVVVVHH IIIIDDFFLL MQQGGLAIIQ AAALLRGPFL KTTNACCPYF NNNNQQATTE EKSSIIIIHH VVVVDDFFII VQQGGIALLQ AAATTRSKTI KTTNACCPYF NNNNQQATTE EKSSIIIIHH VVVVDDFFII VQQGGIALLQ AAATTRSKTI RIGGPPGGRR DDEE--LLRR DVLRRRLADL LLRRRVVRRR FFRGVVVANS SLDDEVPMLQ RVPPPLEESS PPEESSLLII ATNRRRLRDF FFKKKLLDNN 
FFIPIIIL-P PIHHLLGSFR RVPPPLEESS PPEESSLLII ATNRRRLRDF FFKKKLLDNN FFIPIIIL-P PIHHLLGSFR QIIIAPPPEE ANNSVQQLLH LLLGDP---D DAVVLLDDCV VAVVPKFFTT IIQQEAADHH RVVVDPPPEE VNNFMQQLLY LLL-DPIIID DTAALLRRLA AKLLPRVVTT GGYYEVVSLL RVVVDPPPEE VNNFMQQLLY LLL-DPIIID DTAALLRRLA AKLLPRVVTT GGYYEVVSLL HHKTFDREEE ALLFFYYYSS VAASAAANAA AAE-AAYYQR REIDVCGEGA A-ELLSSTAA LLRVFNRNNN ALLQQFYYSS VPPLEEERVV VVEREELLGR RRIGIGPEKT GIEKKEEEAA LLRVFNRNNN ALLQQFYYSS VPPLEEERVV VVEREELLGR RRIGIGPEKT GIEKKEEEAA AGGGLSSSVV PLLGGSSNAR QARMMFGEG- -HHHSSVVEE EDDCLLLLLG GWHGRRRPFA AGGGFEEEVV KLLSSNNYAS QAKIIYYSNL YSSSIIVVES SPPFILLLLA AWNDLLLPLL AGGGFEEEVV KLLSSNNYAS QAKIIYYSNL YSSSIIVVES SPPFILLLLA AWNDLLLPLL AAAAGGDDGG GGGDNNVGSG SSNGGSSGGA RRGSSVVVC LSS------- ---------- ---------- --------- LSS------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ----MTFF-- --WWPDDDDP AASGGGDAAA t5g66770.1 MAYMMCTTTD DDSSGNLLMI IKKQQEQQQH HQQDHIIIPL NNWWPNNNNT SSGGGGSGGG t5g66770.2 MAYMMCTTTD DDSSGNLLMI IKKQQEQQQH HQQDHIIIPL NNWWPNNNNT SSGGGGSGGG GFPPAAV--- ---------- ---------- --AAPDDGVV GYYY------ ---------- SAPPQQVTGG GDSSNPPGFF PPPNNNLLHH HHAATTTGGG GFFFLDGGGG TGGGGGEEEF SAPPQQVTGG GDSSNPPGFF PPPNNNLLHH HHAATTTGGG GFFFLDGGGG TGGGGGEEEF ---------- ---------- --YYPPPPPA ----GA---- ---------- -----AAAAL SDEEETGGGG GDDSVVDDGP DDHHNPPPPD IIYYGPFDDY PPRRSSVVVQ DDLNVTSSSP SDEEETGGGG GDDSVVDDGP DDHHNPPPPD IIYYGPFDDY PPRRSSVVVQ DDLNVTSSSP PFAFPCCPPA AAAAAAAAAV LLL----ARE EEEAAGI--- ----VHLLLM SCAGIAAAAL LPPWPSSPPS IIIPPPPPPL TTTHEEESTP PEENNDSDDF LLPPLKAAAY DCA-IDNNNE LPPWPSSPPS IIIPPPPPPL TTTHEEESTP PEENNDSDDF LLPPLKAAAY DCA-IDNNNE ASSDHHAAAL AAAAVAAAAA SSGGGGGIIG VVVHHHTTTL SSSRFPPPPP PPPDDDDAEH ASSQRREEEV SEEELDDPPP TT-------E FFFYYYTTEL SSSRSPPPPS SSSSSSSSTE ASSQRREEEV SEEELDDPPP TT-------E FFFYYYTTEL SSSRSPPPPS SSSSSSSSTE F-----YHFY YAACCCCFAA AAANQQLGHH VVVHVVIFLQ QQQQPPAAAL LQQALALRGG LLLLLSYTLN NAACCCCFAA AAANQQLKKK IIIHIIVFIQ QQQQPPAAAL LQQALATRSS LLLLLSYTLN NAACCCCFAA 
AAANQQLKKK IIIHIIVFIQ QQQQPPAAAL LQQALATRSS PPFLRITGIG PPSSPPRREE E--LLDDLLR RADLLLAVVR RRRFSFFGVN NLLVVRPWMQ PPTIRVSGIP AASSLLSSEE EPSLLAANNR RRDFFFALLD DNNFDFFPI- -IILLNGSSR PPTIRVSGIP AASSLLSSEE EPSLLAANNR RRDFFFALLD DNNFDFFPI- -IILLNGSSR QIAPEEAAVV VANNNSSSVV LLQQLHRLLL LGDDDQ---- IDDAVLDDDD VAVVVVRKIF RVDPEEVVLL LANNNFFFMM LLQQLYKLLL L-DDDETTTI VDDTALRRRR AKLLLLNRVV RVDPEEVVLL LANNNFFFMM LLQQLYKLLL L-DDDETTTI VDDTALRRRR AKLLLLNRVV FTIEQQAANK FLLLDDRRFY YYYVVDSDAS GGGGAANAAL RRIIIDGEGG ARRRERPSRR VTGEYYVVNR FAAANNRRQF YYYVVESEPR DSSSEERVEF RRIIIGPEKK THRREREERR VTGEYYVVNR FAAANNRRQF YYYVVESEPR DSSSEERVEF RRIIIGPEKK THRREREERR RDDRLLLLTG GSAAAAVPLS ALRARRLLVL GGG----VEA DGLLLGFSSA AAEDDDDGGG RVVLMMMMEG GESSSSVKLN AVSAKKLLLN YNNLLLYVSK PGIILDLTTL LLR------- RVVLMMMMEG GESSSSVKLN AVSAKKLLLN YNNLLLYVSK PGIILDLTTL LLR------- GGGDNNNNSS GSSSGGDNSG SNGGKKSSSA RDGGSSSSL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------T FFQQQQ---W WMMMDSSSSS t5g66770.1 MAAYYMDDSN NLMIQQVIIK KQKKQQQQQQ QQQQQQHQDI FINNNNSNNW W---NGGGFF t5g66770.2 MAAYYMDDSN NLMIQQVIIK KKKKQQQQQQ QQQQQQHQDI FINNNNSNNW W---NGGGFF GGFLPPPAAA V--------- ---------- --PPDGVGGG ---------- ---------- GSAFPPDFFQ VTDDPPGFFP PNLLDHHHHH HHTTTGGGGG RSDFFGGGGT TGGEEFFFFF GSAFPPPFFQ VTDDPPGFFP PNLLDHHHHH HHTTTGGGGG RSDFFGGGGT TGGEEFFFFF ---------- ---------- ---YYYDPPA --GA------ ---------- AAAAAAAAAF EEEEMETTLI SGGGSSADPD DTWHHHDNPD YVGPPPFFFD YPSRRLSPSV SSSPTTTTTW EEEEMETTLI SGGGSSADPD DTWHHHDNPD YVGPPPFFFD YPSRRLSPSV SSSPTTTTTW FPPAAAAVL- --REEEEEEE AII------- LLVLLMSSGG GAIEAGHLAA AAQLDDSAAA WPPSIIPLTH EETEEEEDEE NSSDDDDDLL LLLAIYDD-- -RISDSPEAA KKTLQQISSS WPPSIIPLTH EETEEEEDEE NSSDDDDDLL LLLAIYDD-- -RISDSPEAA KKTLQQISSS LLLLGGIGGG VHHHTTTTTA LFFPSPPVAA EAFLYYYHHF EAAACYLLTA ANAAIIIHHG VVVV---EEE VYYYTTTEEA LSSPSPPATS TDLIYYYKKL DAAACYSSTA ANAAIIIEEK VVVV---EEE VYYYTTTEEA LSSPSPPATS TDLIYYYKKL DAAACYSSTA ANAAIIIEEK 
GCVHVIQQQL QWAAIIQQAL AARPGGTTGG GIGPPPPPPT GRD--LRDDV GLRADLARRS KSIHIVQQQI QWAALLQQAL AARTSSSSGG GIPPPLLLLG ESPPSLIAAT GNRRDFAKKV KSIHIVQQQI QWAALLQQAL AARTSSSSGG GIPPPLLLLG ESPPSLIAAT GNRRDFAKKV SSVRVSRAAN NLEVRPWWWM MLQQPGGEEA AVFSQQLHHR LGDPDDQAAP P---DVVLLV VVLDLDILT- -ILLNGSSSS SFRRPDDEEV VLVFQQLYYK L---DDETTP PTTTDAALLA VVLDLDILT- -ILLNGSSSS SFRRPDDEEV VLVFQQLYYK L---DDETTP PTTTDAALLA VAVRPKIIII FFIQQQEAAD DDNKGGFFFL LDDRFEAFYY YSAAVDSLAA SAGGGGGNNN AKLNPRVVVV VVGYYYEVVS SSNRGGFFFA ANNRVNAQYY YSAAVESLNN LGDSSSERRR AKLNPRVVVV VVGYYYEVVS SSNRGGFFFA ANNRVNAQYY YSAAVESLNN LGDSSSERRR AAAMMMAAAE EEAAAALRRE EEIIVVCCCG EGGRREHEEE EPSSRRRRDD LLTRAAGLSA VVVRRRVVVE EEEEEEFRRR RRILIIGGGP EKKRREMEEE EEEEQQQQVV MMENAAGFES VVVRRRVVVE EEEEEEFRRR RRILIIGGGP EKKRREMEEE EEEEQQQQVV MMENAAGFES VGGNNALQAA AARLGGLLSG EEE----HSE CLLTTLLGWW HRRRPPFSSA WEAAAGDDDN VSSYYAVQAA AAKLWWNNNY SSSLLLLSIS FIISSLLAWW NLLLPPLSSS WR-------- VSSYYAVQAA AAKLWWNNNY SSSLLLLSIS FIISSLLAWW NLLLPPLSSS WR-------- NNNNSSNVVS SGSSDSNGSS SNSARGGGGG SSVCCCCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --TFFFFQ-- --PPMDDASS GLLDAGFLPP t5g66770.1 MCDSSAIIIQ QIKKQQQQKQ QQQQQHHHHQ QDIFIIINLS SLPP-NNLFF GLLSGSAFDD t5g66770.2 MCDSSAIIIQ QIKKKKKKKQ QQQQQHHHHQ QDIFIIINLS SLPP-NNLFF GLLSGSAFPP PPAVV----- ---------- -----APDDG GG-------- ---------- ---------- PPQVVTTGGG GSNGFFPPPD HHHHHATTTG GGRRSFGGGG GGEESSDDDW MMETLIGDDD PPQVVTTGGG GSNGFFPPPD HHHHHATTTG GGRRSFGGGG GGEESSDDDW MMETLIGDDD ---------- -----YDPPP PAA--GAAD- ---------- ---------- VDAAAALPFA SSVVAADDDD DDTWWHDNNP PDDYYGPPDF DTTPSLLSSS VQQDDLNNRR IDTTSSPLPP SSVVAADDDD DDTWWHDNNP PDDYYGPPDF DTTPSLLSSS VQQDDLNNRR IDTTSSPLPP AFPCAAAAAA AAVLLAAAAA AREEVAIR-- -----HLAGG GAAIEEAAGD DDHAAAASSA TWPSSSSSSP PPLTTSSSSS STEETNSEDF DLEEPKAA-- -RRISSDDSD DDPNAAASSK TWPSSSSSSP PPLTTSSSSS STEETNSEDF DLEEPKAA-- -RRISSDDSD DDPNAAASSK AQDSHALLLA ASAAAAGIII GGRAVVTTTT TLLLRRLPSP 
AAPPPPTDAE EEEAAFLHHH KTQIREVVVS EGDPPP---- EERAFFTTTT ELLLNRLPSP TTSSSSSSST TTTDDLIKKT KTQIREVVVS EGDPPP---- EERAFFTTTT ELLLNRLPSP TTSSSSSSST TTTDDLIKKT HECYYKKFAH AQAILLLEAA FHGVVVVDMG GLQWPPALLI QAAALRGPPF -RRGPPPPGE TDCYYKKFAH AQAILLLEAA TEKIIIIDVG GIQWPPALLL QAAATRSKKT QRRPAAPPEE TDCYYKKFAH AQAILLLEAA TEKIIIIDVG GIQWPPALLL QAAATRSKKT QRRPAAPPEE LLDLLDDLLA RRRRRSSVVR VRRRFSRGVV SSEVVPWMLA PGQHHLLGDA DDDDP---DA LLANNDDFFA KKKKKVVLLD LNNNFDIPII PPLLLGSSFD PDQYYLL--- DDDDPTTTDT LLANNDDFFA KKKKKVVLLD LNNNFDIPII PPLLLGSSFD PDQYYLL--- DDDDPTTTDT VCCVAVRPPI IFFFFFTTIQ QAHHNNFFLD RTTEEELLFY YYYSAAFSSL DAAAAASGGA ALLAKLNPPV VVVVVVTTGY YVLLNNFFAN RKKNNNLLQF YYYSAAFSSL EPNNGGRDSE ALLAKLNPPV VVVVVVTTGY YVLLNNFFAN RKKNNNLLQF YYYSAAFSSL EPNNGGRDSE GGGGNNMMAE EYYYYYLLQQ RIIIVAAARR HHHESSRWWR RRRRTTRRGL SAAGSNNNAL EEEERRRRVE ELLLLLFFGG RIILITTGHR MMMEEEQWWR RRRLEENNGF ESSSNYYYAV EEEERRRRVE ELLLLLFFGG RIILITTGHR MMMEEEQWWR RRRLEENNGF ESSSNYYYAV QQRRRLVVGF FFG---EEEA AGGCCCTLGG HGPPPLLLLF SSAEAAAAGG GGGGDDNNNN QQKKKLLLWY YYNLLYEESK KGGFFFSLAA NDPPPLLLLL TSSR------ ---------- QQKKKLLLWY YYNLLYEESK KGGFFFSLAA NDPPPLLLLL TSSR------ ---------- SSNNNSSVSS GSSSSSDSNN NNSSSSGGKS SSADDSVCC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- -MMPPF---- ---WPPAAAS t5g66770.1 MYMCCCTSSS GLMMMIQQVV IKKQQQQQQQ QEEQQQQQHH QHHGGIPPLL LPPWTTLLLG t5g66770.2 MYMCCCTSSS GLMMMIQQVV IKKQQQQQQQ QEEQQQQQHH QHHGGIPPLL LPPWTTLLLG SSGLLLAGGF PPPPAAAAVV ---------- --APDGGGGY YYY------- ---------- FFGLLLGSSA DDPPFFFQVV TGGGDSSPPP DHATTGGGGF FFFRLLGTTT GGGESEEWLL FFGLLLGSSA PPPPFFFQVV TGGGDSSPPP DHATTGGGGF FFFRLLGTTT GGGESEEWLL ---------- ---------- YDDPP----G AD-------- -------VDA LLLPFAAAAF LISGGGDSSV VVDPDDCCDW HDDPPYIIYG PDPDTPPSSS QQSLLNRIDS PPPLPPTLLW LISGGGDSSV VVDPDDCCDW HDDPPYIIYG PDPDTPPSSS QQSLLNRIDS PPPLPPTLLW CAPDDDDAAA AVV-AAMRRE EEEVAGIIRR ---------L LVVVVSSCCC GIIEEAAGDD SSPLLLLSIP PLLESSPTTE 
EEETNDSSEE DDDFDLEPPL LLLLLDDCCC -IISSDDSDD SSPLLLLSIP PLLESSPTTE EEETNDSSEE DDDFDLEPPL LLLLLDDCCC -IISSDDSDD DDHHLAQQQQ LHHAAALLAA VVSAAARRRA VVVHTLLSLL -SSVVVPPTE AFL---YYHF DDPPEKTTTT LRRSSSVVSE LLGDDDRRRA FFFYELLSLL NSSAAASSST DLILSSYYKL DDPPEKTTTT LRRSSSVVSE LLGDDDRRRA FFFYELLSLL NSSAAASSST DLILSSYYKL YYYCCCLLKF HFQQAIEEAF FFHHCDDHHI DFFSQLQWWW WPALIQAARP GGPRRRRGIG NNNCCCSSKF HLQQAIEEAT TTEESNNKKV DFFGQIQWWW WPALLQAART SGPRRRRGIP NNNCCCSSKF HLQQAIEEAT TTEESNNKKV DFFGQIQWWW WPALLQAART SGPRRRRGIP PPSSPTE-LL RDDVLLLRRR LAAARSSSVV RFFFSFRGAA NNNNSSSLLL DDEVMMMIIA PPSSLGEPLL IAATNNNRRR LRRAKVVVLL NFFFDFIPLT ----PPPIII HHLLSSSVVD PPSSLGEPLL IAATNNNRRR LRRAKVVVLL NFFFDFIPLT ----PPPIII HHLLSSSVVD APPGGEEEEA VVVVVNNVVV LLLGPPPAAD AA--DDVVLD CCCSVPKKII IFTEEQDHKT DPPDDEEEEV LLLLLNNMMM LLL------D TTTIDDAALR LLLSLPRRVV VVTEEYSLRV DPPDDEEEEV LLLLLNNMMM LLL------D TTTIDDAALR LLLSLPRRVV VVTEEYSLRV DRFLLYAADS LLDSGGGAMA -QEEIIDDDV CCGEAAAA-- RRERHPPLSS WRRDRLTRAA NRVLLYAAES LLELDSEVRV RGRRIIGGGI GGPETTTGII HRERMEEKEE WRRVLMENAA NRVLLYAAES LLELDSEVRV RGRRIIGGGI GGPETTTGII HRERMEEKEE WRRVLMENAA GLLSPARAAR VGFSGG--HV VVEAAAGGGC LLLWWGGGSW EEAAAADDGG GDNNSSNNNN GFFEKASAAK LWYNYNLYSV VVEKKKGGGF IILWWDDDSW RR-------- ---------- GFFEKASAAK LWYNYNLYSV VVEKKKGGGF IILWWDDDSW RR-------- ---------- NVGSSDSNNS SGKSSSGGGA RRDDGGSSSS VCCCLLLLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- ------MDTP Q----MMDPA t5g66770.1 AAACCTDSLL AAAAQQQQQV VVIIIKKQKK QQQQEQQQQQ QQQQQQHQIG NPLLN--NTS t5g66770.2 AAACCTDSLL AAAAQQQQQV VVIIIKKKKK QQQQEQQQQQ QQQQQQHQIG NPLLN--NTS ASSLLDAFPP PPVV------ -------AAA APDDGG---- ---------- ---------- LFFLLSGADD DPVVTGGSPP GPLLHHHAAA ATTTGGRLLS DFGGGEESSS TTLGDSDDCT LFFLLSGAPP PPVVTGGSPP GPLLHHHAAA ATTTGGRLLS DFGGGEESSS TTLGDSDDCT --DP---DD- ---------- ---------- VDDAAAALLL EFFAAPPPCC AAAAAV-REE WWDPYVYDDP PPPTTTYPPS RRSQSDDNVV IDDTSSSPPP PPPPLPPPSS SSIPPLHKEE 
WWDPYVYDDP PPPTTTYPPS RRSQSDDNVV IDDTSSSPPP PPPPLPPPSS SSIPPLHKEE EEEVGGIRR- --------LH HLLCCGAIIE EGGGGHAALL LASAQQLADS SHHAAAAAAV EPPTDDSEED FFDDEPPPLK KAICC-RIIS SSSSSPNNEE EASKTTLLQI IRRESSSSSL EPPTDDSEED FFDDEPPPLK KAICC-RIIS SSSSSPNNEE EASKTTLLQI IRRESSSSSL AAIRRRVVAV VHHFFTTLLL PSVVPPPTAA AEEHAAL--- HHHHFFYACP PLKKHHHFTA DP-RRRVVAF FYYFFTELLL PSAASSSSSS STTEDDILLS KKKTLLNACP PSKKHHHLTA DP-RRRVVAF FYYFFTELLL PSAASSSSSS STTEDDILLS KKKTLLNACP PSKKHHHLTA NNQILFFFFH GGCDDHIIIS GLLLWPAAAA IQQAAALLRR RPPGGGGPPP -LLLITGPPP NNQILTTTTE KKSNNKVVVG GIIIWPAAAA LQQAAATTRR RTTSSSGKKP QIIIVSGAPP NNQILTTTTE KKSNNKVVVG GIIIWPAAAA LQQAAATTRR RTTSSSGKKP QIIIVSGAPP PPTGRDE-LG RRRAAAARSV VVVVFSFFRR GVVNSSLEVR RRWWIPPGEE EVAFNSSVVL LLGESPESLG RRRRRAAKVL LLLLFDFFII PII-PPILLN NNSSVPPDEE ELAVNFFMML LLGESPESLG RRRRRAAKVL LLLLFDFFII PII-PPILLN NNSSVPPDEE ELAVNFFMML QQQLLHHRLP PADDQ----- IIDCVVVVVV KFFTTVVVVE QQEDNKKGLR RTTLLYYAVV QQQLLYYKL- --DDETTTII VVDLAAAALL RVVTTLLLLE YYESNRRGAR RKKLLYYAVV QQQLLYYKL- --DDETTTII VVDLAAAALL RVVTTLLLLE YYESNRRGAR RKKLLYYAVV FFLAGGGANN NNAAMMMMA- EEECCDVVVV CGGEAAREER HEEEPLSRRW RRRRLTGLAV FFLNDDDERR RRVVRRRRVR RRRSSGIIII GPPETTHEER MEEEEKEQQW RRLLMEGFSV FFLNDDDERR RRVVRRRRVR RRRSSGIIII GPPETTHEER MEEEEKEQQW RRLLMEGFSV VPLLSSLLLQ AAARMMLLGG GFFSSGEGG- SVVEEAADDD DCLTLGHHHG RRSSAWWAAG VKLLNNVVVQ AAAKIILLWW WYYNNYSNNY IVVSSKKPPP PFISLANNND LLTSSWW--- VKLLNNVVVQ AAAKIILLWW WYYNNYSNNY IVVSSKKPPP PFISLANNND LLTSSWW--- DDGGGGNNNS SSSNNGSSDS SNSSSNGSGD GGGVCLLLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ------MDTT TFFPQ----- ---WPPMDPP t5g66770.1 CTTGGGGNNL MMIAAQQQQE QQQQQQQQHH HHQDDDHQII IFFGNPPLSN NPPWPP-NTT t5g66770.2 CTTGGGGNNL MMIAAQQQQE QQQQQQQQHH HHQDDDHQII IFFGNPPLSN NPPWPP-NTT ASLLPPPAAV V--------- -----ADDDV VY-------- ---------- ---------- SGLFPPPFFV VTTGGGGPGF FFPLHATTTG GFRLLSGGGG GGGGGGFEDD EWWWMMTLII SGLFPPPFFV VTTGGGGPGF FFPLHATTTG GFRLLSGGGG 
GGGGGGFEDD EWWWMMTLII ---------- -------YDP ----GD---- ---------- -------AAA ALLLPAAAAA GGDSVVAADD DCCCTWWHDN YVIYGDPTTY PSRRLLVVPS DLLNRVVTSS SPPPLTTLLL GGDSVVAADD DCCCTWWHDN YVIYGDPTTY PSRRLLVVPS DLLNRVVTSS SPPPLTTLLL AFPPCCCDDA AAAAAAVVAR RREEEEVGR- ---------- ---LLLLMSS GIAGGGLAAA LWPPSSSLLS IIIPPPLLST KKDPPPTDED DDDDDFDLLP PPPAAAAYDD -IDSSSEAAK LWPPSSSLLS IIIPPPLLST KKDPPPTDED DDDDDFDLLP PPPAAAAYDD -IDSSSEAAK ALLASSSHAA AAALLLAAAA AVSSSSAASS GGVVHHHTAL SRL---SSPP VVAPDEFFLL KLLLIIIREE ESSVVVSSSS ELGGGGPPTT -EVFYYYTAL SNLNNNSSPP AATSSTLLII KLLLIIIREE ESSVVVSSSS ELGGGGPPTT -EVFYYYTAL SNLNNNSSPP AATSSTLLII -HHFFYYYEE EPPPAHFTTA NNAAILLEAA FHHGCCHVVV IDFFSSLQGL QQQWWPPALL SKTLLNNNDD DPPPAHLTTA NNAAILLEAA TEEKSSHIII VDFFGGIQGI QQQWWPPALL SKTLLNNNDD DPPPAHLTTA NNAAILLEAA TEEKSSHIII VDFFGGIQGI QQQWWPPALL LLALRRGGPP PFFF-RRIII IGPSD--LLL GLRDDLRSVR VRRSRVNNVR PWLQQIIAAP LLATRRGGPP PTTTQRRVVI IPASPPSLLL GNRDDFKVLD LNNDII--LN GSFRRVVDDP LLATRRGGPP PTTTQRRVVI IPASPPSLLL GNRDDFKVLD LNNDII--LN GSFRRVVDDP EAAVAAAFFS VLHRLLGDPP DDAAIIDDVV VDAAASSSRK FTIIEEQQDD DDNKGFDRRR EVVLAAAVVF MLYKLL---- DDTTVVDDAA ARKKKSSSNR VTGGEEYYSS SSNRGFNRRR EVVLAAAVVF MLYKLL---- DDTTVVDDAA ARKKKSSSNR VTGGEEYYSS SSNRGFNRRR FFEYYYFDDL DDDAGGNAAA EE--LRREEC DICGARRRRR HHEPPLLSSR RRDDRLRALV VVNFFFFEEL EEEGSERVVV EERRFRRRRS GLGKGHHHHR MMEEEKKEEQ QRVVLMNAFV VVNFFFFEEL EEEGSERVVV EERRFRRRRS GLGKGHHHHR MMEEEKKEEQ QRVVLMNAFV LLGGSALLLR QQARMMLLLV VGFFFGE--- HHVEEADCLT TLLLRRLFFA AWWWWAGGGG LLSSNAVVVS QQAKIILLLL LWYYYYSLYY SSVSSKPFIS SLLLLLLLLL SWWWW----- LLSSNAVVVS QQAKIILLLL LWYYYYSLYY SSVSSKPFIS SLLLLLLLLL SWWWW----- NNNNSSSNNS GSSSGGSDSN GGSGKKSARR DSSSVVCCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ------MTFF PFFQQ----- WWPMAAASGP PAA------- t5g66770.1 MCCDDGGNII AQQVVVKKQQ QQQHHDHIFF GIINNPLLLP WWP-SSLFSP DFQTGGDPPF t5g66770.2 MCCDDGGNII AQQVVVKKKQ QQQHHDHIFF GIINNPLLLP WWP-SSLFSP PFQTGGDPPF -------AAP VVVGY----- 
---------- ---------- ---------- -------PPA FPNLLHHAAT GGGGFRRDFF GGTTTGGGGF FFSDWTTLSG GGGDSSVDGP PDDCCCTNND FPNLLHHAAT GGGGFRRDFF GGTTTGGGGF FFSDWTTLSG GGGDSSVDGP PDDCCCTNND ----AD---- ---------- ---------- VDDDDDALLE FFAAPDAAAA VLL-AREEEE YVYYPDPFFD DTYYYLSSSV VQDLLLNRVV IDDDDDSPPP PPTLPLPPPP LTTHSKEEPE YVYYPDPFFD DTYYYLSSSV VQDLLLNRVV IDDDDDSPPP PPTLPLPPPP LTTHSKEEPE AAAIR----- ----LVVVHL LLLLMMSSSC CCIEEAGGHH HALAAASAAA LLADHHHAAV NNNSEDDDEE PPPPLLLLKA IIIIYYDDDC CCISSDSSPP PNEAAASKKK LLLQRRRSEL NNNSEDDDEE PPPPLLLLKA IIIIYYDDDC CCISSDSSPP PNEAAASKKK LLLQRRRSEL ASSGIGGGRV VVTTTTTTAL LSRRLLF--P VPPPPTTDDA EH--YFEEEA PYKKKFFFAH DTT--EEERV FFTEEEEEAL LSNRLLSNNP ASSSSSSSSS TELSYLDDDA PYKKKFFFAH DTT--EEERV FFTEEEEEAL LSNRLLSNNP ASSSSSSSSS TELSYLDDDA PYKKKFFFAH TTNNAIAHHC DHHDFSSSMM QQGGQQQWWA LIILLLARPP GGPRRRIITG GGIGPPSSST TTNNAIAEES NKHDFGGGVV QQGGQQQWWA LLLLLLARTT GGKRRRVVSG GGIPAPSSSG TTNNAIAEES NKHDFGGGVV QQGGQQQWWA LLLLLLARTT GGKRRRVVSG GGIPAPSSSG GRR-LLDDVG GLLRLADAAA RSSVVRVVRF FSGGVAANSS SSLDDVPLLI APGAAAVVFN ESSSLLAATG GNNRLRDAAA KVVLLDLLNF FDPPITT-PP PPIHHLGFFV DPDVVVLLVN ESSSLLAATG GNNRLRDAAA KVVLLDLLNF FDPPITT-PP PPIHHLGFFV DPDVVVLLVN NVLQQHRLLP DDQAPPP-ID DVLDDCSVRR PPPPKTEEEA AHKTTGGFLL LLDDDRRTAA NMLQQYKLL- DDETPPPIVD DALRRLSLNN PPPPRTEEEV VLRVVGGFAA AANNNRRKAA NMLQQYKLL- DDETPPPIVD DALRRLSLNN PPPPRTEEEV VLRVVGGFAA AANNNRRKAA YYYYYSAADL DASGGAGNNM ME-AYYLLRE CIVVGGAA-E EEEREEPLWD RRLTTRRAAA FFFYYSAAEL EGRDDEERRR RERELLFFRR SLIIPKGGIE EEEREEEKWV LLMEENNAAA FFFYYSAAEL EGRDDEERRR RERELLFFRR SLIIPKGGIE EEEREEEKWV LLMEENNAAA ALLLSSAVLL LGSSNNNALR RRRLVGGLLF GGEEG--HHS SVEEEEEAAD GGLLWWRRPL AFFFEESVLL LSNNYYYAVS SKKLLWWNNY YYSSNLYSSI IVEESSSKKP GGILWWLLPL AFFFEESVLL LSNNYYYAVS SKKLLWWNNY YYSSNLYSSI IVEESSSKKP GGILWWLLPL FFFASSEAAA ADGGDNNNSV SGGSDNNSNK SSRSSVVCC LLLLSSR--- ---------- ---------- --------- LLLLSSR--- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------M TTPPQ----- ----WMMMPA t5g66770.1 MMYCTTDDDN 
LLMMMAIIIK QQQQEEQQQQ QHHHQQDDDH IIGGNPLSLL LPPPW---TS t5g66770.2 MMYCTTDDDN LLMMMAIIIK QQQQEEQQQQ QHHHQQDDDH IIGGNPLSLL LPPPW---TS AAAGGGLLDA AFLPPPPPAV ---------- ---------- ADGVGY---- ---------- SLLGGGLLSG GAFPPPPPQV TGGGDSNPGG FPFNNHHHHH ATGGGFRRLL LSDGGGGGGE SLLGGGLLSG GAFPPPPPQV TGGGDSNPGG FPFNNHHHHH ATGGGFRRLL LSDGGGGGGE ---------- ---------- ---YDPA--- -GGGGADDD- ---------- ----AAAAAA EDEWWMEETT LSSGSDDGGP DDTHDNDYVI YGGGGPDDDP PFFDDDTPPS SSNVTTTTSS EDEWWMEETT LSSGSDDGGP DDTHDNDYVI YGGGGPDDDP PFFDDDTPPS SSNVTTTTSS ALLPFAFCAA PPAAAAL--- AMRRREEEVA AGGGG----- ----VVMSAA IEAGGDHHAA SPPLPLWSSS PPSIPPTHEE SPTKKEDETN NDDDDFFFFD DEEPLLYDAR ISDSSDPPNA SPPLPLWSSS PPSIPPTHEE SPTKKEDETN NDDDDFFFFD DEEPLLYDAR ISDSSDPPNA ASSLLSSHLA AVVSSAAASI IRRVATALRL LFPPSSPPVP PPTTTAEL-- --YYFYYEEP ASSLLIIRVS SLLGGPPPT- -RRVATALRL LSPPSSPPAS SSSSSSTILL LSYYLNNDDP ASSLLIIRVS SLLGGPPPT- -RRVATALRL LSPPSSPPAS SSSSSSTILL LSYYLNNDDP PYKFFAAHHF FFFNQAEFHH GGGCHHIDDF LMMMLQWPLL LIIAAAAALR PPPGGPPFFI PYKFFAAHHL LLLNQAETEE KKKSKHVDDF IVVVIQWPLL LLLAAAAATR TTTSGKKTTV PYKFFAAHHL LLLNQAETEE KKKSKHVDDF IVVVIQWPLL LLLAAAAATR TTTSGKKTTV ITGGGPP--- RRGGRLLLAD DAAAARVVRV RFFSFFGGGV ANSPPWWMML IPGEAAAVVA VSGPPPLPSS IIGGRLLLRD DAAAAKLLDL NFFDFFPPPI T-PGGSSSSF VPDEVVVLLA VSGPPPLPSS IIGGRLLLRD DAAAAKLLDL NFFDFFPPPI T-PGGSSSSF VPDEVVVLLA AAFNSSLLLR DDDPDDDQAP P---ILCVVR KITTVVEQEE ANKTTGGFLD DRFFFTTTEE AAVNFFLLLK ----DDDETP PTTTVLLALN RVTTLLEYEE VNRVVGGFAN NRVVVKKKNN AAVNFFLLLK ----DDDETP PTTTVLLALN RVTTLLEYEE VNRVVGGFAN NRVVVKKKNN LLYAFFDDDS LLDAASSAGA AGGGNMMA-- AAYLQQREEE ICCCVCAAAR RHEPSRRWWW LLYAFFEEES LLEPPLLGDE EEEERRRVRR EELFGGRRRR ISSSIGGGGR RMEEEQQWWW LLYAFFEEES LLEPPLLGDE EEEERRRVRR EELFGGRRRR ISSSIGGGGR RMEEEQQWWW RRRLLTTTPP GGGAAALRMM LLFGGEGGGG G-HSEEEAGC LGWRRSAAAS SAWWEGDGGD RLLMMEEEKK SSSAAAVKII NNYYYSNNNN NYSIEESKGF IAWLLTLLLS SSWWR----- RLLMMEEEKK SSSAAAVKII NNYYYSNNNN NYSIEESKGF IAWLLTLLLS SSWWR----- NNNNSNNNVV SGSSGDSNNS SGGGSSNSSS AARDGCLLL ---------- ---------- ---------- --------- 
---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ----MDFFPF ---------- WWPPMPAAGG t5g66770.1 MMAYCTDSGG LLMIQQQVVV QQKQQQQQQQ HQDDHQFFGI PPLLSSLNPP WWPP-TSLGG t5g66770.2 MMAYCTDSGG LLMIQQQVVV KKKQQQQQQQ HQDDHQFFGI PPLLSSLNPP WWPP-TSLGG LDLPPPPAAV ---------- ----APDDDV G--------- ---------- ---------- LSFPDPPFQV GGDDPFNLDD HHHHATTTTG GSSDDFFGGG GGGGGTGGFF ESDEETLISS LSFPPPPFQV GGDDPFNLDD HHHHATTTTG GSSDDFFGGG GGGGGTGGFF ESDEETLISS ---------- -------YYY YPAA--GGDD DD-------- -------VVD DDALLFAAPC GGDSSSSVDD CCDDTWWHHH HPDDVYGGDD DDPFYPPPSS RLVPPLNIID DDSPPPTLPS GGDSSSSVDD CCDDTWWHHH HPDDVYGGDD DDPFYPPPSS RLVPPLNIID DDSPPPTLPS PAALL-MMMM RRREEEVAAI I--------- -HLLSAAAAA AEEAAGGGHH HASSSAALAA PSPTTHPPPP KKKEDPTNNS SDDDDDLLEP PKAIDAARRR RSSDDSSSPP PASSSKKLLL PSPTTHPPPP KKKEDPTNNS SDDDDDLLEP PKAIDAARRR RSSDDSSSPP PASSSKKLLL DSHHAAAALL AAAASAGIIV AAHHFTTAAL LSRRLF---A APTAEAFLL- YYHHHFYYEE QIRRESSSVV SSSEGD---V AAYYFTTAAL LSRRLSNNNT TSSSTDLIIS YYKTTLNNDD QIRRESSSVV SSSEGD---V AAYYFTTAAL LSRRLSNNNT TSSSTDLIIS YYKTTLNNDD AACCCCYYYY LKKKFFAHTN NNNAAIEDHH QQGGLLQQWW WAAAALLLLQ QQQQAALLLL AACCCCYYYY SKKKFFAHTN NNNAAIENKK QQGGIIQQWW WAAAALLLLQ QQQQAALLLL AACCCCYYYY SKKKFFAHTN NNNAAIENKK QQGGIIQQWW WAAAALLLLQ QQQQAALLLL AAGGPF-LRR RRITTGIGPS SSSPDDERDD DGGRLLAAAR RVRRFFSSFG NSLDEERWWQ AASGPTQIRR RRVSSGIPAS SSSLPPEIAA AGGRFFAAAD DLNNFFDDFP -PIHLLNSSR AASGPTQIRR RRVSSGIPAS SSSLPPEIAA AGGRFFAAAD DLNNFFDDFP -PIHLLNSSR PEAAVAFNSV HHRRGDDDPA AAADQP-IID VDCCCPPPKK IIIIFFTIIE QDDHHHHNNK PEVVLAVNFM YYKK------ ---DEPIVVD ARLLLPPPRR VVVVVVTGGE YSSLLLLNNR PEVVLAVNFM YYKK------ ---DEPIVVD ARLLLPPPRR VVVVVVTGGE YSSLLLLNNR TGFLDDFTTT EAFYYSFSAA ASGGAANNAA AEAYQREIII IVGEEGGA-- RRHPLLSRDR VGFANNVKKK NAQYYSFSPN GRSSEERRVV VEELGRRIIL LIPEEKKGII RRMEKKEQVL VGFANNVKKK NAQYYSFSPN GRSSEERRVV VEELGRRIIL LIPEEKKGII RRMEKKEQVL RRTRGLSSVL LLSSSNNAAL AMVVGSGEGG --HSSVDDGG GHHGRPPLSA SWAGGGGGDD LLENGFEEVL LLNNNYYAAV AILLWNYSNN YYSIIVPPGG GNNDLPPLTL SW-------- 
LLENGFEEVL LLNNNYYAAV AILLWNYSNN YYSIIVPPGG GNNDLPPLTL SW-------- NNNNSSSNNV GGGGSNNSGG SSNGGKKKSG GAARRRDSV ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- DTFFPF--WW MDPSSSSSLD ALPPPPAA-- t5g66770.1 MAAYCTDDGG NLLLQVVVVK QQKQEQQQQQ QIFFGILNWW -NTGGFFFLS GFPDPPFQTG t5g66770.2 MAAYCTDDGG NLLLQVVVVK KKKQEQQQQQ QIFFGILNWW -NTGGFFFLS GFPPPPFQTG ---------- ---------A ADDVGGGYY- ---------- ---------- ---------- GGGGDDPPPP PPNDDHHHHA ATTGGGGFFD FFGGGEFSSE WMMMTTLLII SSDSVVVAAA GGGGDDPPPP PPNDDHHHHA ATTGGGGFFD FFGGGEFSSE WMMMTTLLII SSDSVVVAAA -----DDPPP -----GGA-- --------VD AAAAALLPEE FAAFPAPPPA AAAAA-MMEV ADDGDDDNPP VIIYYGGPTT SSSVVVVQID TTSSSPPLPP PTTWPSPPPS SIIIPHPPET ADDGDDDNPP VIIYYGGPTT SSSVVVVQID TTSSSPPLPP PTTWPSPPPS SIIIPHPPET AAGIIRR--- --------LH LMMMSCGGAI IEGHAALLSS SAQLLLLDSH HAAAAASSGG NNDSSEEDDD DFLLEEEPLK AYYYDC--RI ISSPNNEESS SKTLLLLQIR RSEDDPTT-- NNDSSEEDDD DFLLEEEPLK AYYYDC--RI ISSPNNEESS SKTLLLLQIR RSEDDPTT-- IIIIGVAHFT TTALSSSRRR RLLFPSPPAA PTTDEEAAL- ----YHHHFF YEECCPPLLK ----EVAYFT EEALSSSNNR RLLSPSPPTT SSSSTTDDIL LLSSYKKTLL NDDCCPPSSK ----EVAYFT EEALSSSNNR RLLSPSPPTT SSSSTTDDIL LLSSYKKTLL NDDCCPPSSK KKFHHHFTAA AANNQALEFH HHGCDHVHVI DFFFFSGQWW PQAALLLLAL LLPPPPPPF- KKFHHHLTAA AANNQALETE EEKSNKIHIV DFFFFGGQWW PQAALLLLAT TTTKKKPPTQ KKFHHHLTAA AANNQALETE EEKSNKIHIV DFFFFGGQWW PQAALLLLAT TTTKKKPPTQ LRRRITGGGS SPEELDDVLA DAARRSVVVF FFFFGVAAAA ANNDDEEEEE VWWWWMMLLI IRRRVSGGGS SLEELAATNR DAAKKVLLLF FFFFPILLTT T--HHLLLLL LSSSSSSFFV IRRRVSGGGS SLEELAATNR DAAKKVLLLF FFFFPILLTT T--HHLLLLL LSSSSSSFFV APAVVFNVHR LGDPAAADQA AAP----IID VDASSRPIIF TTVVEEQEAA DGFLDRREAA DPVLLVNMYK L------DET TTPTTTTVVD ARKSSNPVVV TTLLEEYEVV SGFANRRNAA DPVLLVNMYK L------DET TTPTTTTVVD ARKSSNPVVV TTLLEEYEVV SGFANRRNAA AFYYYYSSDD DDSSLAASAG AAGMEE-ALL LEDVVCGGGG EEAAAAAA-E RRLLRWDDLR AQFFYYSSEE EESSLPPLGS EEEREEREFF FRGIIGPPPP EETTTGGGIE RRKKQWVVMN AQFFYYSSEE EESSLPPLGS EEEREEREFF FRGIIGPPPP 
EETTTGGGIE RRKKQWVVMN AAGGLLAAVL LNNNALLRQQ QARRLLVVVL FFGG----HV VVEDDGCTLL HRFFAASAEE AAGGFFSSVL LYYYAVVSQQ QAKKLLLLLN YYYNLLLLSV VVSPPGFSLL NLLLLLSSRR AAGGFFSSVL LYYYAVVSQQ QAKKLLLLLN YYYNLLLLSV VVSPPGFSLL NLLLLLSSRR AAAGGGGGGG DNNSNNSNSS SGGNSSGKSS SRDDGSCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---MTTTPQ- --WWPMDDPP AAAASSDALL t5g66770.1 ATDSNLMAIA AQQIIIKQKK QQQQQQQQQQ HHQHIIIGNL LLWWP-NNTT SSLLGGSGFF t5g66770.2 ATDSNLMAIA AQQIIIKKKK QQQQQQQQQQ HHQHIIIGNL LLWWP-NNTT SSLLGGSGFF LLAAAVV--- ---------- ---------- -DGVGY---- ---------- ---------- FFFQQVVGGG GGGDDDDSNN PGPPFNNLHH HTGGGFRRSS SDFFGGGGGE EEEESEEISG FFFQQVVGGG GGGDDDDSNN PGPPFNNLHH HTGGGFRRSS SDFFGGGGGE EEEESEEISG ---------- --PPAA---- ---------- --------VV DDAAAALLPP EEAAFPPPPP GGVAAAADGG DWNPDDYFTT YYPPRLVSSL LNNNRRVVII DDTSSSPPLL PPPTWPPPPP GGVAAAADGG DWNPDDYFTT YYPPRLVSSL LNNNRRVVII DDTSSSPPLL PPPTWPPPPP CAPPDAAAAL LL--AARREE EEEEEAR--- -------VHH LMSSCAIGDD HHALLAAASS SSPPLSIPPT TTHESSTKEE EEPPENEDDD DDLEEPPLKK AYDDCRISDD PPNEEAAASS SSPPLSIPPT TTHESSTKEE EEPPENEDDD DDLEEPPLKK AYDDCRISDD PPNEEAAASS SSQQLLADAA AAAVVVSAAS GGIRVAVHHH FTTALSPP-S SPVVVAAAAT AHAFLL-YYY SSTTLLLQSS EEELLLGDDT ---RVAFYYY FTEALSPPNS SPAAATTTTS SEDLIILYYY SSTTLLLQSS EEELLLGDDT ---RVAFYYY FTEALSPPNS SPAAATTTTS SEDLIILYYY HHFYEEEAAC PYYKFFAANN NQQILLLFHC CDHVVVIIIL MGGLLWWPLL IIAAAALLGG TTLNDDDAAC PYYKFFAANN NQQILLLTES SNHIIIVVVI VGGIIWWPLL LLAAAALLSS TTLNDDDAAC PYYKFFAANN NQQILLLTES SNHIIIVVVI VGGIIWWPLL LLAAAALLSS GGPP--IITG PTRDE----- -LDVRLDLVV RRVAAAAANN SLDDEVRPWW LQIAPAFFVL SSKPQQVVSG AGSPEPPPSS SLATRLDFLL IIILLTTT-- PIHHLLNGSS FRVDPAVVML SSKPQQVVSG AGSPEPPPSS SLATRLDFLL IIILLTTT-- PIHHLLNGSS FRVDPAVVML HHLLLLLGDP PPAADAAP-- IDDAVDDAAR RRPKKIFTII EAHKFFFFFL LLEEAFYYYY YYLLLLL--- ----DTTPTT VDDTARRKKN NNPRRVVTGG EVLRFFFFFA AANNAQFFFY YYLLLLL--- ----DTTPTT VDDTARRKKN NNPRRVVTGG EVLRFFFFFA AANNAQFFFY SSAVDDDDSL ASSGGGGAAG 
NAAEAYYLRE IIIIDIIVCG EGGGA-RERP LSRLTTAAGL SSAVEEEESL GRRDDDSEEE RVVEELLFRR IIIIGLLIGP EKKKGIHERE KELMEEAAGF SSAVEEEESL GRRDDDSEEE RVVEELLFRR IIIIGLLIGP EKKKGIHERE KELMEEAAGF SAVLLSSNNN ALQMMMMMMM GSGEEG---- EEEEEECLTL GGPPLLFSAA AAAGGDGGGN ESVLLNNYYY AVQIIIIIII WNYSSNLLYY EEEESSFISL AAPPLLLSS- ---------- ESVLLNNYYY AVQIIIIIII WNYSSNLLYY EEEESSFISL AAPPLLLSS- ---------- NNNSSNNSSS GGSSSDSNNN NNNSSKKKSS SAADGSCLL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- --------MM TPFFQ----M MMDDASSSLD t5g66770.1 AYYTDNAIII AQQQVVVIII KQQQEQQQQQ QQHHHHQDHH IGIINPSLP- --NNSGGGLS t5g66770.2 AYYTDNAIII AQQQVVVIII KQQQEQQQQQ QQHHHHQDHH IGIINPSLP- --NNSGGGLS DAAAGGFLPA V--------- -------PDD GGVGY----- ---------- ---------- SGGGSSAFDQ VGGGNGGPPN NLLDDHHTTT GGGGFRLSDG TGGGGGFEEM TTLISGGGDD SGGGSSAFPQ VGGGNGGPPN NLLDDHHTTT GGGGFRLSDG TGGGGGFEEM TTLISGGGDD ---------Y DPAA--GGGA D--------- ------VDDA AEEFAPCCCP DAAL--ARRE DAADDDDCDH DPDDIYGGGP DPPPFFDYPP VVPLNVIDDS SPPPLPSSSP LSPTHESTTE DAADDDDCDH DPDDIYGGGP DPPPFFDYPP VVPLNVIDDS SPPPLPSSSP LSPTHESTTE EEEVAAAGII R--------- ---LVVHLSS SSSSCCGAEA AGDDAAALAS SASHLLLAAA DPETNNNDSS EDDDDDFDLL EEPLLLKIDD DDDDCC-RSD DSDDNNNEAS SLIRVVVSSE DPETNNNDSS EDDDDDFDLL EEPLLLKIDD DDDDCC-RSD DSDDNNNEAS SLIRVVVSSE ASAAASSSGG VAAAVFTTLL LSRPP-SPPA PPDDAAAEH- YHHFEEEAKF TAIEAFHCDV EGDPPTTTEE VAAAFFTTLL LSRPPNSPPT SSSSSSSTEL YKTLDDDAKL TAIEATESNI EGDPPTTTEE VAAAFFTTLL LSRPPNSPPT SSSSSSSTEL YKTLDDDAKL TAIEATESNI HVDSLMMQQG GLQQWIIIQQ LLRPPPFFF- --ITGPSSGR D-LRDDVGGL RLAADDLRSV HIDGIVVQQG GIQQWLLLQQ LTRKPPTTTQ QQVSGASSES PSLIAATGGN RLRRDDFKVL HIDGIVVQQG GIQQWLLLQQ LTRKPPTTTQ QQVSGASSES PSLIAATGGN RLRRDDFKVL RFFFFRSVPW WMLQQIIAAP GAAVVFFNNV VLLHHLLGDP DDDQQAAP-- IIIVLLCCCV NFFFFIPLGS SSFRRVVDDP DVVLLVVNNM MLLYYLL--- DDDEETTPTI VVVALLLLLA NFFFFIPLGS SSFRRVVDDP DVVLLVVNNM MLLYYLL--- DDDEETTPTI VVVALLLLLA ASVRRRKIFV VIEEEEAADH HHHTTTGFLL FFFELFFFFD DSLLDAAAAA ASGGGGAGNN 
KSLNNNRVVL LGEEEEVVSL LLLVVVGFAA VVVNLQQFFE ESLLEPNGGG GRSSSSEERR KSLNNNRVVL LGEEEEVVSL LLLVVVGFAA VVVNLQQFFE ESLLEPNGGG GRSSSSEERR AE-YLLLRRE IICCDVVCEE GGGGGA--RE EEERHHPPSS DDRRLLLTTA AGGLSAVGSN VERLFFFRRR IISSGIIGEE KKKKKTIIRE EEERMMEEEE VVLLMMMEEA AGGFESVSNY VERLFFFRRR IISSGIIGEE KKKKKTIIRE EEERMMEEEE VVLLMMMEEA AGGFESVSNY NNNALLRRRR RMMLVGLFSS EGGG-HHHVE EDGGGCTLLL GGHRLSSSSW WWEEDGGGGD YYYAVVSSSK KIILLWNYNN SNNNYSSSVE EPGGGFSLLL AANLLTTSSW WWRR------ YYYAVVSSSK KIILLWNYNN SNNNYSSSVE EPGGGFSLLL AANLLTTSSW WWRR------ NNNNSSNNNV SSSGSSSNNG GGSSSGGGAA RRRDSSSCL ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ----DTFPPQ Q--------W WWWWPMMMMA t5g66770.1 MMMCCTSGGN LAAQQQQQVV KKQQEQQQQH QQDDQIFGGN NPPPLLSLNW WWWWP----S t5g66770.2 MMMCCTSGGN LAAQQQQQVV KKQQEQQQQH QQDDQIFGGN NPPPLLSLNW WWWWP----S ASSSLAAGGG FLLLPPPPAA AV-------- ---------- --------AA PPGGGGGGYY LGGFLGGSSS AFFFDPPPQQ QVTGGSSSNN DPPPGFFFFF PNNLDHHHAA TTGGGGGGFF LGGFLGGSSS AFFFPPPPQQ QVTGGSSSNN DPPPGFFFFF PNNLDHHHAA TTGGGGGGFF ---------- ---------- ------YYDP PAAA----AA A--------- ------VDAA FFGTGGGGGE EETTLGSVDG GCCTWWHHDP PDDDYYVYPP PPFTTYYPPL QQPSDRIDTS FFGTGGGGGE EETTLGSVDG GCCTWWHHDP PDDDYYVYPP PPFTTYYPPL QQPSDRIDTS ALPEEAAPPP CPDDDAAAAA VVLL--AAMR RREEEVGI-- -----LLLLM SCAGGGIAAA SPLPPPTPPP SPLLLIIIIP LLTTHESSPT TKEEETDSDF LLPPPLLAIY DCA---IDDD SPLPPPTPPP SPLLLIIIIP LLTTHESSPT TKEEETDSDF LLPPPLLAIY DCA---IDDD GDDAASSAAA AQQAADDDAV VVSSSASGII GGRAAVVHHH FFTTALSRRR RRLL-SPPPT SDDNNSSKKK KTTLLQQQEL LLGGGDT--- EERAAFFYYY FFTEALSNNN NRLLNSSSSS SDDNNSSKKK KTTLLQQQEL LLGGGDT--- EERAAFFYYY FFTEALSNNN NRLLNSSSSS TDDAEEHH-- HHYYYEECPY LKHHFNQQAI LLLEHHGDHH HVVHVDFFSL MQQGGLLWPI SSSSTTEESS TTNNNDDCPY SKHHLNQQAI LLLEEEKNKK KIIHIDFFGI VQQGGIIWPL SSSSTTEESS TTNNNDDCPY SKHHLNQQAI LLLEEEKNKK KIIHIDFFGI VQQGGIIWPL IQQAAAALRP F---LTPPSP PGDEEE-LRD VGGRAADDDL ARRRVRVSSS FRGGASLLLD LQQAAAATRK TQQQISAPSL LEPEEEPLIA TGGRRRDDDF 
AKKKLDLDDD FIPPLPIIIH LQQAAAATRK TQQQISAPSL LEPEEEPLIA TGGRRRDDDF AKKKLDLDDD FIPPLPIIIH EEVVVRPWWM MLLPEVANSV VLQQQLRRLG GDAQPP-DDA VDDCCCAKEE EEADHNKKKT LLLLLNGSSS SFFPELANFM MLQQQLKKL- ---EPPIDDT ARRLLLKREE EEVSLNRRRV LLLLLNGSSS SFFPELANFM MLQQQLKKL- ---EPPIDDT ARRLLLKREE EEVSLNRRRV FFFLRRFFTT EALFYYSAVS DAASSSGGGG GGGAAAAGAM -AAYLLECDI CCGEG-RRRR FFFARRVVKK NALQFFSAVS EPNLLRDDSS SSSEEEEEVR REELFFRSGL GGPEKIHHHH FFFARRVVKK NALQFFSAVS EPNLLRDDSS SSSEEEEEVR REELFFRSGL GGPEKIHHHH PPRRLRAAAG LSSVVPGGGS SSNALRQQRR VVLGGEG-SV VEALLTLWWG PPLFASSSWW EEQQMNAAAG FEEVVKSSSN NNYAVSQQKK LLNYYSNYIV VSKIISLWWD PPLLLSSSWW EEQQMNAAAG FEEVVKSSSN NNYAVSQQKK LLNYYSNYIV VSKIISLWWD PPLLLSSSWW DGGGGNNNNN NSNNSNNVSS SGSNSGSNSS GARRDGSSC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---------- -------FPP PFQQ------ t5g66770.1 MAAYMMCTTD GGNNLLIAAQ QQQVIIQKQQ QQEEQQQQHH HHHHHQQFGG GINNPPSLLN t5g66770.2 MAAYMMCTTD GGNNLLIAAQ QQQVIIKKQQ QQEEQQQQHH HHHHHQQFGG GINNPPSLLN -MDDDDAAAA ASLFFPPPAV VV-------- ---------- -----PDDVV YY-------- N-NNNNSSLL LFLAAPDPQV VVTTGGGDDN NPFFPNNDDH HHHHHTTTGG FFFFGGGGTT N-NNNNSSLL LFLAAPPPQV VVTTGGGDDN NPFFPNNDDH HHHHHTTTGG FFFFGGGGTT ---------- ---------- ---------- ---YPPPPA- ---D------ -------AAA TGGEESEEME EETTSSSGDD DSAADDGDDD DCWHNPPPDY YVIDPFFSSV VSSDDLLTTT TGGEESEEME EETTSSSGDD DSAADDGDDD DCWHNPPPDY YVIDPFFSSV VSSDDLLTTT AAALPPEAAA AAFFCCCAPD AAVVL----- RRRREEEVVA AGGR------ -----LLLLL SSSPLLPPTT LLWWSSSSPL IPLLTEEEEE TTKKEDDTTN NDDEDDDDDD FFFEPLLLII SSSPLLPPTT LLWWSSSSPL IPLLTEEEEE TTKKEDDTTN NDDEDDDDDD FFFEPLLLII LMCCGGIGHA AASSAAAQDS HHHAAAALLA GIGVTTLLSR RRPPP-PAPP PTTTTDDAAA IYCC--ISPN AASSKKKTQI RRREEESVVP --EFTTLLSR RRPPPNPTSS SSSSSSSSDD IYCC--ISPN AASSKKKTQI RRREEESVVP --EFTTLLSR RRPPPNPTSS SSSSSSSSDD FFLL--HHHH HFFYCLLKAH HHFNQALLEA FHVVVDSLMQ QQGGLQQWPP PLLIIQRPGG LLIILSKKKK TLLNCSSKAH HHLNQALLEA THIIIDGIVQ QQGGIQQWPP PLLLLQRTGG LLIILSKKKK TLLNCSSKAH 
HHLNQALLEA THIIIDGIVQ QQGGIQQWPP PLLLLQRTGG GGPF-LRITP PPPSSPTGGG RDEE---LLD VVGLLDDAAR RSSGVVVAAN NSSSLLLRRP GGKTQIRVSA AAPSSLGEEE SPEEPSSLLA TTGNLDDAAK KVDPIIITT- -PPPIIINNG GGKTQIRVSA AAPSSLGEEE SPEEPSSLLA TTGNLDDAAK KVDPIIITT- -PPPIIINNG WWQAAGGGEE AAVAAFSLQH LDDAAP-AAL LVVASVRPIF TTVIIEQQEE HHHNKKGLLD SSRDDDDDEE VVLAAVFLQY LDDTTPITTL LAAKSLNPVV TTLGGEYYEE LLLNRRGAAN SSRDDDDDEE VVLAAVFLQY LDDTTPITTL LAAKSLNPVV TTLGGEYYEE LLLNRRGAAN FFEEEAAYYS AAVVFAASGG NNME--AALQ QREEIICCDI VVCGGGEAAR PSSSRDRLLR VVNNNAAFYS AAVVFPNRSE RRRERREEFG GRRRIISSGL IIGPPPETGH EEEEQVLMMN VVNNNAAFYS AAVVFPNRSE RRRERREEFG GRRRIISSGL IIGPPPETGH EEEEQVLMMN GLPLLGGGSN ALRQQQQQAA RRRMLLVLSG G-VEACLLTL LLGGHHGPSA ASAAWWEEAA GFKLLSSSNY AVSQQQQQAA KKKILLLNNY YYVSKFIISL LLAANNDPTL LSSSWWRR-- GFKLLSSSNY AVSQQQQQAA KKKILLLNNY YYVSKFIISL LLAANNDPTL LSSSWWRR-- AGGGGGNNNN NNNNSNGSDD DSSSNNNSSK KGGRGSSCC ---------- ---------- ---------- --------- ---------- ---------- ---------- --------- 3 639 01g45860.1 ---------- ---------- ---------- ---FPFF--- ----PMDASS SSGGDAGGFL t5g66770.1 MAAYTDGGNN AAQKQQKKQE EEQQQQQQQH QDDFGIIPLS LNPPP-NLGF FFGGSGSSAF t5g66770.2 MAAYTDGGNN AAQKKKKKQE EEQQQQQQQH QDDFGIIPLS LNPPP-NLGF FFGGSGSSAF LPPAVVVV-- ---------- ---PPDDGY- ---------- ---------- ---------- FDPQVVVVGG GGSNNPPFFF HHHTTTTGFD DFGGGTGGGG EFFFEESDWW MMEEETTTSS FPPQVVVVGG GGSNNPPFFF HHHTTTTGFD DFGGGTGGGG EFFFEESDWW MMEEETTTSS ---------- -----PPAA- ----GA---- ---------- ----VDAAAL LLLPFFAAAA GGGGDDSDCC CCDWWNNDDI IYYYGPPPFT YSSLSQSSDL LVVVIDTSSP PPPLPPPPTT GGGGDDSDCC CCDWWNNDDI IYYYGPPPFT YSSLSQSSDL LVVVIDTSSP PPPLPPPPTT APPPPCAAAA AAAAAA-AME EEEEVVVAAA GI-------L LVHHMMMMCA AIIIEAGGGA LPPPPSSSSS PPPPPPHSPE DEEETTTNNN DSDDFFDLPL LLKKYYYYCA RIIISDSSSN LPPPPSSSSS PPPPPPHSPE DEEETTTNNN DSDDFFDLPL LLKKYYYYCA RIIISDSSSN AASAAQLASH HAAAAAAVVV AAASGIIRVV VAHFTTLRRR LFPPPPP-SV VAAPTTTAAA NASKKTLLIR REESSEELLL DDPT---RVV VAYFTTLRRR LSPPPPPNSA ATTSSSSSDD NASKKTLLIR REESSEELLL DDPT---RVV VAYFTTLRRR LSPPPPPNSA ATTSSSSSDD 
-HHYYEEEEC CLLFFAAHHF TAAADDHVVV IFFSSLMQQG GLWAIQAAAL ALRRPPGGPP SKTNNDDDDC CSSFFAAHHL TAAANNKIII VFFGGIVQQG GIWALQAAAL ATRRTTSGPP SKTNNDDDDC CSSFFAAHHL TAAANNKIII VFFGGIVQQG GIWALQAAAL ATRRTTSGPP FFFF-RRIPP PSSSGEEE-- -LRRDVLAAA AAARSVRVVR FFSFRRGGVV ANNSDDEVVW TTTTQRRIAP PSSSEEEEPP SLIIATLRAA AAAKVLDLLN FFDFIIPPII L--PHHLLLS TTTTQRRIAP PSSSEEEEPP SLIIATLRAA AAAKVLDLLN FFDFIIPPII L--PHHLLLS LIEAVAANVV LQQLHHHLLP PDDQQQA--- VVLDDDCVSV VRRKIIFIEE EDDDHNKKTT FVEVLAANMM LQQLYYYLL- -DDEEETTII AALRRRLASL LNNRVVVGEE ESSSLNRRVV FVEVLAANMM LQQLYYYLL- -DDEEETTII AALRRRLASL LNNRVVVGEE ESSSLNRRVV LAALLFSSVF SSSLASAGGA GNNAEAYQII DIVCGGARRE SRRRRWDDDR RLTAGGGLSA AAALLQSSVF SSSLPLGDSE ERRVEELGII GLIGPKGHRE EQQQQWVVVL LMEAGGGFES AAALLQSSVF SSSLPLGDSE ERRVEELGII GLIGPKGHRE EQQQQWVVVL LMEAGGGFES APPLAARAAA RLVVGFFSSS SSGVVEEEAD GCTTLWGRPF FSAASSAWEA GGGNNNNNSS SKKLAASAAA KLLLWYYNNN NNNVVEESKP GFSSLWDLPL LTLLSSSWR- ---------- SKKLAASAAA KLLLWYYNNN NNNVVEESKP GFSSLWDLPL LTLLSSSWR- ---------- SSSNNVVVSG SSSSNSSGSS NKSSSAARRR RGSSVVVCL ---------- ---------- ---------- --------- ---------- ---------- ---------- ---------

From jason.stajich at duke.edu Thu Aug 4 16:16:12 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Aug 4 16:07:55 2005
Subject: [Bioperl-l] dividing seqboot outfiles
In-Reply-To: <42F23450.70209@cirad.fr>
References: <42F23450.70209@cirad.fr>
Message-ID:

How about Bio::AlignIO?

    my $aln = Bio::AlignIO->new(-format => 'phylip');

On Aug 4, 2005, at 11:29 AM, matthieu wrote:

> Hello,
> I'm trying to divide seqboot outfiles containing 100
> multialignments in , for example, 10 files of 10 multialignments. I
> did'nt find any parser for this.
> I'm thinking about identifying the first charaters of the seqboot
> outfiles (ex :" 3 639 " in my example) to recognize each
> multialignment "blocks" but I didn't manage to do this...
> In join my frist code and an example of seqboot outfile.
> Thanks > > > Matthieu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From james.wasmuth at ed.ac.uk Fri Aug 5 06:10:48 2005 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Fri Aug 5 06:19:32 2005 Subject: [Bioperl-l] bl2seq In-Reply-To: <655276655c36.655c36655276@emich.edu> References: <655276655c36.655c36655276@emich.edu> Message-ID: <42F33B28.4020101@ed.ac.uk> Hi Usha, what happens if you type 'bl2seq' on the command line? Usha Rani Reddi wrote: >Hi, >I tried to run local bl2seq by installing Bioperl on Linux machine. >When I tried to align 2 sequences using bl2seq I got an error message >that says "could not find path to bl2seq". After getting the error >message I did set the environmental variables(path) and tried again I >got the same error message. Please help me with this. >Thanks >Usha > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- "You have made your way from worm to man, and much in you is still worm." 
Friedrich Nietzsche, Thus Spoke Zarathustra
Blaxter Nematode Genomics Group | Institute of Evolutionary Biology | Ashworth Laboratories, KB | tel: +44 131 650 7403 University of Edinburgh | web: www.nematodes.org/~james Edinburgh | EH9 3JT | UK |

From hlapp at gnf.org Fri Aug 5 11:18:27 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Aug 5 11:10:07 2005
Subject: [Bioperl-l] Re: all tests pass [was Re: Fixing bioperl] [was Re: Analysis features]
In-Reply-To: <2bf4b9070ab5bb61b34e15d3ae611044@duke.edu>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> <42E909E3.2030102@infobiogen.fr> <1122570166.3288.10.camel@localhost.localdomain> <1122650232.10455.31.camel@localhost.localdomain> <51a02b5bd508f35301ee3c847b104895@gnf.org> <1122925500.3857.40.camel@localhost.localdomain> <2aae0a4129cb2c7407df5834b94f41aa@gnf.org> <2bf4b9070ab5bb61b34e15d3ae611044@duke.edu>
Message-ID: <08043cb9048bd811f56f265e20fed521@gnf.org>

On Aug 1, 2005, at 7:31 PM, Jason Stajich wrote:

> I'm getting all tests passing for me on OSX and a few different linux
> machines with different complements of aux modules installed. I fixed
> some minor things that were breaking.
>

I can actually confirm this. All tests passed last night. Cool, finally, thanks Jason for taking the time, this is helpful.

--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Sat Aug 6 01:02:30 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sat Aug 6 00:51:01 2005
Subject: [Bioperl-l] search2gff.PLS
Message-ID: <4e456734cc6e4972ea642f815f641a2d@gnf.org>

I was looking for an existing tool to convert a SearchIO report (multi-report BLASTN in my case) to GFF3 and found scripts/utilities/search2gff.PLS. Is this the tool I'm looking for, or did I miss a better-suited one elsewhere in either bioperl or gbrowse/gmod?
If this is the tool, why is it suitable only for protein matches? (The doc says so. I used it for a BLASTN report and didn't find any nucleotide-related flaws.) Or is this old documentation that should be fixed?

Also, the container match feature comes out with start and end being equal. I'm not sure whether I was doing something wrong, but the following fixes this.

146,148c149,151
< $max{$type} = $proxyfor->start unless defined $max{$type} && $max{$type} > $proxyfor->end;
< $min{$other} = $otherf->start unless defined $min{$type} && $min{$type} < $otherf->start;
< $max{$other} = $otherf->start unless defined $max{$type} && $max{$type} > $otherf->end;
---
> $max{$type} = $proxyfor->end unless defined $max{$type} && $max{$type} > $proxyfor->end;
> $min{$other} = $otherf->start unless defined $min{$other} && $min{$other} < $otherf->start;
> $max{$other} = $otherf->end unless defined $max{$other} && $max{$other} > $otherf->end;

Since I don't understand 100% what should happen for a correct GFF3 match container, I just wanted to make sure that this is indeed a fix and not introducing a bug before I commit it. Was anybody using this tool with the -m option?

-hilmar

--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From s0460205 at sms.ed.ac.uk Mon Aug 8 04:17:51 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Mon Aug 8 04:10:01 2005
Subject: [Bioperl-l] Not picking up Dbxrefs EMBL records
Message-ID: <1123489071.42f7152f3690e@sms.ed.ac.uk>

Hi folks,

I have a BioSQL database (PostgreSQL 7.4.3, BioPerl 1.4, bioperl-db 1.2) set up containing protein and gene data.
However, when I load gene sequence records (EMBL or Genbank) using:

    perl load_seqdatabase.pl -driver Pg -safe -lookup -dbname milk -dbuser s0460205 -dbpass password -format embl /home/s0460205/file_name.txt

from bioperl-db, it does not pick up any dbxrefs, i.e. there is no dbxref_id for MEDLINE etc.

Has anyone else come across this problem, and is there a fix?

Cheers,

Stephen

From hotafin at gmail.com Mon Aug 8 09:28:16 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Mon Aug 8 09:19:30 2005
Subject: [Bioperl-l] local electric charge/hydrophobicity/flexibility of proteins
Message-ID:

Sorry for the OT, but does anyone know a (command-line) program that can calculate local electric charges (electric charge distribution/density), hydrophobicity, and protein flexibility based on PDB structures? Or does anyone know where I may find corresponding algorithms?

From lstein at cshl.edu Mon Aug 8 15:02:09 2005
From: lstein at cshl.edu (lstein@cshl.edu)
Date: Mon Aug 8 14:53:07 2005
Subject: [Bioperl-l] Bioperl version string not picked up by MakeMaker
Message-ID: <200508081902.j78J29dj005819@presto.lsjs.org>

Hi,

Sadly, the Bio::Root::Version system does not play nicely with MakeMaker. I have a WriteMakefile() routine in the gbrowse Makefile.PL which looks like this:

    WriteMakefile(
        'NAME'      => 'Generic-Genome-Browser',
        'VERSION'   => $VERSION,
        'PREREQ_PM' => {
            Bio::Perl        => 1.5,
            GD               => 2.07,
            IO::String       => 0,
            Text::Shellwords => 1.0,
        }, # e.g., Module::Name => 1.1
        ...);

But when I run perl Makefile.PL I get:

    Warning: prerequisite Bio::Perl 1.5 not found. We have unknown version.

I have added a "use Bio::Root::Version '$VERSION'" to Bio/Perl.pm and this seems to fix the problem, but someone who understands MakeMaker better had better confirm that this is the right solution.

Lincoln

--
Lincoln D.
Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse@cshl.edu

From amackey at pcbi.upenn.edu Mon Aug 8 15:12:41 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon Aug 8 15:04:00 2005
Subject: [Bioperl-l] Bioperl version string not picked up by MakeMaker
In-Reply-To: <200508081902.j78J29dj005819@presto.lsjs.org>
References: <200508081902.j78J29dj005819@presto.lsjs.org>
Message-ID:

One backward-compatible alternative is to direct your PREREQ_PM to Bio::Root::Version instead of Bio::Perl, but I like your forward-compatible solution as well.

-Aaron

On Aug 8, 2005, at 3:02 PM, wrote:

> Hi,
>
> Sadly, the Bio::Root::Version system does not play nicely with
> MakeMaker. I have a WriteMakefile() routine in the gbrowse Makefile.PL
> which looks like this:
>
> WriteMakefile(
> 'NAME' => 'Generic-Genome-Browser',
> 'VERSION' => $VERSION,
> 'PREREQ_PM' => {
> Bio::Perl => 1.5,
> GD => 2.07,
> IO::String => 0,
> Text::Shellwords => 1.0,
> }, # e.g., Module::Name => 1.1
> ...);
>
>
> But when I run perl Makefile.PL I get:
>
> Warning: prerequisite Bio::Perl 1.5 not found. We have unknown
> version.
>
> I have added a "use Bio::Root::Version '$VERSION'" to Bio/Perl.pm and
> this seems to fix the problem, but someone who understands MakeMaker
> better had better confirm that this is the right solution.
>
> Lincoln
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse@cshl.edu
>

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email: amackey@pcbi.upenn.edu
office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
fax: 215-746-6697
postal: Penn Genomics Institute
Goddard Labs 212
415 S. University Avenue
Philadelphia, PA 19104-6017

From jason.stajich at duke.edu Mon Aug 8 15:24:54 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Aug 8 15:17:57 2005
Subject: [Bioperl-l] Bioperl version string not picked up by MakeMaker
In-Reply-To: <200508081902.j78J29dj005819@presto.lsjs.org>
References: <200508081902.j78J29dj005819@presto.lsjs.org>
Message-ID: <3CA4E65C-60B5-4342-BE56-E5C445F727A6@duke.edu>

VERSION_FROM tries to parse the file for the version information; I am pretty sure the Bio::Root::Version stuff is for run-time initialization of the $VERSION variable for each package. I am guessing that is what PREREQ_PM is also doing to determine a version for the dependency.

If you make it depend on Bio::Root::Version, it will be able to parse it, but I assume Bio::Perl is a clearer dependency. Otherwise, what you've done for Bio::Perl, including the $VERSION variable, makes the most sense.

-jason

On Aug 8, 2005, at 3:02 PM, wrote:

> Hi,
>
> Sadly, the Bio::Root::Version system does not play nicely with
> MakeMaker. I have a WriteMakefile() routine in the gbrowse Makefile.PL
> which looks like this:
>
> WriteMakefile(
> 'NAME' => 'Generic-Genome-Browser',
> 'VERSION' => $VERSION,
> 'PREREQ_PM' => {
> Bio::Perl => 1.5,
> GD => 2.07,
> IO::String => 0,
> Text::Shellwords => 1.0,
> }, # e.g., Module::Name => 1.1
> ...);
>
>
> But when I run perl Makefile.PL I get:
>
> Warning: prerequisite Bio::Perl 1.5 not found. We have unknown
> version.
>
> I have added a "use Bio::Root::Version '$VERSION'" to Bio/Perl.pm and
> this seems to fix the problem, but someone who understands MakeMaker
> better had better confirm that this is the right solution.
>
> Lincoln
>
> --
> Lincoln D.
Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse@cshl.edu
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From hlapp at gnf.org Mon Aug 8 16:04:08 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Aug 8 15:52:41 2005
Subject: [Bioperl-l] Not picking up Dbxrefs EMBL records
In-Reply-To: <1123489071.42f7152f3690e@sms.ed.ac.uk>
References: <1123489071.42f7152f3690e@sms.ed.ac.uk>
Message-ID: <23040211b0fc735fcf7c97fc97770473@gnf.org>

Are you referring to references and their PMID? These you would find in the Reference table, which has a foreign key to dbxref, which would only store the PUBMED or MEDLINE ID (not both at this time). Can you give an example accession that's giving you grief?

-hilmar

On Aug 8, 2005, at 1:17 AM, SG Edwards wrote:

> Hi folks,
>
> I have a BioSQL database (PostgreSQL 7.4.3, BioPerl 1.4, bioperl-db
> 1.2) set up
> containing protein and gene data. However, when I load gene sequence
> records
> (EMBL or Genbank) using:
>
> perl load_seqdatabase.pl -driver Pg -safe -lookup -dbname milk -dbuser
> s0460205
> -dbpass password -format embl /home/s0460205/file_name.txt
>
> from bioperl-db it does not pick up any dbxrefs i.e. there is no
> dbxref_id for
> MEDLINE etc.
>
> Has anyone else come across this rpoblem and is ther a fix?
>
> Cheers,
>
> Stephen
>

--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From markus.riester at student.uni-tuebingen.de Mon Aug 8 13:12:36 2005 From: markus.riester at student.uni-tuebingen.de (markus.riester@student.uni-tuebingen.de) Date: Mon Aug 8 17:21:27 2005 Subject: [Bioperl-l] new modules for sarching for patterns in fasta-files Message-ID: Hi, I've made some modules for searching for patterns in fasta files with different (really fast) backends like agrep and vmatch. I don't think you want to include this in standard bioperl. But we think it is useful code and we'd like to share it on cpan. The main reason for this email is a discussion about the right namespace for this module. What do you think? Markus (hope the attachment reaches the mailinglist, if not, please send me a mail if you are interested in this code) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/x-gzip Size: 26854 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050808/69bfd899/attachment-0001.bin From iluminati at earthlink.net Mon Aug 8 22:01:27 2005 From: iluminati at earthlink.net (iluminati@earthlink.net) Date: Mon Aug 8 21:51:55 2005 Subject: [Bioperl-l] Question about handling ontology files Message-ID: <42F80E77.2000205@earthlink.net> Here's my situation. I have a bunch of ontologies downloaded from a batch run on SOURCE. What I want to be able to do is parse these files so I can count the different numbers of instances of terms within all 3 sets of ontological descriptions (biological process, cellular component and molecular function). Is there something in Perl or another program that could help me out with this situation? Any information that you have would be useful to me. 
Thanks

Todd Graham

From sdavis2 at mail.nih.gov Tue Aug 9 07:05:33 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Aug 9 06:56:02 2005
Subject: [Bioperl-l] Question about handling ontology files
In-Reply-To: <42F80E77.2000205@earthlink.net>
Message-ID:

On 8/8/05 10:01 PM, "iluminati@earthlink.net" wrote:

> Here's my situation. I have a bunch of ontologies downloaded from a
> batch run on SOURCE. What I want to be able to do is parse these files
> so I can count the different numbers of instances of terms within all 3
> sets of ontological descriptions (biological process, cellular component
> and molecular function). Is there something in Perl or another program
> that could help me out with this situation? Any information that you
> have would be useful to me. Thanks

I think you will have to parse the SOURCE files yourself. After that is done, there are several options, including go-perl (from http://www.geneontology.org), Bioperl (Bio::OntologyIO and relatives), and GO-TermFinder (on CPAN). I'm not sure which is going to be the best option for you.

If you are comfortable with an RDBMS, you could download the tables from the GO MySQL database and do lookups yourself using DBI.

Hope this helps,
Sean

From sdavis2 at mail.nih.gov Tue Aug 9 07:38:53 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Aug 9 07:30:26 2005
Subject: [Bioperl-l] Question about handling ontology files
In-Reply-To:
Message-ID:

On 8/9/05 7:05 AM, "Davis, Sean (NIH/NHGRI)" wrote:

> On 8/8/05 10:01 PM, "iluminati@earthlink.net"
> wrote:
>
>> Here's my situation. I have a bunch of ontologies downloaded from a
>> batch run on SOURCE. What I want to be able to do is parse these files
>> so I can count the different numbers of instances of terms within all 3
>> sets of ontological descriptions (biological process, cellular component
>> and molecular function). Is there something in Perl or another program
>> that could help me out with this situation?
Any information that you >> have would be useful to me. Thanks > > I think you will have to parse the SOURCE files yourself. After that is > done, there are several options including go-perl (from > http://www.geneontology.org), Bioperl (Bio::OntologyIO and relatives), and > GO-TermFinder (on CPAN). I'm not sure which is going to be the best option > for you. > > If you are comfortable with RDBMs, you could download the tables from the GO > mysql database and do lookups yourself using DBI. I forgot to mention the EASIEST way to do this. If you have SOURCE output, you can include locuslink IDs that you can use to put into various online or standalone ontology analysis packages. See this link for a recent review: http://bioinformatics.oxfordjournals.org/cgi/reprint/bti565v1 From amackey at pcbi.upenn.edu Tue Aug 9 09:13:14 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Tue Aug 9 09:03:38 2005 Subject: [Bioperl-l] new modules for sarching for patterns in fasta-files In-Reply-To: References: Message-ID: <4C10E798-EC55-4242-B573-7352B1F4FB55@pcbi.upenn.edu> Out of curiosity, are your patterns allowed to cross newlines embedded in the FASTA file? This is the typical problem with using grep/agrep directly with sequence files ... -Aaron On Aug 8, 2005, at 1:12 PM, wrote: > > Hi, > > I've made some modules for searching for patterns in fasta files with > different (really fast) backends like agrep and vmatch. I don't > think you > want to include this in standard bioperl. But we think it is useful > code and > we'd like to share it on cpan. The main reason for this email is a > discussion > about the right namespace for this module. What do you think? > > Markus > > (hope the attachment reaches the mailinglist, if not, please send > me a mail if > you are interested in this code) > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Aaron J. 
Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email: amackey@pcbi.upenn.edu
office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
fax: 215-746-6697
postal: Penn Genomics Institute
Goddard Labs 212
415 S. University Avenue
Philadelphia, PA 19104-6017

From markus.riester at student.uni-tuebingen.de Tue Aug 9 15:32:09 2005
From: markus.riester at student.uni-tuebingen.de (markus.riester@student.uni-tuebingen.de)
Date: Tue Aug 9 09:25:48 2005
Subject: [Bioperl-l] new modules for sarching for patterns in fasta-files
In-Reply-To: <4C10E798-EC55-4242-B573-7352B1F4FB55@pcbi.upenn.edu>
References: ,
Message-ID:

With a cheap trick, yes: split the FASTA file into two files, with the IDs in one file and the sequences, one per line, in the second. This should be OK for cDNA/protein FASTA files (I am currently writing tests, and some serious problems with the characters-per-line limitations may still show up, but it did look good in some first tests).

We don't use agrep anymore, because vmatch is really, really good. Only with many mismatches and short query sequences does agrep seem to be a bit faster.

markus

"Aaron J. Mackey" wrote:

> Out of curiosity, are your patterns allowed to cross newlines
> embedded in the FASTA file? This is the typical problem with using
> grep/agrep directly with sequence files ...
>
> -Aaron
>
> On Aug 8, 2005, at 1:12 PM,
> wrote:
>
> >
> > Hi,
> >
> > I've made some modules for searching for patterns in fasta files with
> > different (really fast) backends like agrep and vmatch. I don't
> > think you
> > want to include this in standard bioperl. But we think it is useful
> > code and
> > we'd like to share it on cpan. The main reason for this email is a
> > discussion
> > about the right namespace for this module. What do you think?
> >
> > Markus
> >
> > (hope the attachment reaches the mailinglist, if not, please send
> > me a mail if
> > you are interested in this code)
> >
> >
> >
>
> --
> Aaron J. Mackey, Ph.D.
> Project Manager, ApiDB Bioinformatics Resource Center
> Penn Genomics Institute, University of Pennsylvania
> email: amackey@pcbi.upenn.edu
> office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
> fax: 215-746-6697
> postal: Penn Genomics Institute
> Goddard Labs 212
> 415 S. University Avenue
> Philadelphia, PA 19104-6017
>
>
--

From reche at research.dfci.harvard.edu Tue Aug 9 11:48:47 2005
From: reche at research.dfci.harvard.edu (Pedro Antonio Reche)
Date: Tue Aug 9 11:38:52 2005
Subject: [Bioperl-l] embl sequences 2 fasta
In-Reply-To: <20050630215553.GB13422@bioinfo.ucr.edu>
References: <434AF352F9D03C4C896782B8CC78BC7687F264@VADER.oriongenomics.com> <20050630215553.GB13422@bioinfo.ucr.edu>
Message-ID:

Hi,

I am interested in finding all sequences from EMBL matching a given feature in subcellular location and then creating a single file for each of them in FASTA format. Any help will be appreciated.

Regards,
pedro

On Jun 30, 2005, at 5:55 PM, Josh Lauricha wrote:

> On Thu 06/30/05 16:48, Joseph Bedell wrote:
>> You can calculate the score given the bit score (from the tabular
>> output) and Lambda (calculated from the matrix). The equation is
>> Score =
>> (Bits)/(Lambda in bits).
>>
>> Lambda is only dependent upon the matrix. Did you use NCBI-blast or
>> WU-BLAST? Which flavor of blast (blastn, blastp, etc)? In any case,
>> you
>> can just run a single blast and look at the stats at the bottom of the
>> report to get the value of lambda.
For example, a default NCBI-blastn >> (+1/-3) search has a lambda of 1.37 >> >> ============================ >> Lambda K H >> 1.37 0.711 1.31 >> >> Gapped >> Lambda K H >> 1.37 0.711 1.31 >> =============================== >> >> But, what is difficult to discover is this lambda is in NATS. To >> convert >> it to bits, divide it by the natural log of 2, or in perl: >> >> perl -e 'print 1.37/log(2),"\n"' >> 1.97649220601788 >> >> So, now you can take all of your bit scores divided by >> 1.97649220601788 >> to get the Score. >> >> HTH, >> Joey > > Cool, thanks. That'll save me a bunch of time ;) This was NCBI blastp, > so I've already got it calculated ;) > > Thanks. > > -- > > ------------------------------------------------------ > | Josh Lauricha | Ford, you're turning | > | laurichj@bioinfo.ucr.edu | into a penguin. Stop | > | Bioinformatics, UCR | it | > |----------------------------------------------------| > | OpenPG: | > | 4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 | > |----------------------------------------------------| > | Geek Code: Version 3.12 | > | GAT/CS$/IT$ d+ s-: a-->--- C++++$ UL++++$ P++ L++++| > | $E--- W+ N o? K? w--(---) O? M+(++) V? PS++ PE-(--)| > | Y+ PGP+++ t--- 5+++ X+ R tv DI++ D--- G++ | > | e++ h- r++ z? | > |----------------------------------------------------| > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From iain.m.wallace at gmail.com Tue Aug 9 07:15:46 2005 From: iain.m.wallace at gmail.com (Iain Wallace) Date: Tue Aug 9 11:42:08 2005 Subject: [Bioperl-l] [Bioperl -l] Problem reading EMBL format file Message-ID: <8cff3eb805080904155c0682b9@mail.gmail.com> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... 
Name: COAT_SBMV.M23021.embl Type: application/octet-stream Size: 10333 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050809/4b578b28/COAT_SBMV.M23021.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: COAT_SBMV.AAA46567.cds Type: application/octet-stream Size: 1901 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050809/4b578b28/COAT_SBMV.AAA46567.obj From s0460205 at sms.ed.ac.uk Tue Aug 9 12:21:07 2005 From: s0460205 at sms.ed.ac.uk (SG Edwards) Date: Tue Aug 9 12:51:10 2005 Subject: [Bioperl-l] Not picking up Dbxrefs EMBL records In-Reply-To: <23040211b0fc735fcf7c97fc97770473@gnf.org> References: <1123489071.42f7152f3690e@sms.ed.ac.uk> <23040211b0fc735fcf7c97fc97770473@gnf.org> Message-ID: <1123604467.42f8d7f396648@sms.ed.ac.uk> Hi, My installation does not pick up ANY dbxrefs for gene records e.g. Pubmed, MEDLINE(either EMBL or Genbank formats). When I load them into the database they go in fine but no dbxref_ids are mapped to the bioentry_id in the bioentry_dbxref table. Therefore, nothing appears in the dbxref table either! The system works fine for UniProt protein entries into the database. I am currently installing BioPerl v 1.5 to see if this resolves the problem. An example: NM_214434 from Genbank which has the dbxrefs: Pubmed 1503277 Taxon 9823 GeneID 404088 Quoting Hilmar Lapp : > Are you referring to references and their PMID? These you would find in > the Reference table, which has a foreign key to dbxref, which would > only store the PUBMED or MEDLINE ID (not both at this time). Can you > given an example accession that's giving you grief? > > -hilmar > > On Aug 8, 2005, at 1:17 AM, SG Edwards wrote: > > > Hi folks, > > > > I have a BioSQL database (PostgreSQL 7.4.3, BioPerl 1.4, bioperl-db > > 1.2) set up > > containing protein and gene data. 
However, when I load gene sequence > > records > > (EMBL or Genbank) using: > > > > perl load_seqdatabase.pl -driver Pg -safe -lookup -dbname milk -dbuser > > s0460205 > > -dbpass password -format embl /home/s0460205/file_name.txt > > > > from bioperl-db it does not pick up any dbxrefs i.e. there is no > > dbxref_id for > > MEDLINE etc. > > > > Has anyone else come across this rpoblem and is ther a fix? > > > > Cheers, > > > > Stephen > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > From jason.stajich at duke.edu Tue Aug 9 13:09:31 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Aug 9 12:59:20 2005 Subject: [Bioperl-l] Bio::DB::Taxonomy::entrez updated Message-ID: <01B3F788-B350-4087-9B71-3E62BE16911F@duke.edu> I've updated Bio::DB::Taxonomy::entrez to now fully parse out the XML from the Efetch Eutils CGI script. Can now return a fully populated Bio::Taxonomy::Node object, most importantly with a parent_id field filled in. This allows the web-only implementation to work just as the flatfile implementation does and you can walk up the taxonomy hierarchy. There is currently no way to walk down the hierarchy unless one can construct an Entrez query to get all the nodes which have a particular parent. If someone knows how to do this, please let me know. I added a few fields to Bio::Taxonomy::Node to capture genetic_code, pub_date, update_date, create_date, mitochondrial_genetic_code from the database entry. At this point I think we can think about retiring Bio::Species and replace it with Bio::Taxonomy::Node. 
I would probably just make Bio::Species delegate to Bio::Taxonomy::Node,
or maybe someone can think of something more clever. There will be a bit
of fiddling under the hood to make this really work, but I think it can be
done for the 1.6 release and still be transparent to the user (i.e. the API
is completely retained for Bio::Seq->species, Bio::Species, etc.; however,
new functionality is now also available).

Here is how you can use the DB interface:

   use Bio::DB::Taxonomy;
   my $db = Bio::DB::Taxonomy->new(-source => 'entrez');
   my $taxonid = $db->get_taxonid('Homo sapiens');
   my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid);
   print $node->binomial, "\n";

I added a script in scripts/taxa/query_entrez_taxa.PLS which demonstrates
how to use it as well.

Where I find this module useful is in parsing a Search Result report and
classifying hits by taxonomy. Given the GI numbers in the search result
(BLAST, FASTA, SSEARCH hits), getting the taxon ID for a GI is just one
step away now. I added a capability to the API in
Bio::DB::Taxonomy::entrez for retrieving taxonomy info based on a GI
number. You can pass the -gi => $ginumber option to get_Taxonomy_Node.

Demonstration of use here:

   my $gi = 71836523;
   my $node = $db->get_Taxonomy_Node(-gi => $gi, -db => 'protein');
   print $node->binomial, "\n";
   my ($species,$genus,$family) = $node->classification;
   print "family is $family\n";

   # Can also go up 4 levels
   my $p = $node;
   for ( 1..4 ) {
       $p = $db->get_Taxonomy_Node(-taxonid => $p->parent_id);
   }
   print $p->rank, " ", ($p->classification)[0], "\n";

   # could then classify a set of BLAST hits based on their GI numbers
   # into taxonomic categories.

I have tried to put these examples in the SYNOPSIS, t/Taxonomy.t and the
script in scripts/taxa/query_entrez_taxa.PLS. If there are mistakes or
typos, or something is unclear, please let us know and it can be updated.
I hope a section describing how to use these in SearchIO context (parsing
reports) can be added when I have time.
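Until that section exists, here is a minimal sketch of the classification
idea: follow parent_id links upward until a node of the wanted rank turns
up, then tally the hits. It is only an illustration under stated
assumptions. The %nodes hash is a hypothetical in-memory stand-in for what
get_Taxonomy_Node would return, name_at_rank and count_by_rank are made-up
helper names, and the taxon IDs are illustrative rather than looked up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Entrez-backed database:
# taxon id => { parent_id, rank, name }. IDs are illustrative only.
my %nodes = (
    9606 => { parent_id => 9605, rank => 'species', name => 'Homo sapiens' },
    9605 => { parent_id => 9604, rank => 'genus',   name => 'Homo' },
    9604 => { parent_id => 9443, rank => 'family',  name => 'Hominidae' },
    9443 => { parent_id => 1,    rank => 'order',   name => 'Primates' },
    9823 => { parent_id => 9822, rank => 'species', name => 'Sus scrofa' },
    9822 => { parent_id => 9821, rank => 'genus',   name => 'Sus' },
    9821 => { parent_id => 1,    rank => 'family',  name => 'Suidae' },
);

# Follow parent_id links upward until a node of the wanted rank is found.
# Returns the node's name, or undef if the walk leaves the known nodes.
sub name_at_rank {
    my ($taxonid, $rank) = @_;
    my %seen;
    while ( defined $taxonid ) {
        my $node = $nodes{$taxonid} or last;
        return $node->{name} if $node->{rank} eq $rank;
        last if $seen{$taxonid}++;          # guard against cycles
        $taxonid = $node->{parent_id};
    }
    return;
}

# Tally a list of hit taxon ids by the name found at the given rank.
sub count_by_rank {
    my ($rank, @taxonids) = @_;
    my %count;
    for my $id (@taxonids) {
        my $name = name_at_rank($id, $rank);
        $count{ defined $name ? $name : 'unclassified' }++;
    }
    return \%count;
}

my $counts = count_by_rank('family', 9606, 9823, 9823, 12345);
print "$_\t$counts->{$_}\n" for sort keys %$counts;
```

With the real database, the hash lookup inside name_at_rank would become a
call to get_Taxonomy_Node(-taxonid => ...) and the rank and parent would
come from the returned node, as in the "go up 4 levels" loop above.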
Best, -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From hlapp at gnf.org Tue Aug 9 12:40:12 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Aug 9 13:27:10 2005 Subject: [Bioperl-l] Not picking up Dbxrefs EMBL records In-Reply-To: <1123604467.42f8d7f396648@sms.ed.ac.uk> References: <1123489071.42f7152f3690e@sms.ed.ac.uk> <23040211b0fc735fcf7c97fc97770473@gnf.org> <1123604467.42f8d7f396648@sms.ed.ac.uk> Message-ID: <4cde9b9219492587cdb09999fb7980cc@gnf.org> This is a RefSeq accession. In GenBank format the db_xrefs you see are notes for features in the feature table, not top-level db_xrefs (i.e., for the entry itself), although semantically of course that's what they are. Bioperl (i.e., the Bioperl SeqIO parser for genbank format) doesn't interpret that however, and leaves them where they are, namely as annotation for the features. The single exception to that is that the parser actually does look for the taxon ID in the feature table and sets the $seq->species->ncbi_taxon_id property accordingly. GenBank format doesn't have top-level db_xrefs at all. You will need EMBL format for that. As I said before, the PUBMED line is not a db_xref for the entry either but the db_xref for the reference entry, so you will need to retrieve the references ($seq->annotation->get_Annotations('reference')) and use its $ref->pubmed or $ref->medline properties. BTW this will still hold true if you first load the sequences into bioperl-db and then retrieve them; there isn't really any magic being applied that would transform db_xrefs into a common unified picture. I use a SequenceProcessor (see Bio::Seq::BaseSeqProcessor and the --pipeline option to load_seqdatabase.pl) to promote db_xref tags found in the feature table of genbank records to Bio::Annotation::DBLink annotation on the sequence object. Very easy to implement and you are in total control of the annotation structure. 
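A rough sketch of the promotion step, to make the idea concrete. This is
not the actual processor: the @features structure is a hypothetical
stand-in for the parsed feature table, and promote_dbxrefs is a made-up
helper name. It only shows the bookkeeping, i.e. collecting each db_xref
value once and splitting it into a database name and an ID.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed feature table: each feature carries
# a hash of tag => [values]. The values are those from the NM_214434
# example in this thread.
my @features = (
    { primary_tag => 'source', tags => { db_xref => ['taxon:9823'] } },
    { primary_tag => 'gene',   tags => { db_xref => ['GeneID:404088'] } },
    { primary_tag => 'CDS',    tags => { db_xref => ['GeneID:404088'] } },
);

# Collect every distinct db_xref value and split it into database and id.
sub promote_dbxrefs {
    my @feats = @_;
    my (%seen, @links);
    for my $feat (@feats) {
        for my $xref ( @{ $feat->{tags}{db_xref} || [] } ) {
            next if $seen{$xref}++;
            # Split on the first colon only; ids may contain colons.
            my ($db, $id) = split /:/, $xref, 2;
            next unless defined $id && length $id;
            push @links, { database => $db, primary_id => $id };
        }
    }
    return @links;
}

print "$_->{database}\t$_->{primary_id}\n" for promote_dbxrefs(@features);
```

In a real processor the values would come from get_tag_values('db_xref')
on each feature, and each pair would be wrapped in a
Bio::Annotation::DBLink (-database, -primary_id) and attached with
$seq->annotation->add_Annotation('dblink', $link).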
-hilmar On Aug 9, 2005, at 9:21 AM, SG Edwards wrote: > Hi, > > My installation does not pick up ANY dbxrefs for gene records e.g. > Pubmed, > MEDLINE(either EMBL or Genbank formats). When I load them into the > database > they go in fine but no dbxref_ids are mapped to the bioentry_id in the > bioentry_dbxref table. Therefore, nothing appears in the dbxref table > either! > > The system works fine for UniProt protein entries into the database. I > am > currently installing BioPerl v 1.5 to see if this resolves the problem. > > An example: NM_214434 from Genbank which has the dbxrefs: > > Pubmed 1503277 > Taxon 9823 > GeneID 404088 > > Quoting Hilmar Lapp : > >> Are you referring to references and their PMID? These you would find >> in >> the Reference table, which has a foreign key to dbxref, which would >> only store the PUBMED or MEDLINE ID (not both at this time). Can you >> given an example accession that's giving you grief? >> >> -hilmar >> >> On Aug 8, 2005, at 1:17 AM, SG Edwards wrote: >> >>> Hi folks, >>> >>> I have a BioSQL database (PostgreSQL 7.4.3, BioPerl 1.4, bioperl-db >>> 1.2) set up >>> containing protein and gene data. However, when I load gene sequence >>> records >>> (EMBL or Genbank) using: >>> >>> perl load_seqdatabase.pl -driver Pg -safe -lookup -dbname milk >>> -dbuser >>> s0460205 >>> -dbpass password -format embl /home/s0460205/file_name.txt >>> >>> from bioperl-db it does not pick up any dbxrefs i.e. there is no >>> dbxref_id for >>> MEDLINE etc. >>> >>> Has anyone else come across this rpoblem and is ther a fix? >>> >>> Cheers, >>> >>> Stephen >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 >> ------------------------------------------------------------- >> >> > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From markus.riester at student.uni-tuebingen.de Tue Aug 9 18:40:11 2005 From: markus.riester at student.uni-tuebingen.de (markus.riester@student.uni-tuebingen.de) Date: Tue Aug 9 14:26:55 2005 Subject: [Bioperl-l] new modules for sarching for patterns in fasta-files In-Reply-To: References: , , <4C10E798-EC55-4242-B573-7352B1F4FB55@pcbi.upenn.edu> Message-ID: update: the tests are written. looks good. agrep finds matches at the end of the longest arabidopsis cdna sequence (16kb). (but the tests showed some serious bugs in version 0.03, the one in the first attachment. they are all fixed in this attachment) markus markus.riester@student.uni-tuebingen.de schrieb: > with a cheap trick, yes, split the fasta files in two files. ids in one file, > sequences -one per line- in the second. > > this should be ok for cdna/protein fastafiles (but I am currently writing > tests-maybe some serious problems with the chars per line limitations show > up-but I did look good in some first tests.) > > we don't use agrep anymore, because vmatch is really, really good. only with > many mismatches and short query sequences, agrep seems to be a bit faster. > > markus > > "Aaron J. Mackey" schrieb: > > > Out of curiosity, are your patterns allowed to cross newlines > > embedded in the FASTA file? This is the typical problem with using > > grep/agrep directly with sequence files ... > > > > -Aaron > > > > On Aug 8, 2005, at 1:12 PM, > > wrote: > > > > > > > > Hi, > > > > > > I've made some modules for searching for patterns in fasta files with > > > different (really fast) backends like agrep and vmatch. I don't > > > think you > > > want to include this in standard bioperl. 
But we think it is useful > > > code and > > > we'd like to share it on cpan. The main reason for this email is a > > > discussion > > > about the right namespace for this module. What do you think? > > > > > > Markus > > > > > > (hope the attachment reaches the mailinglist, if not, please send > > > me a mail if > > > you are interested in this code) > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Aaron J. Mackey, Ph.D. > > Project Manager, ApiDB Bioinformatics Resource Center > > Penn Genomics Institute, University of Pennsylvania > > email: amackey@pcbi.upenn.edu > > office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI) > > fax: 215-746-6697 > > postal: Penn Genomics Institute > > Goddard Labs 212 > > 415 S. University Avenue > > Philadelphia, PA 19104-6017 > > > > > > > > -- > > > -- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/x-gzip Size: 36342 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050809/1a48a1f2/attachment-0001.bin From akarger at CGR.Harvard.edu Tue Aug 9 15:20:34 2005 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue Aug 9 15:08:09 2005 Subject: [Bioperl-l] new modules for sarching for patterns in fasta-fi les Message-ID: <339D68B133EAD311971E009027DC47970321B47E@montecarlo.cgr.harvard.edu> > From: markus.riester@student.uni-tuebingen.de > "Aaron J. Mackey" schrieb: > > > Out of curiosity, are your patterns allowed to cross newlines > > embedded in the FASTA file? This is the typical problem > > with using > > grep/agrep directly with sequence files ...> > > with a cheap trick, yes, split the fasta files in two files. > ids in one file, > sequences -one per line- in the second. 
I wrote a simple one-liner to convert fasta to three tab-separated
columns: ID (without '>'), desc, and concatenated sequence. That way you
don't have to worry about keeping the two files tied together, but agrep
should still find things only in the concatenated sequence. (Unless
somebody mean put a sequence into the description column.) As an added
bonus, it means you can throw a FASTA into Excel for sorting, filtering,
etc. Or merge with a gene list pretty easily.

It's at
http://cgr.harvard.edu/cbg/scriptome/Tools/Change.html#new__change_a_fasta_file_into_tabular_format__change_fasta_to_tab_
along with the tab-to-FASTA converter, along with a couple of sentences
describing potential gotchas (e.g., any tabs in the desc get lost).

> this should be ok for cdna/protein fastafiles (but I am currently
> writing tests-maybe some serious problems with the chars per line
> limitations show up-but I did look good in some first tests.)

Can you tell me what you mean by this?

-Amir Karger

From ro_phls2 at dh.gov.hk  Tue Aug  9 20:34:24 2005
From: ro_phls2 at dh.gov.hk (Andrew Leung)
Date: Tue Aug  9 20:26:11 2005
Subject: [Bioperl-l] Extract Mutation Automatically
Message-ID: <20050810003350.DPM10378.pimx07@Leungkcro>

Hi all,
Is there any module available that can allow me to extract mutation(s)
automatically? The idea is that if I submit two sequences for alignment,
the script can automatically list out all the differences between the two
sequences. I wish to know the difference at two levels, i.e. the
nucleotide and amino acid level. Any ideas?
Andrew

From jason.stajich at duke.edu  Tue Aug  9 22:35:59 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Aug  9 22:26:00 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <20050810003350.DPM10378.pimx07@Leungkcro>
References: <20050810003350.DPM10378.pimx07@Leungkcro>
Message-ID: <10BF3FBD-CF50-4A12-9C3A-C1289D13C85E@duke.edu>

I guess it comes down to what you want to do with the mutations once
you've found them.

The seq_inds method in Bio::Search::HSP::HSPI is something you can call
on HSP objects you've gotten out of pairwise alignment searches. seq_inds
will give you the locations of the identical, conserved, or mismatched
columns from a pairwise alignment. I would suggest using FASTA or SSEARCH.

If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa',
here is how I would get the pairwise SW alignment and get the mutations
out.

If you wanted a global alignment you can use the EMBOSS tool 'needle' and
generate an MSF alignment which can be parsed with Bio::AlignIO.

Some simple code to print out the bases which have mismatches:

   use strict;
   use Bio::SearchIO;

   # FASTA output needs the 'fasta' parser
   open(my $fh, "fasta34 seq1.fa seq2.fa |") || die $!;
   my $parser = Bio::SearchIO->new(-format => 'fasta',
                                   -fh     => $fh);
   # To use BLAST instead, swap in these two statements:
   #open(my $fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!;
   #my $parser = Bio::SearchIO->new(-format => 'blast',
   #                                -fh     => $fh);

   if( my $result = $parser->next_result ) { # single result so use if instead of while
     if( my $hit = $result->next_hit ) {     # ditto, want single result...
       if( my $hsp = $hit->next_hsp ) {      # single HSP from FASTA, would need to consider more if using BLAST
         my @hit_mismatches = $hsp->seq_inds('hit', 'nomatch');
         # if this is protein and you want to treat the conservative matches as mismatches
         # you'll need to run the same method but asking for 'conserved' and then combine the two lists
         for my $base ( @hit_mismatches ) {
           print "base $base of the hit sequence is a mismatch\n";
         }
       }
     }
   }

The Bio::PopGen::Utilities module can also take an alignment and extract
the positions with variation for use in polymorphism analyses.

-jason

On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote:

> Hi all,
> Is there any module available that can allow me to extract mutation(s)
> automatically? The idea is that if I submit two sequences for
> alignment, the script can automatically list out all the differences
> between the two sequences. I wish to know the difference at two levels,
> i.e. the nucleotide and amino acid level. Any ideas?
> Andrew

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From darren.obbard at ed.ac.uk  Wed Aug 10 03:54:51 2005
From: darren.obbard at ed.ac.uk (Darren Obbard)
Date: Wed Aug 10 03:45:08 2005
Subject: [Bioperl-l] calculating the Ka/Ks ratio
References: <20050810003350.DPM10378.pimx07@Leungkcro> <10BF3FBD-CF50-4A12-9C3A-C1289D13C85E@duke.edu>
Message-ID: <003501c59d80$ca0e80d0$9ebfd781@DarrenObbard>

Hi all,

Is there a module that will take a pair of aligned (coding) sequences, and
report the Ka/Ks ratio? (non-synonymous mutations per non-synonymous site /
synonymous mutations per synonymous site).

I appreciate that PAML will give me an ML estimate of Ka/Ks, but I'm aiming
to do a sliding-window analysis and don't wish to send each window to PAML
individually; I wondered whether there may be a quicker alternative.
Thanks, Darren -- Darren Obbard Institute of Evolutionary Biology University of Edinburgh, UK darren.obbard@ed.ac.uk From hrh at sanger.ac.uk Wed Aug 10 04:18:14 2005 From: hrh at sanger.ac.uk (Hans Rudolf Hotz) Date: Wed Aug 10 04:13:45 2005 Subject: [Bioperl-l] [Bioperl -l] Problem reading EMBL format file In-Reply-To: <8cff3eb805080904155c0682b9@mail.gmail.com> References: <8cff3eb805080904155c0682b9@mail.gmail.com> Message-ID: Iain This is one of the features of SRS. If you search EMBL with a ProteinID, you don't search EMBL but you search EMBL_features. Hence, the output is only one feature. And depending on your SRS installation this might look more or less like an EMBL entry, but is not an EMBL entry. In order to get an EMBL entry (with all the features, of course) you can do: getz "[EMBL-ProteinID:AAA46567] > embl" -e or getz "[EMBL-ProteinID:AAA46567] > parent" -e Then you get the proper embl entry (M23021) which you can feed into SeqIO Hope this helps, Hans On Tue, 9 Aug 2005, Iain Wallace wrote: > Hi all, > > Hope fully somebody will be able to help me, I am having some difficulty > reading a file that looks to me very much like EMBL format. > > I am trying to read some sequence files using SeqIO. Both files are obtained > using the getz program with the following commands > getz "[EMBL-ProteinID:AAA46567]" -e >COAT_SBMV.AAA46567.cds > getz "[EMBL-Acc:M23021]" -e > COAT_SBMV.M23021.embl > > The embl file is read fine, and I am able to extract the features I want. I > am having problems with the CDS file; it doesn't appear to be read properly. > I guess the CDS file isn't a proper EMBL format. Does anyone know what > format it is or how I could convert it to a proper EMBL format or > alternatively how to make getz return the file in the proper format. 
> The two files look very similar to me
>
> I tried the following little conversion program which worked fine on the
> EMBL file, but failed on the cds file with the error: No whitespace
> allowed in EMBL display id [unknown id]
>
> use Bio::SeqIO;
>
> $filename = $ARGV[0];
> $in = Bio::SeqIO->new(-file => $filename ,
>                       -format => 'EMBL');
> $out = Bio::SeqIO->new(-file => ">outputfilename" ,
>                        -format => 'EMBL');
>
> while ( my $seq = $in->next_seq() ) {
>     $out->write_seq($seq);
> }
>
>
> Thanks for all your help
>
> Iain
>

From avilella at gmail.com  Wed Aug 10 04:26:53 2005
From: avilella at gmail.com (Albert Vilella)
Date: Wed Aug 10 04:17:45 2005
Subject: [Bioperl-l] calculating the Ka/Ks ratio
In-Reply-To: <003501c59d80$ca0e80d0$9ebfd781@DarrenObbard>
References: <20050810003350.DPM10378.pimx07@Leungkcro> <10BF3FBD-CF50-4A12-9C3A-C1289D13C85E@duke.edu> <003501c59d80$ca0e80d0$9ebfd781@DarrenObbard>
Message-ID: <1123662413.8228.3.camel@localhost.localdomain>

On Wed, 10 Aug 2005 at 08:54 +0100, Darren Obbard wrote:
> Hi all,
>
> Is there a module that will take a pair of aligned (coding) sequences, and
> report the Ka/Ks ratio? (non-synonymous mutations per non-synonymous site /
> synonymous mutations per synonymous site).
>
> I appreciate that PAML will give me an ML estimate of Ka/Ks, but I'm aiming
> to do a sliding-window analysis and don't wish to send each window to PAML
> individually, - I wondered whether there may be a quicker alternative.

There is a calc_KaKs_pair method in Bio::Align::DNAStatistics
(Nei-Gojobori method).

From the synopsis:

   use Bio::AlignIO;
   use Bio::Align::DNAStatistics;

   my $stats = Bio::Align::DNAStatistics->new();
   my $in = Bio::AlignIO->new(-format => 'fasta',
                              -file   => 't/data/nei_gojobori_test.aln');
   my $alnobj = $in->next_aln;
   my ($seq1id,$seq2id) = map { $_->display_id } $alnobj->each_seq;

   # Ka/Ks for one pair of sequences
   my $results = $stats->calc_KaKs_pair($alnobj, $seq1id, $seq2id);
   print "comparing ".$results->[0]{'Seq1'}." and ".$results->[0]{'Seq2'}."\n";
   for (sort keys %{$results->[0]} ){
       next if /Seq/;
       printf("%-9s %.4f \n",$_ , $results->[0]{$_});
   }

   # Ka/Ks for every pair in the alignment
   my $results2 = $stats->calc_all_KaKs_pairs($alnobj);
   for my $an (@$results2){
       print "comparing ". $an->{'Seq1'}." and ". $an->{'Seq2'}. " \n";
       for (sort keys %$an ){
           next if /Seq/;
           printf("%-9s %.4f \n",$_ , $an->{$_});
       }
       print "\n\n";
   }

   # average over the alignment, with 1000 bootstrap replicates
   my $result3 = $stats->calc_average_KaKs($alnobj, 1000);
   for (sort keys %$result3 ){
       next if /Seq/;
       printf("%-9s %.4f \n",$_ , $result3->{$_});
   }

Hope it helps,

    Albert.

--
Albert J. Vilella                   avilella_at_ub_edu
--------------------------------------------
Departament de Genetica
Universitat de Barcelona
Diagonal 645
08028, Barcelona
Tel: +34 934035306
Fax: +34 934034420
--------------------------------------------
avilella_at_ebi_ac_uk
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambs. CB10 1SD, United Kingdom
--------------------------------------------------

From csaba.ortutay at uta.fi  Wed Aug 10 04:37:50 2005
From: csaba.ortutay at uta.fi (Csaba Ortutay)
Date: Wed Aug 10 04:36:33 2005
Subject: [Bioperl-l] calculating the Ka/Ks ratio
In-Reply-To: <003501c59d80$ca0e80d0$9ebfd781@DarrenObbard>
References: <20050810003350.DPM10378.pimx07@Leungkcro> <10BF3FBD-CF50-4A12-9C3A-C1289D13C85E@duke.edu> <003501c59d80$ca0e80d0$9ebfd781@DarrenObbard>
Message-ID: <200508101137.50388.csaba.ortutay@uta.fi>

> Is there a module that will take a pair of aligned (coding) sequences, and
> report the Ka/Ks ratio? (non-synonymous mutations per non-synonymous site /
> synonymous mutations per synonymous site).

See the Bio::Align::DNAStatistics module. That's working nicely.
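For the sliding-window part of the question, the windows have to start and
end on codon boundaries or the Ka/Ks counts stop making sense. A minimal
sketch of just that bookkeeping follows, assuming an in-frame, gap-free
pairwise alignment. The codon_windows helper and the two toy sequences are
made up for illustration, and the per-window mismatch count is only a
placeholder for the real Ka/Ks call.

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) for windows that are
# $win_codons codons wide and advance by $step_codons codons.
sub codon_windows {
    my ($aln_len, $win_codons, $step_codons) = @_;
    my $win  = 3 * $win_codons;
    my $step = 3 * $step_codons;
    my @windows;
    for ( my $start = 1; $start + $win - 1 <= $aln_len; $start += $step ) {
        push @windows, [ $start, $start + $win - 1 ];
    }
    return @windows;
}

# Two toy in-frame aligned coding sequences (made up, 8 codons each).
my $seq1 = 'ATGGCTGCTAAAGGTCTGTTTGAA';
my $seq2 = 'ATGGCAGCTAGAGGTCTGTTCGAA';

for my $w ( codon_windows( length($seq1), 4, 2 ) ) {
    my ($start, $end) = @$w;
    my $len = $end - $start + 1;
    my $s1  = substr $seq1, $start - 1, $len;
    my $s2  = substr $seq2, $start - 1, $len;
    # XOR the two strings; every non-NUL byte marks a mismatching column.
    my $xor   = $s1 ^ $s2;
    my $diffs = ( $xor =~ tr/\0//c );
    print "$start-$end\t$s1\t$s2\t$diffs\n";
}
```

With a real alignment object, each [start, end] pair could be handed to
the slice method of Bio::SimpleAlign and the resulting sub-alignment
passed to calc_KaKs_pair, which keeps everything inside one Perl process
instead of launching an external program per window.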
Csaba -- Csaba Ortutay PhD Institute of Medical Technology University of Tampere e-mail: csaba.ortutay@uta.fi From birney at ebi.ac.uk Wed Aug 10 05:00:29 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Wed Aug 10 04:59:53 2005 Subject: [Bioperl-l] Bio::SeqFeature::OntologyTypedI Proposal Message-ID: <42F9C22D.4030603@ebi.ac.uk> Hi guys... In my spare time (read... train time) I'm back on a little bit of bioperl. I hope to in the future set up an Ensembl->Bioperl Bridge (ie, seeing Ensembl objects as fully compliant bioperl objects) but before I did that I wanted to do my bit for 1.6 So... following on from Chris' proposal of sorting out SeqFeature typing, here is my proposal: Bio::SeqFeature::OntologyTypedI - extends Bio::SeqFeatureI and has a method $sf->ontology_term() which returns a Bio::Ontology::TermI compliant object. ie, the synopsis would look like: =head1 NAME Bio::SeqFeature::OntologyTypedI - a strongly typed SeqFeature =head1 SYNOPSIS # get Sequence Features in some manner, eg # from a Sequence object foreach $sf ( $seq->get_SeqFeatures() ) { # all sequence features must have primary_tag() return a string $type_as_string = $sf->primary_tag(); # ontologytyped seqfeatures have an ontology term if( $sf->isa("Bio::SeqFeature::OntologyTypedI") ) { $ot = $sf->ontology_term(); print "Ontology identifier:",$ot->identifier()," name:",$ot->name()," Description:",$ot->description(),"\n"; } else { print "Sequence Feature does not have an ontology type - tag is $type_as_string\n"; } } I would then implement this in Bio::SeqFeature::OntologyCompliant which would inheriet its implementation from Bio::SeqFeature::Generic, but chain primary_tag to $sf->ontology_term()->name(); Having done this I don't know how much "magic" I should put into SeqIO to automatically promote things into Ontology compliant terms, or perhaps we should have a converter - which one can register with a SeqIO EMBL or GenBank stream being something like $new_sf = 
$converter->convert($old_sf,$seq); This might conflict with an unflattener or something. What do people think about this proposal? What else do I need to do to tidy this up? From amackey at pcbi.upenn.edu Wed Aug 10 08:24:48 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Wed Aug 10 08:16:31 2005 Subject: [Bioperl-l] Bio::SeqFeature::OntologyTypedI Proposal In-Reply-To: <42F9C22D.4030603@ebi.ac.uk> References: <42F9C22D.4030603@ebi.ac.uk> Message-ID: <6D5DA1C7-11E2-4A0C-9EF6-8A4B6ED4D388@pcbi.upenn.edu> Isn't this akin to Bio::Factory::SequenceProcessorI functionality? otherwise it all sounds good to me. -Aaron On Aug 10, 2005, at 5:00 AM, Ewan Birney wrote: > perhaps we should have a converter -- Aaron J. Mackey, Ph.D. Project Manager, ApiDB Bioinformatics Resource Center Penn Genomics Institute, University of Pennsylvania email: amackey@pcbi.upenn.edu office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI) fax: 215-746-6697 postal: Penn Genomics Institute Goddard Labs 212 415 S. University Avenue Philadelphia, PA 19104-6017 From cjm at fruitfly.org Wed Aug 10 17:03:29 2005 From: cjm at fruitfly.org (Chris Mungall) Date: Wed Aug 10 16:53:37 2005 Subject: [Bioperl-l] Re: Bio::SeqFeature::OntologyTypedI Proposal In-Reply-To: <42F9C22D.4030603@ebi.ac.uk> References: <42F9C22D.4030603@ebi.ac.uk> Message-ID: Sounds like the beginnings of a plan! Perhaps we can come up with a shorter/catchier name but I'm not that bothered. The plan below will naturally extend to tag_values as well, with OntologyCompliant delegating the existing methods. We should also figure out how this ties in with Bio::SeqFeature::{Gene,Transcript,Exon} etc - if at all. In many ways, they are different ways of achieving the same thing, namely stronger typing of features. One scenario is that the class-types piggyback off of the ontology-typed classes. The other is that they are completely independent. Regarding 'magic' in SeqIO - not sure this is required. 
You can already plug in your own factories here, we just need to extend this with feature factories. The default method will continue to produce relatively light SF::Generics? On Wed, 10 Aug 2005, Ewan Birney wrote: > > Hi guys... > > > In my spare time (read... train time) I'm back on a little > bit of bioperl. I hope to in the future set up an > Ensembl->Bioperl Bridge (ie, seeing Ensembl objects as > fully compliant bioperl objects) but before I did that > I wanted to do my bit for 1.6 > > > So... following on from Chris' proposal of sorting > out SeqFeature typing, here is my proposal: > > > Bio::SeqFeature::OntologyTypedI - extends Bio::SeqFeatureI > and has a method $sf->ontology_term() which returns a > Bio::Ontology::TermI compliant object. > > ie, the synopsis would look like: > > > =head1 NAME > > Bio::SeqFeature::OntologyTypedI - a strongly typed SeqFeature > > =head1 SYNOPSIS > > > # get Sequence Features in some manner, eg > # from a Sequence object > > foreach $sf ( $seq->get_SeqFeatures() ) { > # all sequence features must have primary_tag() return a string > $type_as_string = $sf->primary_tag(); > > # ontologytyped seqfeatures have an ontology term > if( $sf->isa("Bio::SeqFeature::OntologyTypedI") ) { > $ot = $sf->ontology_term(); > print "Ontology identifier:",$ot->identifier()," name:",$ot->name()," Description:",$ot->description(),"\n"; > } else { > print "Sequence Feature does not have an ontology type - tag is $type_as_string\n"; > } > > } > > > I would then implement this in > > Bio::SeqFeature::OntologyCompliant > > which would inheriet its implementation from Bio::SeqFeature::Generic, but > chain primary_tag to > > $sf->ontology_term()->name(); > > > Having done this I don't know how much "magic" I should put into > SeqIO to automatically promote things into Ontology compliant terms, > or perhaps we should have a converter - which one can register > with a SeqIO EMBL or GenBank stream being something like > > $new_sf = 
$converter->convert($old_sf,$seq); > > > > This might conflict with an unflattener or something. > > > > What do people think about this proposal? What else do I need > to do to tidy this up? > > > > > > > > > > > From ro_phls2 at dh.gov.hk Wed Aug 10 20:42:11 2005 From: ro_phls2 at dh.gov.hk (Andrew Leung) Date: Wed Aug 10 20:30:50 2005 Subject: [Bioperl-l] Extract Mutation Automatically In-Reply-To: <10BF3FBD-CF50-4A12-9C3A-C1289D13C85E@duke.edu> Message-ID: <20050811004135.HPW10378.pimx07@Leungkcro> Hi Jason, Thank you for advice. I will try the various approaches suggested. My ultimate goal is to extract something like this: A267G, Z786-, L898Y etc. for aa and A162T, G339A, A388N, etc. for nt. Preferably, the nomenclature for annotating mutations is a standardized one. But, it appears that there no such a ready to use module from Bioperl. Andrew -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: Wednesday, August 10, 2005 10:36 AM To: andrew_leung@dh.gov.hk Cc: bioperl-l@bioperl.org Subject: Re: [Bioperl-l] Extract Mutation Automatically I guess it comes down to what you want to do with the mutations once you've found them. The seq_inds method in Bio::Search::HSP::HSPI which is something you can call on hsp objects you've gotten out of pairwise alignment searches. seq_inds will give you the location of the identical, conserved, mismatched columns from a pairwise alignment. I would suggest using FASTA or SSEARCH and If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa' Here is how I would get the pairwise SW alignment and get the mutations out. If you wanted a global alignment you can use the EMBOSS tool 'needle' and generate an MSF alignment which can be parsed with Bio::AlignIO. 
some simple code to print out the bases which have mismatches use Bio::SearchIO; use strict; my $fh; #open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!; open($fh, "fasta34 seq1.fa seq2.fa |") || die $!; #my $parser = Bio::SearchIO->new(-format => 'fasta', # -fh => $fh); my $parser = Bio::SearchIO->new(-format => 'blast', - fh => $fh); if( my $result = $parser->next_result ) { # single result so use if instead of while if( my $hit = $result->next_hit ) { # ditto, want single result... if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would need to consider more if using BLAST my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch'); # if this is protein and you want to treat the conservative matches as mismatches # you'll need to run the same method but asking for 'conserved' and then combing the two lists for my $base ( @qmismatches ) { print "base $base of the hit sequence is a mismatch \n", } } } } The Bio::PopGen::Utilities module can also take an alignment and extract the positions with variation for use in polymorphism analyses. -jason On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote: > Hi all, > Is there any module available that can allow me to extract mutation(s) > automatically? The idea is that if I submit two sequences for > alignment, the > script can automatically list out all the differences between the two > sequences. I wish to know the difference at two levels, i.e. the > nucleotide > and amino acid level. Any ideas? 
> Andrew > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason.stajich at duke.edu Wed Aug 10 21:24:02 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Aug 10 21:16:28 2005 Subject: [Bioperl-l] Extract Mutation Automatically In-Reply-To: <20050811004135.HPW10378.pimx07@Leungkcro> References: <20050811004135.HPW10378.pimx07@Leungkcro> Message-ID: <3FD6FD70-7FF7-480B-8E9F-07F2D9C3D207@duke.edu> On Aug 10, 2005, at 8:42 PM, Andrew Leung wrote: > Hi Jason, > Thank you for advice. I will try the various approaches suggested. My > ultimate goal is to extract something like this: A267G, Z786-, > L898Y etc. > for aa and A162T, G339A, A388N, etc. for nt. Preferably, the > nomenclature > for annotating mutations is a standardized one. But, it appears > that there > no such a ready to use module from Bioperl. Don't despair, you could be the one to do it! This would probably just a be a subroutine and not necessarily a whole module. That nomenclature assumes a reference sequence and just getting the bases you are interested in. A few substr or subseq calls and you would be right there. -jason > Andrew > > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: Wednesday, August 10, 2005 10:36 AM > To: andrew_leung@dh.gov.hk > Cc: bioperl-l@bioperl.org > Subject: Re: [Bioperl-l] Extract Mutation Automatically > > I guess it comes down to what you want to do with the mutations once > you've found them. > > The seq_inds method in Bio::Search::HSP::HSPI which is something you > can call on hsp objects you've gotten out of pairwise alignment > searches. seq_inds will give you the location of the identical, > conserved, mismatched columns from a pairwise alignment. 
I would > suggest using FASTA or SSEARCH and > > If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa' > > Here is how I would get the pairwise SW alignment and get the > mutations out. > > If you wanted a global alignment you can use the EMBOSS tool 'needle' > and generate an MSF alignment which can be parsed with Bio::AlignIO. > > some simple code to print out the bases which have mismatches > use Bio::SearchIO; > use strict; > my $fh; > #open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!; > open($fh, "fasta34 seq1.fa seq2.fa |") || die $!; > #my $parser = Bio::SearchIO->new(-format => 'fasta', > # -fh => $fh); > my $parser = Bio::SearchIO->new(-format => 'blast', > - > fh => $fh); > > if( my $result = $parser->next_result ) { # single result so use if > instead of while > if( my $hit = $result->next_hit ) { # ditto, want single > result... > if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would > need to consider more if using BLAST > > my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch'); > # if this is protein and you want to treat the conservative > matches as mismatches > # you'll need to run the same method but asking for > 'conserved' and then combing the two lists > > for my $base ( @qmismatches ) { > print "base $base of the hit sequence is a mismatch \n", > } > } > } > } > > > The Bio::PopGen::Utilities module can also take an alignment and > extract the positions with variation for use in polymorphism analyses. > > -jason > > On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote: > > >> Hi all, >> Is there any module available that can allow me to extract mutation >> (s) >> automatically? The idea is that if I submit two sequences for >> alignment, the >> script can automatically list out all the differences between the two >> sequences. I wish to know the difference at two levels, i.e. the >> nucleotide >> and amino acid level. Any ideas? 
>> Andrew >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From brian_osborne at cognia.com Thu Aug 11 10:57:51 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Aug 11 10:55:28 2005 Subject: [Bioperl-l] Re: Feature-Annotation HOWTO In-Reply-To: Message-ID: Pedro, First, make sure to write to bioperl-l with questions, there are certainly people there who know as much, or more, about Features and Annotations as me. With regard to references, I believe this bug has been fixed in the latest Bioperl, bioperl-live. With regard to the ids ("db_xref"), you'll have to show us what the source file is and what @ids looks like, I'm afraid I didn't exactly understand the problem. With regard to SeqIO, your code looks fine but you've only shown part of it so I can't be sure. Here's another rendition: >perl -e 'use Bio::DB::GenBank; $db = new Bio::DB::GenBank; $seq = $db->get_Seq_by_id(2); use Bio::SeqIO; $out = Bio::SeqIO->new(-fh => \*STDERR, -format => "fasta"); $out->write_seq($seq);' >A00002 B.taurus DNA sequence 1 from patent application EP0238993. AATTCATGCGTCCGGACTTCTGCCTCGAGCCGCCGTACACTGGGCCCTGCAAAGCTCGTA TCATCCGTTACTTCTACAATGCAAAGGCAGGCCTGTGTCAGACCTTCGTATACGGCGGTT GCCGTGCTAAGCGTAACAACTTCAAATCCGCGGAAGACTGCGAACGTACTTGCGGTGGTC CTTAGTAAAGCTTG Generally speaking, show the entire script as well as any related files so nothing is left to the imagination. Brian O. 
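On the db_xref question, a minimal sketch in plain Perl, assuming the repeats arise simply because several features carry the same db_xref value (the record is not shown, so this is only a guess). The sample values and the helper name unique_values are made up for illustration; this is not a Bioperl call.

```perl
use strict;
use warnings;

# Return the distinct values of a list, keeping first-seen order.
sub unique_values {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Hypothetical db_xref values as they might accumulate over several features.
my @ids = ('taxon:9913', 'GI:12345', 'taxon:9913', 'GI:12345', 'GI:67890');
print "$_\n" for unique_values(@ids);
```

If the values still look "concatenated" after this, the problem is probably in how they are printed rather than in how @ids is filled.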
On 8/11/05 9:42 AM, "Pedro Antonio Reche" wrote:

> Dear Brian,
> I have tried your code from the HOWTO
>
>
> my @annotations = $anno_collection->get_Annotations('reference');
> if ($value->tagname eq "reference") {
> my $hash_ref = $value->hash_tree;
> for my $key (keys %{$hash_ref}) {
> print $key,": ",$hash_ref->{$key},"\n";
> }
>
> on the gb record attached in this e-mail and unfortunately I am
> unable to get the medline record. I have also tried
>
> my @annotations = $anno_collection->get_Annotations('reference');
>
> print "author: ",$value->authors(), "\n";
> print "Title: ",$value->title(), "\n";
> print "Medline: ",$value->medline(), "\n";
> print "PubMed: ",$value->pubmed(), "\n";
> print "Database: ",$value->database(), "\n";
>
> with the same result. I cannot print the medline record. I have also
> found that the code:
> for my $feat_object ($seq_object->get_SeqFeatures) {
> push @ids,$feat_object->get_tag_values("db_xref")
> if ($feat_object->has_tag("db_xref"));
> }
>
> does not populate @ids properly with the unique values under
> "db_xref" but with repeated concatenated values.
> Finally, given that
>
> $seq_object = $feat_object->entire_seq;
>
> returns a Bio::PrimarySeq I tried to define
> my $out = new Bio::SeqIO(-fh => \*STDERR, -format => 'fasta');
>
> to print the sequences as
>
> $out->write_seq($seq_object )
>
> but it did not work.
>
>
> Any help to solve these problems will be appreciated.
I am using > bioperl 1.4 > Regards, > pedro From MEC at Stowers-Institute.org Thu Aug 11 12:38:06 2005 From: MEC at Stowers-Institute.org (Cook, Malcolm) Date: Thu Aug 11 12:32:29 2005 Subject: [Bioperl-l] Extract Mutation Automatically Message-ID: <200508111628.j7BGS4Tv022396@portal.open-bio.org> re: standardized nomenclature for mutations, see Recommendations for a nomenclature system for human gene mutations a copy of which can be found http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A//mecp2.chw.edu.au /mecp2/info/mutation_nomenclature_1.pdf&ei=6H37QsvRGo34igGj4vBS&sig2=eYV DZb467rYOBf-0sCtctQ -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Andrew Leung Sent: Wednesday, August 10, 2005 7:42 PM To: 'Jason Stajich' Cc: bioperl-l@bioperl.org Subject: RE: [Bioperl-l] Extract Mutation Automatically Hi Jason, Thank you for advice. I will try the various approaches suggested. My ultimate goal is to extract something like this: A267G, Z786-, L898Y etc. for aa and A162T, G339A, A388N, etc. for nt. Preferably, the nomenclature for annotating mutations is a standardized one. But, it appears that there no such a ready to use module from Bioperl. Andrew -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: Wednesday, August 10, 2005 10:36 AM To: andrew_leung@dh.gov.hk Cc: bioperl-l@bioperl.org Subject: Re: [Bioperl-l] Extract Mutation Automatically I guess it comes down to what you want to do with the mutations once you've found them. The seq_inds method in Bio::Search::HSP::HSPI which is something you can call on hsp objects you've gotten out of pairwise alignment searches. seq_inds will give you the location of the identical, conserved, mismatched columns from a pairwise alignment. 
I would suggest using FASTA or SSEARCH and If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa' Here is how I would get the pairwise SW alignment and get the mutations out. If you wanted a global alignment you can use the EMBOSS tool 'needle' and generate an MSF alignment which can be parsed with Bio::AlignIO. some simple code to print out the bases which have mismatches use Bio::SearchIO; use strict; my $fh; #open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!; open($fh, "fasta34 seq1.fa seq2.fa |") || die $!; #my $parser = Bio::SearchIO->new(-format => 'fasta', # -fh => $fh); my $parser = Bio::SearchIO->new(-format => 'blast', - fh => $fh); if( my $result = $parser->next_result ) { # single result so use if instead of while if( my $hit = $result->next_hit ) { # ditto, want single result... if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would need to consider more if using BLAST my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch'); # if this is protein and you want to treat the conservative matches as mismatches # you'll need to run the same method but asking for 'conserved' and then combing the two lists for my $base ( @qmismatches ) { print "base $base of the hit sequence is a mismatch \n", } } } } The Bio::PopGen::Utilities module can also take an alignment and extract the positions with variation for use in polymorphism analyses. -jason On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote: > Hi all, > Is there any module available that can allow me to extract mutation(s) > automatically? The idea is that if I submit two sequences for > alignment, the > script can automatically list out all the differences between the two > sequences. I wish to know the difference at two levels, i.e. the > nucleotide > and amino acid level. Any ideas? 
> Andrew > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gnf.org Thu Aug 11 15:52:32 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Thu Aug 11 15:42:37 2005 Subject: [Bioperl-l] Re: Bio::SeqFeature::OntologyTypedI Proposal In-Reply-To: References: <42F9C22D.4030603@ebi.ac.uk> Message-ID: <644796f4db2eff94029490616b548f48@gnf.org> On Aug 10, 2005, at 2:03 PM, Chris Mungall wrote: > Regarding 'magic' in SeqIO - not sure this is required. You can already > plug in your own factories here, we just need to extend this with > feature > factories. The default method will continue to produce relatively light > SF::Generics? Right, this is exactly what I was thinking. A feature factory that creates ontology-compliant features will also probably need to have something like an OntologyTermResolver, in order to check a given feature type (primary_tag) against an ontology that sits somewhere (local file, local database, remote database, or even remote file). -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From birney at ebi.ac.uk Thu Aug 11 17:09:07 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Thu Aug 11 16:59:26 2005 Subject: [Bioperl-l] Re: Bio::SeqFeature::OntologyTypedI Proposal In-Reply-To: <644796f4db2eff94029490616b548f48@gnf.org> References: <42F9C22D.4030603@ebi.ac.uk> <644796f4db2eff94029490616b548f48@gnf.org> Message-ID: <42FBBE73.5080906@ebi.ac.uk> Hilmar Lapp wrote: > > On Aug 10, 2005, at 2:03 PM, Chris Mungall wrote: > >> Regarding 'magic' in SeqIO - not sure this is required. You can already >> plug in your own factories here, we just need to extend this with feature >> factories. The default method will continue to produce relatively light >> SF::Generics? > > > Right, this is exactly what I was thinking. A feature factory that > creates ontology-compliant features will also probably need to have > something like an OntologyTermResolver, in order to check a given > feature type (primary_tag) against an ontology that sits somewhere > (local file, local database, remote database, or even remote file). > Ok - I'll hold off the magic for the moment, but I think it would be nice to have just-enough of SO in-built into Bioperl so one could do something like: $seqio = Bio::SeqIO->new( -file => "-", -format => 'EMBL', -feature_converter => 'SO'); and the "right thing" happens. Does anyone want to propose an alt name to Bio::SeqFeature::OntologyTypedI? But for that is it ok for me to implement and commit? 
> -hilmar From cjm at fruitfly.org Thu Aug 11 17:29:50 2005 From: cjm at fruitfly.org (Chris Mungall) Date: Thu Aug 11 17:23:35 2005 Subject: [Bioperl-l] Re: Bio::SeqFeature::OntologyTypedI Proposal In-Reply-To: <42FBBE73.5080906@ebi.ac.uk> References: <42F9C22D.4030603@ebi.ac.uk> <644796f4db2eff94029490616b548f48@gnf.org> <42FBBE73.5080906@ebi.ac.uk> Message-ID: On Thu, 11 Aug 2005, Ewan Birney wrote: > > > Hilmar Lapp wrote: > > > > On Aug 10, 2005, at 2:03 PM, Chris Mungall wrote: > > > >> Regarding 'magic' in SeqIO - not sure this is required. You can already > >> plug in your own factories here, we just need to extend this with feature > >> factories. The default method will continue to produce relatively light > >> SF::Generics? > > > > > > Right, this is exactly what I was thinking. A feature factory that > > creates ontology-compliant features will also probably need to have > > something like an OntologyTermResolver, in order to check a given > > feature type (primary_tag) against an ontology that sits somewhere > > (local file, local database, remote database, or even remote file). > > > > Ok - I'll hold off the magic for the moment, but I think it would > be nice to have just-enough of SO in-built into Bioperl so one > could do something like: > > $seqio = Bio::SeqIO->new( -file => "-", -format => 'EMBL', -feature_converter => 'SO'); > > and the "right thing" happens. actually, Bio::SeqFeature::Tools::TypeMapper already does this. Well, you still have to wrap it to have the above work, but the mapping is there. Of course, you can always provide your own mapping as a hash (which could come from an ontology, a database, whatever). But like you say the gb->SO type mapping is so common that it's good to have a default hardcoding here. > Does anyone want to propose an alt name to > > Bio::SeqFeature::OntologyTypedI? > > But for that is it ok for me to implement and commit? 
> > > > -hilmar > From hlapp at gnf.org Thu Aug 11 17:43:07 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Thu Aug 11 17:35:28 2005 Subject: [Bioperl-l] Re: Bio::SeqFeature::OntologyTypedI Proposal In-Reply-To: <42FBBE73.5080906@ebi.ac.uk> References: <42F9C22D.4030603@ebi.ac.uk> <644796f4db2eff94029490616b548f48@gnf.org> <42FBBE73.5080906@ebi.ac.uk> Message-ID: On Aug 11, 2005, at 2:09 PM, Ewan Birney wrote: > Does anyone want to propose an alt name to > > Bio::SeqFeature::OntologyTypedI? Frankly I'd just call it Bio::SeqFeature::TypedI, or in the unabbreviated style (which I'd personally much prefer) Bio::SeqFeature::TypedSeqFeatureI. (We also have Bio::Seq::RichSeqI, not Bio::Seq::RichI.) I.e., the important cue that the name should give is (should be) that this is a strongly typed feature. Like Chris mentioned earlier there are different ways to achieve typing, but I don't think we will eventually want those different ways to be distinct from each other in Bioperl - the choice between untyped scruffy and typed tidy should suffice. > > But for that is it ok for me to implement and commit? I don't know why I would want to stop you :-) -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From ro_phls2 at dh.gov.hk Fri Aug 12 03:53:32 2005 From: ro_phls2 at dh.gov.hk (Andrew Leung) Date: Fri Aug 12 03:42:30 2005 Subject: [Bioperl-l] Return $hit->name by Score Bit value when parsing blast result Message-ID: <20050812075259.CMX1864.pimx07@Leungkcro> Hi, I did a StandAloneBlast and this resulted in a blast result object. When I use obj->next_result and obj->next_hit methods to list the hit name (hit->name), I found that they are not returned in an order that is similar to a standard blast result. In a standard blast report, we are familiar with the fact that hits are ordered by score bit values. 
With bioperl, how can I list the hits by bit score? Should I manually extract each hit's bit score and then sort a hash? Or is there a better way to achieve it?

Andrew

From ro_phls2 at dh.gov.hk Fri Aug 12 06:50:04 2005
From: ro_phls2 at dh.gov.hk (Andrew Leung)
Date: Fri Aug 12 06:39:05 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <3FD6FD70-7FF7-480B-8E9F-07F2D9C3D207@duke.edu>
Message-ID: <20050812104931.DJB1864.pimx07@Leungkcro>

Hi Jason,
I have tried the seq_inds method in Bio::Search::HSP::HSPI. But, other than identical and conserved, there is no "mismatched" option.

http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/HSP/HSPI.html#POD15

I am still thinking about how to get the mismatch details. Working from the identical/conserved seq_inds values seems to be very complicated.
Andrew

-----Original Message-----
From: Jason Stajich [mailto:jason.stajich@duke.edu]
Sent: Thursday, August 11, 2005 9:24 AM
To: andrew_leung@dh.gov.hk
Cc: bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] Extract Mutation Automatically

On Aug 10, 2005, at 8:42 PM, Andrew Leung wrote:

> Hi Jason,
> Thank you for advice. I will try the various approaches suggested. My
> ultimate goal is to extract something like this: A267G, Z786-,
> L898Y etc.
> for aa and A162T, G339A, A388N, etc. for nt. Preferably, the
> nomenclature
> for annotating mutations is a standardized one. But, it appears
> that there
> no such a ready to use module from Bioperl.

Don't despair, you could be the one to do it! This would probably just a be a subroutine and not necessarily a whole module.

That nomenclature assumes a reference sequence and just getting the bases you are interested in. A few substr or subseq calls and you would be right there.
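As a rough illustration of the "few substr calls" idea above, here is a sketch in plain Perl; it is not an existing Bioperl routine. It assumes you already have the two gapped alignment strings of equal length (for example from $hsp->query_string and $hsp->hit_string) and treats the first one as the reference. The name list_mutations and the decision to skip insertions are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Compare two gapped, equal-length alignment strings and return labels such
# as "A267G" (reference residue, reference position, variant residue).
# A gap in the variant is reported as a deletion, e.g. "Z786-".
# Columns where the reference has a gap (insertions) are skipped here.
sub list_mutations {
    my ($ref_aln, $var_aln, $ref_start) = @_;
    $ref_start = 1 unless defined $ref_start;
    die "aligned strings differ in length\n"
        unless length($ref_aln) == length($var_aln);

    my @mutations;
    my $pos = $ref_start - 1;      # position of the last reference residue seen
    for my $col (0 .. length($ref_aln) - 1) {
        my $r = uc substr($ref_aln, $col, 1);
        my $v = uc substr($var_aln, $col, 1);
        next if $r eq '-';         # insertion relative to the reference
        $pos++;
        push @mutations, "$r$pos$v" if $r ne $v;
    }
    return @mutations;
}

my @found = list_mutations('ACGT-ACGT', 'ACTTGAC-T');
print join(', ', @found), "\n";    # prints "G3T, G7-"
```

The optional third argument is the reference coordinate of the first aligned residue, so an alignment that starts at position 266 is numbered from there rather than from 1.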
-jason > Andrew > > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: Wednesday, August 10, 2005 10:36 AM > To: andrew_leung@dh.gov.hk > Cc: bioperl-l@bioperl.org > Subject: Re: [Bioperl-l] Extract Mutation Automatically > > I guess it comes down to what you want to do with the mutations once > you've found them. > > The seq_inds method in Bio::Search::HSP::HSPI which is something you > can call on hsp objects you've gotten out of pairwise alignment > searches. seq_inds will give you the location of the identical, > conserved, mismatched columns from a pairwise alignment. I would > suggest using FASTA or SSEARCH and > > If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa' > > Here is how I would get the pairwise SW alignment and get the > mutations out. > > If you wanted a global alignment you can use the EMBOSS tool 'needle' > and generate an MSF alignment which can be parsed with Bio::AlignIO. > > some simple code to print out the bases which have mismatches > use Bio::SearchIO; > use strict; > my $fh; > #open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!; > open($fh, "fasta34 seq1.fa seq2.fa |") || die $!; > #my $parser = Bio::SearchIO->new(-format => 'fasta', > # -fh => $fh); > my $parser = Bio::SearchIO->new(-format => 'blast', > - > fh => $fh); > > if( my $result = $parser->next_result ) { # single result so use if > instead of while > if( my $hit = $result->next_hit ) { # ditto, want single > result... 
> if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would > need to consider more if using BLAST > > my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch'); > # if this is protein and you want to treat the conservative > matches as mismatches > # you'll need to run the same method but asking for > 'conserved' and then combing the two lists > > for my $base ( @qmismatches ) { > print "base $base of the hit sequence is a mismatch \n", > } > } > } > } > > > The Bio::PopGen::Utilities module can also take an alignment and > extract the positions with variation for use in polymorphism analyses. > > -jason > > On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote: > > >> Hi all, >> Is there any module available that can allow me to extract mutation >> (s) >> automatically? The idea is that if I submit two sequences for >> alignment, the >> script can automatically list out all the differences between the two >> sequences. I wish to know the difference at two levels, i.e. the >> nucleotide >> and amino acid level. Any ideas? >> Andrew >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason.stajich at duke.edu Fri Aug 12 07:58:31 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Aug 12 07:50:51 2005 Subject: [Bioperl-l] Extract Mutation Automatically In-Reply-To: <20050812104931.DJB1864.pimx07@Leungkcro> References: <20050812104931.DJB1864.pimx07@Leungkcro> Message-ID: <6758370F-0C05-4AE5-A4DC-30F9013C10AB@duke.edu> 'nomatch' On Aug 12, 2005, at 6:50 AM, Andrew Leung wrote: > Hi Jason, > I have tired the the seq_inds method in Bio::Search::HSP::HSPI. > But, other > than identical and conserved, there is no "mismatched" option. 
> > http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/HSP/ > HSPI.html#POD15 > > I am still thinking of how to get the mismatch details. Working from > identical/conserved seq_inds values seems to be very complicated. > Andrew > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: Thursday, August 11, 2005 9:24 AM > To: andrew_leung@dh.gov.hk > Cc: bioperl-l@bioperl.org > Subject: Re: [Bioperl-l] Extract Mutation Automatically > > > On Aug 10, 2005, at 8:42 PM, Andrew Leung wrote: > > >> Hi Jason, >> Thank you for advice. I will try the various approaches suggested. My >> ultimate goal is to extract something like this: A267G, Z786-, >> L898Y etc. >> for aa and A162T, G339A, A388N, etc. for nt. Preferably, the >> nomenclature >> for annotating mutations is a standardized one. But, it appears >> that there >> no such a ready to use module from Bioperl. >> > > Don't despair, you could be the one to do it! This would probably > just a be a subroutine and not necessarily a whole module. > > That nomenclature assumes a reference sequence and just getting the > bases you are interested in. A few substr or subseq calls and you > would be right there. > > -jason > > > >> Andrew >> >> >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich@duke.edu] >> Sent: Wednesday, August 10, 2005 10:36 AM >> To: andrew_leung@dh.gov.hk >> Cc: bioperl-l@bioperl.org >> Subject: Re: [Bioperl-l] Extract Mutation Automatically >> >> I guess it comes down to what you want to do with the mutations once >> you've found them. >> >> The seq_inds method in Bio::Search::HSP::HSPI which is something you >> can call on hsp objects you've gotten out of pairwise alignment >> searches. seq_inds will give you the location of the identical, >> conserved, mismatched columns from a pairwise alignment. 
I would >> suggest using FASTA or SSEARCH and >> >> If you had two files with seqs to align called 'seq1.fa' and >> 'seq2.fa' >> >> Here is how I would get the pairwise SW alignment and get the >> mutations out. >> >> If you wanted a global alignment you can use the EMBOSS tool 'needle' >> and generate an MSF alignment which can be parsed with Bio::AlignIO. >> >> some simple code to print out the bases which have mismatches >> use Bio::SearchIO; >> use strict; >> my $fh; >> #open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!; >> open($fh, "fasta34 seq1.fa seq2.fa |") || die $!; >> #my $parser = Bio::SearchIO->new(-format => 'fasta', >> # -fh => $fh); >> my $parser = Bio::SearchIO->new(-format => 'blast', >> - >> fh => $fh); >> >> if( my $result = $parser->next_result ) { # single result so use if >> instead of while >> if( my $hit = $result->next_hit ) { # ditto, want single >> result... >> if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would >> need to consider more if using BLAST >> >> my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch'); >> # if this is protein and you want to treat the conservative >> matches as mismatches >> # you'll need to run the same method but asking for >> 'conserved' and then combing the two lists >> >> for my $base ( @qmismatches ) { >> print "base $base of the hit sequence is a mismatch \n", >> } >> } >> } >> } >> >> >> The Bio::PopGen::Utilities module can also take an alignment and >> extract the positions with variation for use in polymorphism >> analyses. >> >> -jason >> >> On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote: >> >> >> >>> Hi all, >>> Is there any module available that can allow me to extract mutation >>> (s) >>> automatically? The idea is that if I submit two sequences for >>> alignment, the >>> script can automatically list out all the differences between the >>> two >>> sequences. I wish to know the difference at two levels, i.e. the >>> nucleotide >>> and amino acid level. Any ideas? 
>>> Andrew
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu Fri Aug 12 08:06:07 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Aug 12 07:56:21 2005
Subject: [Bioperl-l] Return $hit->name by Score Bit value when parsing blast result
In-Reply-To: <20050812075259.CMX1864.pimx07@Leungkcro>
References: <20050812075259.CMX1864.pimx07@Leungkcro>
Message-ID: <13D5095C-7906-416A-B733-B0494B94AF6D@duke.edu>

They are supposed to be returned in the order they are found in the report -- although I remember there may be something inconsistent with the code added to handle PSIBlast parsing too. I've not yet investigated this so I don't know whether or not it is a bug.

At any rate, you can always collect all the Hits into an array and sort them (highest bit score first, as in the report):

my @hits = $result->hits;
for my $hit ( sort { $b->bits <=> $a->bits } @hits ) {
}

If you read the documentation for Bio::Search::Result::ResultI you'll see a 'sort_hits' function which should also allow you to provide a sorting function to control the order of the hits.

-jason

On Aug 12, 2005, at 3:53 AM, Andrew Leung wrote:

> Hi,
>
> I did a StandAloneBlast and this resulted in a blast result object.
> When I
> use obj->next_result and obj->next_hit methods to list the hit name
> (hit->name), I found that they are not returned in an order that is
> similar
> to a standard blast result. In a standard blast report, we are
> familiar with
> the fact that hits are ordered by score bit values. With bioperl,
> how can I
> list the hits by score bits? Shall I manually extract all the hits'
> score
> bit and then do a hash sorting?
Or, they are a better way to
> achieve it.
>
> Andrew
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From markus.riester at student.uni-tuebingen.de Fri Aug 12 14:21:55 2005
From: markus.riester at student.uni-tuebingen.de (markus.riester@student.uni-tuebingen.de)
Date: Fri Aug 12 08:23:27 2005
Subject: [Bioperl-l] where to discuss namespaces for modules?
Message-ID:

Hi,

Sorry for writing again. Is this the right place to discuss namespaces?

http://www.weigelworld.org/resources/software/perl_modules/

Weigel::Search was only a temporary namespace, and I don't think it would be good to upload this to CPAN under that namespace. Maybe Bio::Search? Or Bio::Pat(tern)Search?

It would be very nice to hear some feedback from you!

Best regards,
Markus

From ram at i122server.vu-wien.ac.at Fri Aug 12 04:51:38 2005
From: ram at i122server.vu-wien.ac.at (Rambabu Gudavalli)
Date: Fri Aug 12 09:14:41 2005
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 28, Issue 6
In-Reply-To: <200508112105.j7BL19Tx026719@portal.open-bio.org>
References: <200508112105.j7BL19Tx026719@portal.open-bio.org>
Message-ID: <22b738fa01916301dcc6c48b289277e1@i122server.vu-wien.ac.at>

Dear all,

I have a question: how can I download a popset file using bioperl? I know the ID [gi:22724863]. I can do it manually, but I need more files, so I want to do it with bioperl.

Here is the URL for one file that I need to download:

http://www.ncbi.nlm.nih.gov/entrez/batchseq.cgi?db=popset&view=ps&val=22724863

Thank you,
Ram

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: text/enriched Size: 436 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050812/c0f06520/attachment.bin From ram at i122server.vu-wien.ac.at Fri Aug 12 05:09:59 2005 From: ram at i122server.vu-wien.ac.at (Rambabu Gudavalli) Date: Fri Aug 12 09:14:48 2005 Subject: [Bioperl-l] download popset file using bioperl Message-ID: <515c08d6d1253e609971d2a84a47ec25@i122server.vu-wien.ac.at> Dear all, i have question that, how can i download the popset file by using the bioperl. i know the id [gi:22724863] i can do it manually, but need more files, so i wanna do it by using bioperl. here is the URL for one file that i need to download. http://www.ncbi.nlm.nih.gov/entrez/batchseq.cgi? db=popset&view=ps&val=22724863 thank you, Ram -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 436 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050812/4b44a249/attachment.bin From nel at birc.dk Sun Aug 14 23:37:55 2005 From: nel at birc.dk (Niels Larsen) Date: Sun Aug 14 23:28:03 2005 Subject: [Bioperl-l] get_Seq_by_id question In-Reply-To: <1123119449.11338.3.camel@bacp4> References: <1123119449.11338.3.camel@bacp4> Message-ID: <1124077075.43000e1358638@webmail.daimi.au.dk> Greetings, When I do require Bio::DB::EMBL; $embl = new Bio::DB::EMBL(); $entry = $embl->get_Seq_by_id( "AF222686" ); Then I get one entry, EMBL:AY883858. Am I doing something wrong? get_Seq_by_acc returns the same. That entry AY883858, btw, is the first in the list one gets when searching with "AF222686" at the EBI front page (http://www.ebi.ac.uk). 
Niels L

From heikki at ebi.ac.uk Mon Aug 15 06:41:39 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Aug 15 06:36:26 2005
Subject: [Bioperl-l] Extract Mutation Automatically
In-Reply-To: <20050811004135.HPW10378.pimx07@Leungkcro>
References: <20050811004135.HPW10378.pimx07@Leungkcro>
Message-ID: <200508151141.39713.heikki@ebi.ac.uk>

Andrew,

Once you have extracted the information, you can create Bio::Variation objects which know how to stringify the description according to human mutation nomenclature rules.

In practice, you create a Bio::Variation::SeqDiff object, add to it the appropriate Bio::Variation::{DNAMutation|RNAChange|AAChange} objects and call the method sysname() for the nucleotide descriptor or trivname() for the amino acid descriptor.

The nomenclature used is not the most recent, more complex suggestion from den Dunnen et al but the original one (identical in basic cases) from Antonarakis et al.

-Heikki

On Thursday 11 August 2005 01:42, Andrew Leung wrote:
> Hi Jason,
> Thank you for advice. I will try the various approaches suggested. My
> ultimate goal is to extract something like this: A267G, Z786-, L898Y etc.
> for aa and A162T, G339A, A388N, etc. for nt. Preferably, the nomenclature
> for annotating mutations is a standardized one. But, it appears that there
> no such a ready to use module from Bioperl.
> Andrew
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: Wednesday, August 10, 2005 10:36 AM
> To: andrew_leung@dh.gov.hk
> Cc: bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Extract Mutation Automatically
>
> I guess it comes down to what you want to do with the mutations once
> you've found them.
>
> The seq_inds method in Bio::Search::HSP::HSPI which is something you
> can call on hsp objects you've gotten out of pairwise alignment
> searches. seq_inds will give you the location of the identical,
> conserved, mismatched columns from a pairwise alignment.
I would > suggest using FASTA or SSEARCH and > > If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa' > > Here is how I would get the pairwise SW alignment and get the > mutations out. > > If you wanted a global alignment you can use the EMBOSS tool 'needle' > and generate an MSF alignment which can be parsed with Bio::AlignIO. > > some simple code to print out the bases which have mismatches > use Bio::SearchIO; > use strict; > my $fh; > #open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!; > open($fh, "fasta34 seq1.fa seq2.fa |") || die $!; > #my $parser = Bio::SearchIO->new(-format => 'fasta', > # -fh => $fh); > my $parser = Bio::SearchIO->new(-format => 'blast', > - > fh => $fh); > > if( my $result = $parser->next_result ) { # single result so use if > instead of while > if( my $hit = $result->next_hit ) { # ditto, want single > result... > if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would > need to consider more if using BLAST > > my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch'); > # if this is protein and you want to treat the conservative > matches as mismatches > # you'll need to run the same method but asking for > 'conserved' and then combing the two lists > > for my $base ( @qmismatches ) { > print "base $base of the hit sequence is a mismatch \n", > } > } > } > } > > > The Bio::PopGen::Utilities module can also take an alignment and > extract the positions with variation for use in polymorphism analyses. > > -jason > > On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote: > > Hi all, > > Is there any module available that can allow me to extract mutation(s) > > automatically? The idea is that if I submit two sequences for > > alignment, the > > script can automatically list out all the differences between the two > > sequences. I wish to know the difference at two levels, i.e. the > > nucleotide > > and amino acid level. Any ideas? 
> > Andrew > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at ebi.ac.uk Mon Aug 15 07:39:19 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon Aug 15 07:32:08 2005 Subject: [Bioperl-l] get_Seq_by_id question In-Reply-To: <1124077075.43000e1358638@webmail.daimi.au.dk> References: <1123119449.11338.3.camel@bacp4> <1124077075.43000e1358638@webmail.daimi.au.dk> Message-ID: <200508151239.19410.heikki@ebi.ac.uk> Niels, There is something funny going on with the underlying SRS engine. I'll get to the bottom it and report back. -Heikki On Monday 15 August 2005 04:37, Niels Larsen wrote: > Greetings, > > When I do > > require Bio::DB::EMBL; > > $embl = new Bio::DB::EMBL(); > $entry = $embl->get_Seq_by_id( "AF222686" ); > > Then I get one entry, EMBL:AY883858. Am I doing something wrong? > get_Seq_by_acc returns the same. That entry AY883858, btw, is the > first in the list one gets when searching with "AF222686" at the EBI > front page (http://www.ebi.ac.uk). 
> > Niels L > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at ebi.ac.uk Mon Aug 15 09:22:49 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon Aug 15 09:18:01 2005 Subject: [Bioperl-l] new modules for sarching for patterns in fasta-fi les In-Reply-To: <339D68B133EAD311971E009027DC47970321B47E@montecarlo.cgr.harvard.edu> References: <339D68B133EAD311971E009027DC47970321B47E@montecarlo.cgr.harvard.edu> Message-ID: <200508151422.49280.heikki@ebi.ac.uk> On Tuesday 09 August 2005 20:20, Amir Karger wrote: > > I wrote a simple one-liner to convert fasta to three, tab-separated > columns: ID (without '>') desc, and concatenated sequence. That way you > don't have to worry about keeping the two files tied together, but agrep > should still find things only in the concatenated sequence. (Unless > somebody mean put a sequence into the description column.) As an added > bonus, it means you can throw a FASTA into Excel for sorting, filtering, > etc. Or merge with a gene list pretty easily. > It's at > http://cgr.harvard.edu/cbg/scriptome/Tools/Change.html#new__change_a_fasta_ >f ile_into_tabular_format__change_fasta_to_tab_ > along with the tab-to-FASTA converter, along with a couple sentences > describing potential gotchas (e.g., any tabs in the desc get lost) > Amir, FYI, this is already implemented as 'tab' format in Bio::SeqIO. 
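For readers who want to see what the three-column layout discussed in this thread looks like without installing anything, here is a minimal plain-Perl sketch. It is not Amir's one-liner and not the Bio::SeqIO 'tab' writer; the sub name `fasta_to_tab` and the sample records are made up for illustration, and the column choice (ID, description, concatenated sequence) simply follows the description quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: convert FASTA text into tab-separated lines of
# id, description, concatenated sequence (one line per record).
sub fasta_to_tab {
    my ($fasta) = @_;
    my @rows;
    for my $line (split /\r?\n/, $fasta) {    # tolerate DOS line endings
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S*)\s*(.*)$/) {
            # Copy the captures before running another regex.
            my ($id, $desc) = ($1, $2);
            $desc =~ s/\t/ /g;    # a tab in the description would break the columns
            push @rows, [$id, $desc, ''];
        }
        elsif (@rows) {
            (my $chunk = $line) =~ s/\s+//g;
            $rows[-1][2] .= $chunk;
        }
    }
    return join '', map { join("\t", @$_) . "\n" } @rows;
}

# Made-up sample input.
print fasta_to_tab(">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n");
```

A record with no description still gets three columns (the middle one is empty), which keeps the file easy to load into a spreadsheet.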
-Heikki

From cain at cshl.edu Mon Aug 15 10:35:10 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug 15 10:25:17 2005
Subject: [Bioperl-l] Windows bug in Bio::DB::Fasta?
Message-ID: <1124116511.2891.9.camel@localhost.localdomain>

Hello all,

I am investigating a bug in GBrowse that seems to surface only when people are using the memory (i.e., file) adaptor on Windows systems. Here's the bug report:

https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169&group_id=27707

I've tracked the problem down to Bio::DB::Fasta. When the file is DOS formatted (that is, each line ends with both a carriage return and a line feed), BDF returns the wrong string when a subsequence is requested, but when the file is Unix formatted (i.e., each line ends with a line feed only), it returns the right string. I wrote a very simple test script and stepped it through the perl debugger. It looks like the bug is in the caloffset method, as it returns the same offsets regardless of the file type, which then makes the subsequent seek into the file go to the wrong coordinates in DOS-formatted files.

Unfortunately, I don't really know what is going on in caloffset, so I don't know how to fix it, but it presumably has to check the format of the file somewhere and take that into account.

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From heikki at ebi.ac.uk Mon Aug 15 11:00:42 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Aug 15 10:50:14 2005
Subject: [Bioperl-l] get_Seq_by_id question
In-Reply-To: <200508151239.19410.heikki@ebi.ac.uk>
References: <1124077075.43000e1358638@webmail.daimi.au.dk> <200508151239.19410.heikki@ebi.ac.uk>
Message-ID: <200508151600.42365.heikki@ebi.ac.uk>

Ok. This should be fixed soon.
A couple of releases ago EMBL introduced accession number ranges for situations where there is a long list of secondary accession numbers (GenBank has used them a bit longer), e.g.

AC AY883861; AF333345-AF333346; AH010225;

The code that expanded these ranges was broken in the EBI SRS server. It was fixed yesterday, but given the huge size of the database it takes a while to propagate the fix to the public server.

Yours,
-Heikki

On Monday 15 August 2005 12:39, Heikki Lehvaslaiho wrote:
> Niels,
>
> There is something funny going on with the underlying SRS engine. I'll get
> to the bottom of it and report back.
>
> -Heikki
>
> On Monday 15 August 2005 04:37, Niels Larsen wrote:
> > Greetings,
> >
> > When I do
> >
> > require Bio::DB::EMBL;
> >
> > $embl = new Bio::DB::EMBL();
> > $entry = $embl->get_Seq_by_id( "AF222686" );
> >
> > Then I get one entry, EMBL:AY883858. Am I doing something wrong?
> > get_Seq_by_acc returns the same. That entry AY883858, btw, is the
> > first in the list one gets when searching with "AF222686" at the EBI
> > front page (http://www.ebi.ac.uk).
> >
> > Niels L
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambridge, CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From golharam at umdnj.edu Mon Aug 15 11:38:46 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon Aug 15 11:29:45 2005
Subject: [Bioperl-l] Bio::Align::AlignI for EMBOSS:needle
Message-ID: <011301c5a1af$6cf54810$2f01a8c0@GOLHARMOBILE1>

EMBOSS needle reports percent identity and percent similarity, however Bio::Align::AlignI has no method for obtaining the percent similarity.

My code is essentially:

my $in = new Bio::AlignIO(-format => 'emboss', -fh => new IO::String($output));
my $aln = $in->next_aln;
$fepct = $aln->overall_percentage_identity;

I tried the different percentage_identity methods to see if any of them work, but they don't give the similarity number. Is there a way to get the percent similarity through bioperl?

Also, the description part of the documentation for overall_percentage_identity has a typo in the Title.

Ryan

From jason.stajich at duke.edu Mon Aug 15 12:04:24 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Aug 15 11:55:32 2005
Subject: [Bioperl-l] Bio::Align::AlignI for EMBOSS:needle
In-Reply-To: <011301c5a1af$6cf54810$2f01a8c0@GOLHARMOBILE1>
References: <011301c5a1af$6cf54810$2f01a8c0@GOLHARMOBILE1>
Message-ID: <5B6F1C45-4B48-40AC-BC0A-BF50173DD40E@duke.edu>

You could sort of figure it out by processing the match_line output and counting the number of ':', '.', and '*' items and dividing by the aln length.
If we did parse it - there isn't really anywhere to put that sort of field right now in SimpleAlign.

I generally just have a simple perl parser I run on needle/water output to get the percent similar/identical stats, and if I need the alignment I then parse it again with AlignIO.

Something like:

my %stats;
while(<$io>) {
    # e.g. "# Identity:      60/100 (60.0%)"
    if(/^\#\s+(Identity|Similarity|Gaps):\s+(\d+)\/(\d+)\s+\(\s*(\d+\.\d+)\s*%\s*\)/ ) {
        $stats{$1} = [$2,$3,$4];
    }
}
seek($io, 0, 0);   # rewind the filehandle to the start
# process with AlignIO....

-jason

On Aug 15, 2005, at 11:38 AM, Ryan Golhar wrote:
> EMBOSS needle reports percent identity and percent similarity, however
> Bio::Align::AlignI has no method for obtaining the percent similarity.
>
> My code is essentially:
>
> my $in = new Bio::AlignIO(-format => 'emboss', -fh => new
> IO::String($output));
> my $aln = $in->next_aln;
> $fepct = $aln->overall_percentage_identity;
>
> I tried the different percentage_identity methods to see if any of them
> work, but they don't give the similarity number. Is there a way to get
> the percent similarity through bioperl?
>
> Also, the description part of the documentation for
> overall_percentage_identity has a typo in the Title.
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cain at cshl.edu Mon Aug 15 13:22:29 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Aug 15 13:12:52 2005
Subject: [Bioperl-l] Windows bug in Bio::DB::Fasta?
In-Reply-To: <1124116511.2891.9.camel@localhost.localdomain>
References: <1124116511.2891.9.camel@localhost.localdomain>
Message-ID: <1124126549.2868.2.camel@localhost.localdomain>

Just to follow up on my own email with a little more information: in Fasta.pm, line 697:

$termination_length ||= /\r\n$/ ? 2 : 1; # account for crlf-terminated Windows files

The pattern match is failing on DOS formatted files; I don't know why.
Does anyone else? On Mon, 2005-08-15 at 10:35 -0400, Scott Cain wrote: > Hello all, > > I am investigating a bug in GBrowse that seems to only surface when > people are using the memory (ie, file) adaptor on Windows systems. > Here's the bug report: > > https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169&group_id=27707 > > I've tracked the problem down to Bio::DB::Fasta when the file is dos > formatted (that is, it has both line feeds and carriage returns), BDF > returns the wrong string when a subsequence is requested, but when the > file is unix formatted (ie only CR (or is it only LF?)), it returns the > right string. I wrote the very simple test script below and stepped it > through the perl debugger. It looks like the bug is in the caloffset > method, as it returns the same offsets regardless of the file type, > which then makes the subsequent seek into the file go to the wrong > coordinates of dos formatted files. > > Unfortunately, I don't really know what is going on caloffset, so I > don't know how to fix it, but it presumably has to check the format of > the file somewhere and take that into account. > > Thanks, > Scott > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From golharam at umdnj.edu Mon Aug 15 13:59:22 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon Aug 15 13:49:03 2005 Subject: [Bioperl-l] Bio::Align::AlignI for EMBOSS:needle In-Reply-To: <5B6F1C45-4B48-40AC-BC0A-BF50173DD40E@duke.edu> Message-ID: <013301c5a1c3$10c6b600$2f01a8c0@GOLHARMOBILE1> That's exactly what I'm doing now....just regex parsing the similarity line...didn't know if it was built into bioperl and I was just missing it... 
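The regex approach mentioned in this thread can be tried on its own with a small self-contained sketch. The sub name `parse_emboss_stats` and the sample header are made up for illustration; the sample only imitates the style of the "# Identity:", "# Similarity:" and "# Gaps:" lines that needle/water print, so real output should be checked before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the Identity/Similarity/Gaps header lines
# of EMBOSS needle/water output as { name => [count, length, percent] }.
sub parse_emboss_stats {
    my ($text) = @_;
    my %stats;
    for my $line (split /\r?\n/, $text) {
        if ($line =~ m{^\#\s+(Identity|Similarity|Gaps):\s+(\d+)/(\d+)\s+\(\s*(\d+\.\d+)\s*%\s*\)}) {
            $stats{$1} = [$2, $3, $4];
        }
    }
    return \%stats;
}

# Made-up sample header, not taken from a real run.
my $sample = <<'END';
# Length: 100
# Identity:      60/100 (60.0%)
# Similarity:    75/100 (75.0%)
# Gaps:           5/100 ( 5.0%)
# Score: 250.5
END

my $stats = parse_emboss_stats($sample);
printf "%s: %s/%s (%s%%)\n", $_, @{ $stats->{$_} } for sort keys %$stats;
```

Working from a string rather than a filehandle sidesteps the rewind step: the same text can be handed to AlignIO afterwards through IO::String, as in Ryan's code.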
-----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: Monday, August 15, 2005 12:04 PM To: golharam@umdnj.edu Cc: 'Bioperl List' Subject: Re: [Bioperl-l] Bio::Align::AlignI for EMBOSS:needle You could sort of figure it out by processing the match_line output and counting the number of ':", '.', and '*' items and dividing by the aln length. If we did parse it - there isn't really anywhere to put that sort of field right now in SimpleAlign. I generally just have simple perl parser I run on needle/water output to get the percent similar/identical stats and if I need the alignment then parse it again with AlignIO; Something like: my %stats; while(<$io>) { if(/^\#\s+(Identity|Similarity|Gaps):\s+(\d+)\/(\d+)\s+\(\s*(\d+\.\d +)\s*%\s*\)/ ) { $stats{$1} = [$2,$3,$4]; } } $io->seek(0); # process with AlignIO.... -jason On Aug 15, 2005, at 11:38 AM, Ryan Golhar wrote: > EMBOSS needle reports percent identity and percent similarity, however > Bio::Align::AlignI has no method for obtain the percent similarity. > > My code is essentially: > > my $in = new Bio::AlignIO(-format => 'emboss', -fh => new > IO::String($output)); my $aln = $in->next_aln; > $fepct = $aln->overall_percentage_identity; > > I tried the different percentage_identity methods to see if any of > them > work, but they don't give the similarity number. Is there a way to > get > the percent similarity through bioperl? > > Also, the description part of the document for > overall_percentage_identity has a type for the Title. 
> > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason.stajich at duke.edu Mon Aug 15 18:11:45 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Aug 15 18:02:54 2005 Subject: [Bioperl-l] GuessSeqFormat problems Message-ID: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu> Albert - I think the new guessing changes for phylip are causing havoc. Lots of tests are failing t/GuessSeqFeature.t. Can you take a look? I was looking over this module - it seems like we probably want to run the tests in a particular order as some matches are ambiguous and we probably need to have preferred order. At least we'll know when something fails, what the order. Another thing is it uses open directly instead of allowing Root::IO to open a filehandle. If went to using Root::IO, it would allow peeking at not only a file but a filehandle/stream and then use _pushback after we have peeked over the first few lines, guess the format, then pass it along to the SeqIO/AlignIO handle appropriately. Anyways, just thoughts... -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From ro_phls2 at dh.gov.hk Mon Aug 15 20:49:52 2005 From: ro_phls2 at dh.gov.hk (Andrew Leung) Date: Mon Aug 15 20:38:47 2005 Subject: [Bioperl-l] Extract Mutation Automatically In-Reply-To: <200508151141.39713.heikki@ebi.ac.uk> Message-ID: <20050816004925.KUF1864.pimx07@Leungkcro> Hi Heikki, Thank you for your note. I now have two strands of sequences obtained from a hsp and an array of mutation position information resulted from seq_inds() with 'mismatch' option. Do you mean that I can put these data to Bio::Variation and generate a mutation list as desired? I am quite new to Bioperl. Can you explain in greater details? 
I've read the documentation for Bio::Variation, but it appears to me that its methods are mainly for "set", but not for "reading" mutation. Andrew
From avilella at gmail.com Tue Aug 16 04:50:03 2005 From: avilella at gmail.com (Albert Vilella) Date: Tue Aug 16 04:40:28 2005 Subject: [Bioperl-l] Re: GuessSeqFormat problems In-Reply-To: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu> References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu> Message-ID: <1124182203.8208.14.camel@localhost.localdomain> El dl 15 de 08 del 2005 a les 18:11 -0400, en/na Jason Stajich va escriure: > > > Albert - > > > I think the new guessing changes for phylip are causing havoc. Lots > of tests are failing t/GuessSeqFeature.t. Can you take a look? Uops, sorry about that. I was trying to make the match for phylip more generic in $lineno=2. In my case it was returning an unexistent Bio::AlignIO::pir. I have fixed it and now passes all the tests. > > > I was looking over this module - it seems like we probably want to run > the tests in a particular order as some matches are ambiguous and we > probably need to have preferred order. At least we'll know when > something fails, what the order. As I understand from the DESCRIPTION, the more lines one checks, the better is determined, isn't it? 
Maybe it would help to add more line checks in some of the formats, that are loosely constricted in their first lines. > Another thing is it uses open directly instead of allowing Root::IO to > open a filehandle. If went to using Root::IO, it would allow peeking > at not only a file but a filehandle/stream and then use _pushback > after we have peeked over the first few lines, guess the format, then > pass it along to the SeqIO/AlignIO handle appropriately. > > > Anyways, just thoughts... > > > -jason > -- > > Jason Stajich > > jason.stajich at duke.edu > > http://www.duke.edu/~jes12/ > > > > > 
From heikki at ebi.ac.uk Tue Aug 16 05:06:07 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue Aug 16 04:56:31 2005 Subject: [Bioperl-l] Extract Mutation Automatically In-Reply-To: <20050816004925.KUF1864.pimx07@Leungkcro> References: <20050816004925.KUF1864.pimx07@Leungkcro> Message-ID: <200508161006.08243.heikki@ebi.ac.uk> Andrew, You are right, Bio::Variation objects only store and format the findings. This same question popped up a couple of months ago. See: http://portal.open-bio.org/pipermail/bioperl-l/2005-June/019242.html I wonder if Julio got round to writing the code? -Heikki 
From andreas.kahari at ebi.ac.uk Tue Aug 16 05:42:20 2005 From: andreas.kahari at ebi.ac.uk (Andreas Kahari) Date: Tue Aug 16 05:33:22 2005 Subject: [Bioperl-l] Re: GuessSeqFormat problems In-Reply-To: <1124182203.8208.14.camel@localhost.localdomain> References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu> <1124182203.8208.14.camel@localhost.localdomain> Message-ID: <20050816094220.GB17612@ebi.ac.uk> On Tue, Aug 16, 2005 at 10:50:03AM +0200, Albert Vilella wrote: > El dl 15 de 08 del 2005 a les 18:11 -0400, en/na Jason Stajich va > escriure: > > > > Albert - > > > > I think the new guessing changes for phylip are causing havoc. Lots > > of tests are failing t/GuessSeqFeature.t. Can you take a look? > > Uops, sorry about that. > > I was trying to make the match for phylip more generic in $lineno=2. In > my case it was returning an unexistent Bio::AlignIO::pir. > > I have fixed it and now passes all the tests. 
> > > I was looking over this module - it seems like we probably want to run
> > the tests in a particular order as some matches are ambiguous and we
> > probably need to have preferred order. At least we'll know when
> > something fails, what the order.
>
> As I understand from the DESCRIPTION, the more lines one checks, the
> better is determined, isn't it?
>
> Maybe it would help to add more line checks in some of the formats, that
> are loosely constricted in their first lines.

Yes, in some cases. For a format to "win", its test needs to be the "last one standing" after all the others have failed. This naturally means that adding more formats will make the guessing more uncertain, and the test rules need to be more and more specific for them to be really useful. On the other hand, adding rules (or-parts to the if-statement) might make the test push other tests out of the competition even though they might be more indicative of the actual format.

I was playing around with some "scoring" of the formats, so that one could write a format test that would be allowed to sometimes fail in one rule without disqualifying that format as a possible candidate. This was too elaborate at the time and I settled for a simple pass/fail system.

(Disclaimer :-) My aim in writing the module was to have a *guessing* facility, not a routine that *determines* the format of the input data. I hope that this has been made clear.

> > Another thing is it uses open directly instead of allowing Root::IO to
> > open a filehandle. If went to using Root::IO, it would allow peeking
> > at not only a file but a filehandle/stream and then use _pushback
> > after we have peeked over the first few lines, guess the format, then
> > pass it along to the SeqIO/AlignIO handle appropriately.

This is a good suggestion. I will not have time to do this now though, so if no-one else wants to supply this patch I'll look at it at a later stage.
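As a rough illustration of the "last one standing" pass/fail idea discussed in this thread, here is a toy sketch in plain Perl. It is not the code of Bio::Tools::GuessSeqFormat: the sub name `guess_format`, the three formats, and their deliberately crude per-line rules are all assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical, heavily simplified per-line tests. Each returns true if the
# line (with its 1-based line number) is still consistent with the format.
my %tests = (
    fasta  => sub { my ($l, $n) = @_; $n == 1 ? $l =~ /^>/ : $l =~ /^[A-Za-z*\-]*$/ },
    pir    => sub { my ($l, $n) = @_; $n == 1 ? $l =~ /^>[A-Z][A-Z0-9];/ : 1 },
    phylip => sub { my ($l, $n) = @_; $n == 1 ? $l =~ /^\s*\d+\s+\d+\s*$/ : 1 },
);

# A format is struck off the candidate list as soon as one of its line tests
# fails and is never re-tested; the guess is the single format left standing.
sub guess_format {
    my @lines = @_;
    my %alive = map { $_ => 1 } keys %tests;
    my $n = 0;
    for my $line (@lines) {
        $n++;
        for my $fmt (sort keys %alive) {
            delete $alive{$fmt} unless $tests{$fmt}->($line, $n);
        }
        my @left = keys %alive;
        return $left[0] if @left == 1;   # last one standing
        return if !@left;                # every candidate failed
    }
    return;                              # still ambiguous, or no input
}

# Made-up inputs: the PIR-style header also passes the loose FASTA test on
# line 1, so the decision is only made on line 2.
for my $input ( [">seq1 some description", "ACGTACGT"],
                [">P1;CRAB_ANAPL", "ALPHA CRYSTALLIN B CHAIN.", "MDITIHNPLI*"] ) {
    my $guess = guess_format(@$input);
    print defined $guess ? "$guess\n" : "unknown\n";
}
```

The second input shows why loosely specified first-line rules make the outcome depend on how many lines are checked, which is the trade-off raised in the messages above.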
Regards,
Andreas

--
Andreas Kähäri EMBL-EBI/ensembl
---{ www.embl.org }---{ www.ebi.ac.uk }---{ www.ensembl.org }---

From andreas.kahari at ebi.ac.uk Tue Aug 16 05:56:25 2005
From: andreas.kahari at ebi.ac.uk (Andreas Kahari)
Date: Tue Aug 16 05:46:46 2005
Subject: [Bioperl-l] Re: GuessSeqFormat problems
In-Reply-To: <20050816094220.GB17612@ebi.ac.uk>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu> <1124182203.8208.14.camel@localhost.localdomain> <20050816094220.GB17612@ebi.ac.uk>
Message-ID: <20050816095625.GC17612@ebi.ac.uk>

On Tue, Aug 16, 2005 at 10:42:20AM +0100, Andreas Kahari wrote:
[cut]
> Yes, in some cases. For a format to "win", its test needs to
> be the "last one standing" after all the others have failed.

Looking at the code for the first time in some time, I realize this is not how it is actually done, but almost. If any one line (one at a time, from the start and onwards) from the input data matches only one format, then the guesser returns that format as the format of the data.

Maybe it would be better if tests were ticked off the list as they failed and never re-run?

Andreas

--
Andreas Kähäri EMBL-EBI/ensembl
---{ www.embl.org }---{ www.ebi.ac.uk }---{ www.ensembl.org }---

From avilella at gmail.com Tue Aug 16 06:17:14 2005
From: avilella at gmail.com (Albert Vilella)
Date: Tue Aug 16 06:07:51 2005
Subject: [Bioperl-l] Re: GuessSeqFormat problems
In-Reply-To: <20050816095625.GC17612@ebi.ac.uk>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu> <1124182203.8208.14.camel@localhost.localdomain> <20050816094220.GB17612@ebi.ac.uk> <20050816095625.GC17612@ebi.ac.uk>
Message-ID: <1124187435.8208.32.camel@localhost.localdomain>

On Tue, 16 Aug 2005 at 10:56 +0100, Andreas Kahari wrote:
> On Tue, Aug 16, 2005 at 10:42:20AM +0100, Andreas Kahari wrote:
> [cut]
> > Yes, in some cases. For a format to "win", its test needs to
> > be the "last one standing" after all the others have failed.
>
> Looking at the code for the first time in some time, I realize
> this is not how it is actually done, but almost. If any one
> line (one at the time, from the start and onwards) from the
> input data matches only one format, then the guesser returns
> that format as the format of the data.
>
> Maybe it would be better if tests were ticked off the list as
> they failed and never re-run?

GuessSeqFormat would tick off a format as soon as its regex for the
current $lineno fails:

- Look at line 1, tick off the formats that won't comply with
  ($lineno == 1 && $line =~ /regex/).
- Look at line 2, further eliminate the formats that won't comply
  with ($lineno == 2 && $line =~ /regex/).
- And so on for lines 3 and 4 (and presumably not many more).

This should eliminate cases where a format passes the regex for line 2
although line 1 indicates it is not that format. I suppose this is
what was happening in my pir/phylip case.

Albert.

>
>
>
> Andreas
>

From akarger at CGR.Harvard.edu Tue Aug 16 08:17:19 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue Aug 16 08:03:48 2005
Subject: [Bioperl-l] new modules for sarching for patterns in fasta-fi les
Message-ID: <339D68B133EAD311971E009027DC47970321B6EA@montecarlo.cgr.harvard.edu>

> -----Original Message-----
> From: Heikki Lehvaslaiho [mailto:heikki@ebi.ac.uk]
>
> On Tuesday 09 August 2005 20:20, Amir Karger wrote:
> >
> > I wrote a simple one-liner to convert fasta to three, tab-separated
> > columns: ID (without '>') desc, and concatenated sequence.
>
> FYI, this is already implemented as 'tab' format in Bio::SeqIO.
>
> -Heikki

I decided to write a separate translator for two reasons. First, I
thought people might want the desc in a separate column. (SeqIO::tab
just takes the entire desc line in one shot, right?) Second, I believe
that some people who use the Scriptome toolbox might not have Bioperl
installed, and I don't want to force them to have Bioperl just to parse
some FASTAs.
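For anyone without Bioperl at hand, the conversion described above (ID without '>', description, concatenated sequence) could look roughly like the sketch below. This is not the actual Scriptome one-liner; the fasta_to_tab() name and the choice to split the header at the first whitespace are assumptions made for illustration.

```perl
use strict;
use warnings;

# Convert FASTA text into rows of "ID<TAB>description<TAB>sequence".
# Assumption: the ID is everything up to the first whitespace in the
# header line, and the rest of the header is the description.
sub fasta_to_tab {
    my ($fasta) = @_;
    my @rows;
    for my $record (split /^>/m, $fasta) {
        next unless $record =~ /\S/;            # skip the empty leading chunk
        my ($header, @seq_lines) = split /\n/, $record;
        $header =~ s/\s+$//;                    # drop trailing CR/space
        my ($id, $desc) = split ' ', $header, 2;
        $id   = '' unless defined $id;
        $desc = '' unless defined $desc;
        my $seq = join '', @seq_lines;
        $seq =~ s/\s+//g;                       # concatenate wrapped lines
        push @rows, join("\t", $id, $desc, $seq);
    }
    return @rows;
}

print "$_\n" for fasta_to_tab(">s1 first seq\nACGT\nTTGA\n>s2\nGG\n");
```

A record with no description still produces three columns, with the middle one empty.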
(OTOH, I was Lazy enough to steal Bio::SeqIO to do most format conversions.) -Amir From heikki at ebi.ac.uk Tue Aug 16 08:47:30 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue Aug 16 08:48:51 2005 Subject: [Bioperl-l] Announce: Bio::Seq::Quality In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA62F5440@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA62F5440@ANTARESIA.be.devgen.com> Message-ID: <200508161347.30875.heikki@ebi.ac.uk> Marc, See if the new version of Bio::Seq::Quality works the way you like. On Thursday 14 July 2005 16:54, Marc Logghe wrote: > Personally I'd use that optionally by setting/resetting a padding flag > or something. I'd more be interested in having a way to validate your > Bio::Seq::Quality one way or another. In de case padding is switched > off, I'd like to know whether my sequence length is exactly the same as > my quality array. Does that make sense ? > In conclusion I'd opt for a inconsistency check and an optional padding > feature. I've finished restructuring Bio::Seq::MetaI classes so that they not any more automatically pad with empty values or truncate meta values to sequence length. This older behaviour can be activated by setting force_flush() true. 
These new methods have been added to Bio::Meta::MetaI: force_flush() meta_length() named_meta_length() is_flush() Since Bio::Seq::Quality has two meta sets with explicit names ('quality', 'trace'), these new methods are in place, too: quality_is_flush() quality_length() trace_is_flush() trace_length() Enjoy, -Heikki From mayagao1999 at yahoo.com Tue Aug 16 12:30:00 2005 From: mayagao1999 at yahoo.com (Alex Zhang) Date: Tue Aug 16 12:22:09 2005 Subject: [Bioperl-l] A question about the perl code Message-ID: <20050816163000.95901.qmail@web53501.mail.yahoo.com> Dear all, I made a group A which includes 16 combinations of any two nucleotides like: AA,AC,AG,AT, CA,CC,CG,CT, GA,GC,GG,GT, TA,TC,TG,TT If I randomly got a pair like AC, I want to exclude AC, AT, AG, AA, TC, CC, GC. In other words, I want to exclude the pairs in group A which has the same nucleotide with the pair randomly selected. Can anybody suggest me how to approach this using Perl? Thanks! Alex ____________________________________________________ Start your day with Yahoo! 
- make it your home page http://www.yahoo.com/r/hs From johan.viklund at gmail.com Tue Aug 16 13:09:07 2005 From: johan.viklund at gmail.com (Johan Viklund) Date: Tue Aug 16 12:58:50 2005 Subject: [Bioperl-l] A question about the perl code In-Reply-To: <20050816163000.95901.qmail@web53501.mail.yahoo.com> References: <20050816163000.95901.qmail@web53501.mail.yahoo.com> Message-ID: <5e924f0a0508161009786c819f@mail.gmail.com> Hi, If you have all the pairs in an array, say @nucleotide_pairs, and the pair you randomly selected in the scalar $pair this will work: @selected_pairs = grep { not /[$pair]/ } @nucleotide_pairs; For a description on what grep does look in the perlfunc perldoc page (on the web: On 8/16/05, Alex Zhang wrote: > Dear all, > > I made a group A which includes 16 combinations of any > two nucleotides like: AA,AC,AG,AT, > CA,CC,CG,CT, > GA,GC,GG,GT, > TA,TC,TG,TT > > If I randomly got a pair like AC, I want to exclude > AC, AT, AG, AA, TC, CC, GC. In other words, I want to > exclude the pairs in group A which has the same > nucleotide with the pair randomly selected. Can > anybody suggest me how to approach this using Perl? > > Thanks! > Alex > > > > ____________________________________________________ > Start your day with Yahoo! 
- make it your home page > http://www.yahoo.com/r/hs > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Johan Viklund E-post: ----------------- perl -we '$,=" ";$_=bless sub{shift;print split(/::/,ref)},Just::Another::Perl::Hacker;&$_' From taerwin at tpg.com.au Wed Aug 17 03:15:03 2005 From: taerwin at tpg.com.au (Tim Erwin) Date: Wed Aug 17 03:45:14 2005 Subject: [Bioperl-l] Another GuessSeqFormat question In-Reply-To: <1124187435.8208.32.camel@localhost.localdomain> References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu> <1124182203.8208.14.camel@localhost.localdomain> <20050816094220.GB17612@ebi.ac.uk> <20050816095625.GC17612@ebi.ac.uk> <1124187435.8208.32.camel@localhost.localdomain> Message-ID: <1124262903.10144.71.camel@bacp4> Hi, Is there a way to determine which parser to use based on the guess from Bio::Tools::GuessSeqFormat without hard coding a hash? I am interested in parsing and storing various files to a database. I was wondering if it is a good idea to make a some extra functions so that files could be parsed automatically. i.e for a fasta file my $obj = new Bio::Tools::GuessSeqFormat( -file => $filename ); my $format = $obj->guess; my $parser = $obj->parser; #RETURNS Bio::SeqIO my $next_method = $obj->next_method; #RETURNS next_seq my $write_method = $obj->write_method; #RETURNS write_seq #PARSE FILE my $infile = new $parser(-file => $filename, -format => $format); while (my $result = $infile->$next_method) { #DO STUFF HERE #ADD $result TO DATABASE } Perhaps there is a better way to do this? Any suggestions would be great. 
Regards, Tim From heikki at ebi.ac.uk Wed Aug 17 05:03:02 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed Aug 17 05:20:14 2005 Subject: [Bioperl-l] Another GuessSeqFormat question In-Reply-To: <1124262903.10144.71.camel@bacp4> References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu> <1124187435.8208.32.camel@localhost.localdomain> <1124262903.10144.71.camel@bacp4> Message-ID: <200508171003.02240.heikki@ebi.ac.uk> Tim, Bio::Tools::GuessSeqFormat is not meant to be used directly. It is called automatically by the constructor (new() method) of Bio::SeqIO: my $format = $param{'-format'} || $class->_guess_format( $param{-file} || $ARGV[0] ); if( ! $format ) { if ($param{-file}) { $format = Bio::Tools::GuessSeqFormat->new(-file => $param{-file}|| $ARGV[0] )->guess; } elsif ($param{-fh}) { $format = Bio::Tools::GuessSeqFormat->new(-fh => $param{-fh}|| $ARGV[0] )->guess; } } # ... code removed return "Bio::SeqIO::$format"->new(@args); The logic from the above code is as follows: 1. _guess_format() tries to determine the format of the file based on the filename extension. 2. Only if that fails try looking into the file/stream to guess the format using the Bio::Tools::GuessSeqFormat code. 3. The returned object is not a Bio::SeqIO but a Bio::SeqIO::$format object, which has the correct next_seq() and write_seq() methods. You can therefore use ref($seqoobject) to find out what parser is being used. The standard code for doing this should contain all the automation needed: foreach my $inputfilename (@all_files) { my $in = Bio::SeqIO->new(-file => $inputfilename); while ( my $seq = $in->next_seq() ) { # do something } } Yours, -Heikki On Wednesday 17 August 2005 08:15, Tim Erwin wrote: > Hi, > > Is there a way to determine which parser to use based on the guess from > Bio::Tools::GuessSeqFormat without hard coding a hash? I am interested > in parsing and storing various files to a database. 
> > I was wondering if it is a good idea to make a some extra functions so that > files could be parsed automatically. > > i.e for a fasta file > > my $obj = new Bio::Tools::GuessSeqFormat( -file => $filename ); > my $format = $obj->guess; > my $parser = $obj->parser; #RETURNS Bio::SeqIO > my $next_method = $obj->next_method; #RETURNS next_seq > my $write_method = $obj->write_method; #RETURNS write_seq > > #PARSE FILE > my $infile = new $parser(-file => $filename, -format => $format); > while (my $result = $infile->$next_method) { > > #DO STUFF HERE > #ADD $result TO DATABASE > > } > > Perhaps there is a better way to do this? Any suggestions would be great. > > Regards, > > Tim > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From jason.stajich at duke.edu Wed Aug 17 12:21:03 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Aug 17 12:13:55 2005 Subject: [Bioperl-l] thanks for the hardwork on HOWTO changeover Message-ID: <386446B4-C2E6-4413-85E2-259B090EEC92@duke.edu> I just want to publicly thank Brian Osborne for all the work to get the docbook and bioperl HOWTOs working more smoothly. Brian has spent a lot of time recently figuring out to get the XML -> HTML and XML->PDF really working correctly. 
The point of writing things in docbook instead of latex, POD, plain
text, or HTML is that docbook is intended to provide fairly easy
transformation of the document text into a number of different formats
(RTF, plain text, HTML, PDF). (Once you get the tools working, of
course.)

The website should now have up-to-date versions of the documentation
here: http://bioperl.org/HOWTOs and reflect the latest version of these
documents that are in CVS.

In the future the website HOWTOs will be kept up to date more closely
with the versions in the CVS repository instead of the last official
release.

Brian has taken care of a lot of behind the scenes things in terms of
project documentation and deserves a lot of credit for moving us forward
in trying to make the toolkit more accessible to different levels of
programmers. So I'm sending out a big thank you!

Please give these HOWTOs a try, print them out, frame them on your
walls, etc. If you spot inconsistencies or weaknesses please try and
help out by suggesting changes or adding text.

We'd of course encourage other people to help write HOWTOs about
particular aspects of Bioperl or uses of Bioperl. You don't need to be
an ubercoder to write one. If the XML format scares you, ask questions
and have a look at the existing documents in doc/howto/xml.

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From brian_osborne at cognia.com Wed Aug 17 13:00:02 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Aug 17 12:51:29 2005
Subject: [Bioperl-l] Re: thanks for the hardwork on HOWTO changeover
In-Reply-To: <386446B4-C2E6-4413-85E2-259B090EEC92@duke.edu>
Message-ID:

Jason,

My pleasure. You also wrote:

>We'd of course encourage other people to help write HOWTOs about
>particular aspects of Bioperl or uses of Bioperl.
>You don't need to be an ubercoder to write one.

That's right. There's certainly more to be done on these HOWTOs.
For example, you could imagine an Align HOWTO, a bioperl-db HOWTO, or a
Structure HOWTO (though the code may be lagging here). I could also see
HOWTOs on Ontology, Graph, Biblio, and so on. You could also imagine
ones I can't imagine, I'm sure. There are also missing sections in the
Beginners, Feature-Annotation, and Graphics HOWTOs, to name a few.

I'd like to make a particular appeal to those of you who would like to
contribute but aren't sure how. The great thing about writing these
sorts of things is that you end up knowing quite a bit about the subject
matter, so choosing a topic that interests you but that you don't know
well is a good thing. You delve into the modules, you write and test
code, you think of new methods, it's a great way to learn Bioperl.

Brian O.

On 8/17/05 12:21 PM, "Jason Stajich" wrote:

>
> I just want to publicly thank Brian Osborne for all the work to get the
> docbook and bioperl HOWTOs working more smoothly.
>
> Brian has spent a lot of time recently figuring out to get the XML -> HTML and
> XML->PDF really working correctly. The point of writing things in docbook
> instead of latex, POD, plain text, or HTML is docbook is (intended) to provide
> fairly easy transformation of the document text into a number of different
> formats (RTF, plain text, HTML, PDF). (Once you get the tools working of
> course).
>
> The website should now have up-to-date versions of the documentation here:
> http://bioperl.org/HOWTOs and reflect the latest version of these documents
> that are in CVS.
>
> In the future the website HOWTOs will be kept up to date more closely with the
> versions in the CVS repository instead of the last official release.
>
> Brian has taken care of a lot of behind the scenes things in terms of
> project documentation and deserves a lot of credit for moving us forward in
> trying to make the toolkit more accessible to different levels of
> programmers. So I'm sending out a big thank you!
>
> Please give these HOWTOs a try, print them out, frame them on your walls,
> etc. If you spot inconsistencies or weaknesses please try and help out by
> suggesting changes or adding text.
>
> We'd of course encourage other people to help write HOWTOs about particular
> aspects of Bioperl or uses of Bioperl. You don't need to be an ubercoder to
> write one. If the XML format scares you, ask questions and have a look at the
> existing documents in doc/howto/xml.
>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>

From akarger at CGR.Harvard.edu Wed Aug 17 16:30:39 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed Aug 17 16:17:31 2005
Subject: [Bioperl-l] A question about the perl code
Message-ID: <339D68B133EAD311971E009027DC47970354B9BB@montecarlo.cgr.harvard.edu>

> -----Original Message-----
> From: Johan Viklund [mailto:johan.viklund@gmail.com]
> On 8/16/05, Alex Zhang wrote:
> > Dear all,
> >
> > I made a group A which includes 16 combinations of any
> > two nucleotides like: AA,AC,AG,AT,
> > CA,CC,CG,CT,
> > GA,GC,GG,GT,
> > TA,TC,TG,TT
> >
> > If I randomly got a pair like AC, I want to exclude
> > AC, AT, AG, AA, TC, CC, GC. In other words, I want to
> > exclude the pairs in group A which has the same
> > nucleotide with the pair randomly selected. Can
>
> Hi,
>
> If you have all the pairs in an array, say @nucleotide_pairs, and the
> pair you randomly selected in the scalar $pair this will work:
>
> @selected_pairs = grep { not /[$pair]/ } @nucleotide_pairs;

I don't think that's true. The above excludes anything with an A or C in
either position. (Btw, I used @pairs, not @nucleotide_pairs, for
brevity.)
>perl -le 'foreach $i (qw(A C G T)) {foreach $j (qw (A C G T)) { push @pairs, "$i$j"}} print join " ", @pairs'
AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT

>perl -le 'foreach $i (qw(A C G T)) {foreach $j (qw (A C G T)) { push @pairs, "$i$j"}} $pair = "AC"; @selected_pairs = grep { not /[$pair]/ } @pairs; print join " ", @selected_pairs'
GG GT TG TT

I believe the requirement is that it can't have an A in position 0 or a
C in position 1. One way to do it (not a particularly pretty way):

>perl -le 'foreach $i (qw(A C G T)) {foreach $j (qw (A C G T)) { push @pairs, "$i$j"}} ($n1, $n2) = split //, "AC"; @selected_pairs = grep { /[^$n1][^$n2]/ } @pairs; print join " ", @selected_pairs'
CA CG CT GA GG GT TA TG TT

The easiest way might really just be something like this (note the
string comparison "ne"; the numeric "!=" would treat every letter as 0
and filter out everything):

grep { substr($_, 0, 1) ne substr($pair, 0, 1) && substr($_, 1, 1) ne substr($pair, 1, 1) } @nucleotide_pairs

-Amir Karger

From taerwin at tpg.com.au Wed Aug 17 19:18:33 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Wed Aug 17 19:15:18 2005
Subject: [Bioperl-l] Another GuessSeqFormat question
In-Reply-To: <200508171003.02240.heikki@ebi.ac.uk>
References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu> <1124187435.8208.32.camel@localhost.localdomain> <1124262903.10144.71.camel@bacp4> <200508171003.02240.heikki@ebi.ac.uk>
Message-ID: <1124320713.10144.80.camel@bacp4>

Thanks, Heikki, but I am trying to parse files for different IO classes
such as AlignIO, SeqIO and SearchIO; what I am trying to do is guess the
format of any such file and then use the appropriate parser.

I.e., if I have an unknown file output.out, I want to guess the format
and then the appropriate IO parser to use. Is there a way to do this, or
should I just test all the IO parsers with an eval block?

Regards,

Tim

On Wed, 2005-08-17 at 10:03 +0100, Heikki Lehvaslaiho wrote:
>
> Tim,
>
> Bio::Tools::GuessSeqFormat is not meant to be used directly.
It is called > automatically by the constructor (new() method) of Bio::SeqIO: > > my $format = $param{'-format'} || > $class->_guess_format( $param{-file} || $ARGV[0] ); > > if( ! $format ) { > if ($param{-file}) { > $format = Bio::Tools::GuessSeqFormat->new(-file => $param{-file}|| > $ARGV[0] )->guess; > } elsif ($param{-fh}) { > $format = Bio::Tools::GuessSeqFormat->new(-fh => $param{-fh}|| > $ARGV[0] )->guess; > } > } > # ... code removed > return "Bio::SeqIO::$format"->new(@args); > > The logic from the above code is as follows: > > 1. _guess_format() tries to determine the format of the file based on the > filename extension. > > 2. Only if that fails try looking into the file/stream to guess the format > using the Bio::Tools::GuessSeqFormat code. > > 3. The returned object is not a Bio::SeqIO but a Bio::SeqIO::$format object, > which has the correct next_seq() and write_seq() methods. You can therefore > use ref($seqoobject) to find out what parser is being used. > > > > The standard code for doing this should contain all the automation needed: > > foreach my $inputfilename (@all_files) { > my $in = Bio::SeqIO->new(-file => $inputfilename); > while ( my $seq = $in->next_seq() ) { > # do something > } > } > > > Yours, > -Heikki > > > On Wednesday 17 August 2005 08:15, Tim Erwin wrote: > > Hi, > > > > Is there a way to determine which parser to use based on the guess from > > Bio::Tools::GuessSeqFormat without hard coding a hash? I am interested > > in parsing and storing various files to a database. > > > > I was wondering if it is a good idea to make a some extra functions so that > > files could be parsed automatically. 
> > > > i.e for a fasta file > > > > my $obj = new Bio::Tools::GuessSeqFormat( -file => $filename ); > > my $format = $obj->guess; > > my $parser = $obj->parser; #RETURNS Bio::SeqIO > > my $next_method = $obj->next_method; #RETURNS next_seq > > my $write_method = $obj->write_method; #RETURNS write_seq > > > > #PARSE FILE > > my $infile = new $parser(-file => $filename, -format => $format); > > while (my $result = $infile->$next_method) { > > > > #DO STUFF HERE > > #ADD $result TO DATABASE > > > > } > > > > Perhaps there is a better way to do this? Any suggestions would be great. > > > > Regards, > > > > Tim > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From b_corbomite at hotmail.com Wed Aug 17 22:58:21 2005 From: b_corbomite at hotmail.com (Bryan Yi) Date: Wed Aug 17 22:48:15 2005 Subject: [Bioperl-l] Problems using Bio::Ext::Align and Bio::SeqIO::staden::read Message-ID: I was attempting to do pairwise alignments for 2 DNA sequences so I tried to use the Align module and when I ran into problems such as not having all the .h files I was able to solve the problems by reading the mailing list archives and even got the newest version for dpAlign.pm form CVS. However, I'm now getting this problem. Had problems bootstrapping Inline module 'Bio::SeqIO::staden::read' Can't load '/usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi/auto/Bio/SeqIO/staden/read/read.so' for module Bio::SeqIO::staden::read: /usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi/auto/Bio/SeqIO/staden/read/read.so: undefined symbol: deflateInit2_ at /usr/lib/perl5/5.8.6/i586-linux-thread-multi/DynaLoader.pm line 230, line 1. at /usr/lib/perl5/site_perl/5.8.6/Inline.pm line 500 at aligntest.pl line 0 INIT failed--call queue aborted, line 1. 
I'm sure that everything is in their place and I even installed the Bio::SeqIO::staden::read module personally, Can anybody help me with this problem? Also, is there another way to code a script that does pairwise alignments without having to code everything from scratch? From heikki at ebi.ac.uk Thu Aug 18 05:54:37 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Thu Aug 18 05:44:58 2005 Subject: [Bioperl-l] Another GuessSeqFormat question In-Reply-To: <1124320713.10144.80.camel@bacp4> References: <15A27FF5-1150-4810-9F67-FBC7083F8B53@duke.edu> <200508171003.02240.heikki@ebi.ac.uk> <1124320713.10144.80.camel@bacp4> Message-ID: <200508181054.38138.heikki@ebi.ac.uk> Tim, I thought there must be something in your problem I did not catch! In principle it could be done, but practise it would be really difficult, these text based formats just vary too much - the most recent GuessSeqFormat shows that well. I would suggest that you try do determine ways to separate AlignIO, SeqIO and SearchIO files from each other and then call the appropriate one. Once you got the heuristics together you might want to think of putting the logic into a module. Fasta files pose a big problem hese. There is no general way to know if a fasta file is representing an alignment or not. For your specific case, you might find a heuristics that tells them apart, e.g. ratio of gap characters to residues, but that is highly unlikely to hold on someone else's data. Good luck, -Heikki On Thursday 18 August 2005 00:18, Tim Erwin wrote: > Thanks, Heikki, but I am trying to parse different IO objects such as > AlignIO, SeqIO and SearchIO, but what I am trying to do is guess the > format of any IO object and then use the appropriate parser. > > i.e If I have a unknown file output.out I want to guess the format and > then the appropriate IO parser to use. Is there a way to do this or > should I just test all the IO parsers with an eval block. 
> > Regards, > > Tim > > On Wed, 2005-08-17 at 10:03 +0100, Heikki Lehvaslaiho wrote: > > Tim, > > > > Bio::Tools::GuessSeqFormat is not meant to be used directly. It is called > > automatically by the constructor (new() method) of Bio::SeqIO: > > > > my $format = $param{'-format'} || > > $class->_guess_format( $param{-file} || $ARGV[0] ); > > > > if( ! $format ) { > > if ($param{-file}) { > > $format = Bio::Tools::GuessSeqFormat->new(-file => $param{-file}|| > > $ARGV[0] )->guess; > > } elsif ($param{-fh}) { > > $format = Bio::Tools::GuessSeqFormat->new(-fh => $param{-fh}|| > > $ARGV[0] )->guess; > > } > > } > > # ... code removed > > return "Bio::SeqIO::$format"->new(@args); > > > > The logic from the above code is as follows: > > > > 1. _guess_format() tries to determine the format of the file based on the > > filename extension. > > > > 2. Only if that fails try looking into the file/stream to guess the > > format using the Bio::Tools::GuessSeqFormat code. > > > > 3. The returned object is not a Bio::SeqIO but a Bio::SeqIO::$format > > object, which has the correct next_seq() and write_seq() methods. You can > > therefore use ref($seqoobject) to find out what parser is being used. > > > > > > > > The standard code for doing this should contain all the automation > > needed: > > > > foreach my $inputfilename (@all_files) { > > my $in = Bio::SeqIO->new(-file => $inputfilename); > > while ( my $seq = $in->next_seq() ) { > > # do something > > } > > } > > > > > > Yours, > > -Heikki > > > > On Wednesday 17 August 2005 08:15, Tim Erwin wrote: > > > Hi, > > > > > > Is there a way to determine which parser to use based on the guess from > > > Bio::Tools::GuessSeqFormat without hard coding a hash? I am interested > > > in parsing and storing various files to a database. > > > > > > I was wondering if it is a good idea to make a some extra functions so > > > that files could be parsed automatically. 
> > > > > > i.e for a fasta file > > > > > > my $obj = new Bio::Tools::GuessSeqFormat( -file => $filename ); > > > my $format = $obj->guess; > > > my $parser = $obj->parser; #RETURNS Bio::SeqIO > > > my $next_method = $obj->next_method; #RETURNS next_seq > > > my $write_method = $obj->write_method; #RETURNS write_seq > > > > > > #PARSE FILE > > > my $infile = new $parser(-file => $filename, -format => $format); > > > while (my $result = $infile->$next_method) { > > > > > > #DO STUFF HERE > > > #ADD $result TO DATABASE > > > > > > } > > > > > > Perhaps there is a better way to do this? Any suggestions would be > > > great. > > > > > > Regards, > > > > > > Tim > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From michael.watson at bbsrc.ac.uk Thu Aug 18 12:03:02 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu Aug 18 11:59:46 2005 Subject: [Bioperl-l] Trouble with Bio::Graphics Message-ID: <8975119BCD0AC5419D61A9CF1A923E95024C4D1F@iahce2knas1.iah.bbsrc.reserved> Hi This is going to sound like a rather hard bug to track down but maybe someone can shed some light... 
I have a rather complicated script that takes a sequence, aligns it with other sequences, does some blast searching, then creates a whole load of features of the result for drawing with Bio::Graphics. I've used the script to create images of 2498 images.... But two fail... Both with the same error message, and this is it in it's entirety (there is no stack trace): Can't locate object method "primary _tag" via package "Bio::Location::Simple" at Bio/Graphics/processed_transcript.pm line 13, line 56. But of course it can't find that object method, and nor should it be... "processed_transcript" is my glyph of choice and as I said it has worked for 2498 of these jobs... But for some reason, on 2 of them, it's trying to find method primary_tag not on a feature object but on a location object. I am bemused. Any help appreciated. Mick From cain at cshl.edu Thu Aug 18 13:44:39 2005 From: cain at cshl.edu (Scott Cain) Date: Thu Aug 18 13:34:24 2005 Subject: [Bioperl-l] Trouble with Bio::Graphics In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95024C4D1F@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E95024C4D1F@iahce2knas1.iah.bbsrc.reserved> Message-ID: <1124387080.3368.15.camel@localhost.localdomain> Mick, I don't really know, but I just wanted to clarify the error message that you put below. Is that a copy and paste, or did you retype it? The reason I ask is there are a few things that are odd about it that might point to a problem: - it mentions "primary _tag" with a space between the y and _. Of course there isn't a "primary _tag" method, as it is called "primary_tag". - It also references the file Bio/Graphics/processed_transcript.pm, but that is not the right file path for that file; it should be in Bio/Graphics/Glyph/processed_transcript.pm. Now, since you said this script works most of the time, these can't be fatal problems, but perhaps these are related to what the problem is. 
Scott On Thu, 2005-08-18 at 17:03 +0100, michael watson (IAH-C) wrote: > Hi > > This is going to sound like a rather hard bug to track down but maybe > someone can shed some light... > > I have a rather complicated script that takes a sequence, aligns it with > other sequences, does some blast searching, then creates a whole load of > features of the result for drawing with Bio::Graphics. > > I've used the script to create images of 2498 images.... But two fail... > Both with the same error message, and this is it in it's entirety (there > is no stack trace): > > Can't locate object method "primary _tag" via package > "Bio::Location::Simple" at Bio/Graphics/processed_transcript.pm line 13, > line 56. > > But of course it can't find that object method, and nor should it be... > "processed_transcript" is my glyph of choice and as I said it has worked > for 2498 of these jobs... But for some reason, on 2 of them, it's trying > to find method primary_tag not on a feature object but on a location > object. > > I am bemused. > > Any help appreciated. > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From michael.watson at bbsrc.ac.uk Thu Aug 18 13:48:04 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu Aug 18 13:37:41 2005 Subject: [Bioperl-l] Trouble with Bio::Graphics Message-ID: <8975119BCD0AC5419D61A9CF1A923E9502067B1A@iahce2knas1.iah.bbsrc.reserved> Actually, more than likely they are transcription errors - I was running it on linux, my mail is on windows, sorry! Somewhere "->primary_tag" is being called on a Location object... 
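One way to narrow this down might be to check every object before it is handed to the drawing code and report the ones that cannot answer primary_tag(). The sketch below is only a guess at such a check: find_non_features() is a made-up helper, and My::Feature and My::Location are stand-in classes so that it runs without Bioperl; in the real script the items would be whatever is being passed in as features.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Return one message for every item that is not an object providing a
# primary_tag() method. The index helps to locate the offending input.
sub find_non_features {
    my @items = @_;
    my @problems;
    for my $i (0 .. $#items) {
        my $item = $items[$i];
        next if blessed($item) && $item->can('primary_tag');
        my $what = blessed($item) || ref($item) || 'plain scalar';
        push @problems, "item $i is a $what, which has no primary_tag()";
    }
    return @problems;
}

# Stand-in classes (assumptions, not Bioperl) so the sketch is runnable.
{ package My::Feature;  sub new { bless {}, shift } sub primary_tag { 'exon' } }
{ package My::Location; sub new { bless {}, shift } }

print "$_\n" for find_non_features( My::Feature->new, My::Location->new );
# prints: item 1 is a My::Location, which has no primary_tag()
```

Running such a check on the two failing inputs should show which call is handing over a location where a feature is expected.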
-----Original Message----- From: Scott Cain [mailto:cain@cshl.edu] Sent: Thu 18/08/2005 6:44 PM To: michael watson (IAH-C) Cc: bioperl-l Subject: Re: [Bioperl-l] Trouble with Bio::Graphics Mick, I don't really know, but I just wanted to clarify the error message that you put below. Is that a copy and paste, or did you retype it? The reason I ask is there are a few things that are odd about it that might point to a problem: - it mentions "primary _tag" with a space between the y and _. Of course there isn't a "primary _tag" method, as it is called "primary_tag". - It also references the file Bio/Graphics/processed_transcript.pm, but that is not the right file path for that file; it should be in Bio/Graphics/Glyph/processed_transcript.pm. Now, since you said this script works most of the time, these can't be fatal problems, but perhaps these are related to what the problem is. Scott On Thu, 2005-08-18 at 17:03 +0100, michael watson (IAH-C) wrote: > Hi > > This is going to sound like a rather hard bug to track down but maybe > someone can shed some light... > > I have a rather complicated script that takes a sequence, aligns it with > other sequences, does some blast searching, then creates a whole load of > features of the result for drawing with Bio::Graphics. > > I've used the script to create images of 2498 images.... But two fail... > Both with the same error message, and this is it in it's entirety (there > is no stack trace): > > Can't locate object method "primary _tag" via package > "Bio::Location::Simple" at Bio/Graphics/processed_transcript.pm line 13, > line 56. > > But of course it can't find that object method, and nor should it be... > "processed_transcript" is my glyph of choice and as I said it has worked > for 2498 of these jobs... But for some reason, on 2 of them, it's trying > to find method primary_tag not on a feature object but on a location > object. > > I am bemused. > > Any help appreciated. 
> > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From tembe at bioanalysis.org Thu Aug 18 16:57:32 2005 From: tembe at bioanalysis.org (Waibhav Tembe) Date: Thu Aug 18 16:47:20 2005 Subject: [Bioperl-l] Multiline Query Name in Blast Output Message-ID: <4304F63C.3040108@bioanalysis.org> Hello List, Is there any way to read multi-line query name from BLAST output using SearchIO? E.g., for the first few lines of BLAST output (attached at the end) the code : while($result = $in->next_result ) { print $result->query_name, "\n"; } prints only : Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4 How to print the entire name that is in multiple lines? i.e., Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_41200:End_41274_Section1To75_Start:12_Length:36 Thanks. ------------------------ BLASTN 2.2.10 [Oct-19-2004] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4 1200:End_41274_Section1To75_Start:12_Length:36 (36 letters) Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) 2,718,617 sequences; 12,254,801,043 total letters Searching..................................................done Score E Sequences producing significant alignments: (bits) Value gi|21956769|gb|AE013611.1| Yersinia pestis KIM section 11 of 415... 59 1e-07 gi|45434720|gb|AE017127.1| Yersinia pestis biovar Medievalis str... 
59 1e-07 gi|15978115|emb|AJ414141.1| Yersinia pestis strain CO92 complete... 59 1e-07 From jason.stajich at duke.edu Thu Aug 18 18:00:40 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Aug 18 17:50:37 2005 Subject: [Bioperl-l] Multiline Query Name in Blast Output In-Reply-To: <4304F63C.3040108@bioanalysis.org> References: <4304F63C.3040108@bioanalysis.org> Message-ID: <0F83D319-1999-4541-89A0-200C894E70B5@duke.edu> does the rest show up in $r->query_description ? -jason On Aug 18, 2005, at 4:57 PM, Waibhav Tembe wrote: > Hello List, > > Is there any way to read multi-line query name from BLAST output > using SearchIO? > > E.g., for the first few lines of BLAST output (attached at the end) > the code : > > while($result = $in->next_result ) { > print $result->query_name, "\n"; > } > > prints only : > Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4 > > How to print the entire name that is in multiple lines? i.e., > Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_41200:En > d_41274_Section1To75_Start:12_Length:36 > > Thanks. > > ------------------------ > > BLASTN 2.2.10 [Oct-19-2004] > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. > Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database > search > programs", Nucleic Acids Res. 25:3389-3402. > > Query= Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4 > 1200:End_41274_Section1To75_Start:12_Length:36 > (36 letters) > > Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, > GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) > 2,718,617 sequences; 12,254,801,043 total letters > > Searching..................................................done > > > Score E > Sequences producing significant alignments: > (bits) Value > > gi|21956769|gb|AE013611.1| Yersinia pestis KIM section 11 of > 415... 
59 1e-07 > gi|45434720|gb|AE017127.1| Yersinia pestis biovar Medievalis > str... 59 1e-07 > gi|15978115|emb|AJ414141.1| Yersinia pestis strain CO92 > complete... 59 1e-07 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Sun Aug 21 22:04:28 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun Aug 21 21:53:48 2005 Subject: [Bioperl-l] Bio::DB::GFF::Adaptor::berkeleydb Message-ID: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu> Lincoln - I'm getting these warning when using the memory or berkeleydb adaptors: Use of uninitialized value in join or string at [MYPATH]/Bio/DB/GFF/ Adaptor/memory/feature_serializer.pm line 17. This the line return join $;,@a; $; is not defined in my scripts, if I defining it warnings vanish, but did you mean ';' or do you want to provide a particular customizeable separator? Also I notice in the SYNOPSIS of berkeleydb you have this # do queries my $segment = $db->segment(Chromosome => '1R'); my $subseg = $segment->subseq(5000..6000); my @features = $subseg->features('gene'); Is there a reason to introduce a new subseq API since we already have $seq->subseq($start,$end) for Bio::PrimarySeqI? I didn't check to see, but I assume this is a convention you are using throughout Bio::DB::GFF? Hopefully start,end will work and I assume your usual start => end works too. Thanks, -jason -- Jason Stajich Duke University http://www.duke.edu/~jes12 From chen_li3 at yahoo.com Sun Aug 21 23:20:01 2005 From: chen_li3 at yahoo.com (chen li) Date: Mon Aug 22 08:17:10 2005 Subject: [Bioperl-l] write sequence into file after stream query to database Message-ID: <20050822032001.20433.qmail@web30814.mail.mud.yahoo.com> Dear all, I am new to Bioperl. I wonder if anyone could help me out. 
After I do string query about nucleotides how should I write all the sequences into a file? Thanks, Li ____________________________________________________ Start your day with Yahoo! - make it your home page http://www.yahoo.com/r/hs From jason.stajich at duke.edu Mon Aug 22 08:31:45 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Aug 22 08:21:26 2005 Subject: [Bioperl-l] Bio::DB::GFF::Adaptor::berkeleydb In-Reply-To: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu> References: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu> Message-ID: <866DB003-A797-4E88-968D-69C9CF511127@duke.edu> On Aug 21, 2005, at 10:04 PM, Jason Stajich wrote: > Lincoln - > > I'm getting these warning when using the memory or berkeleydb > adaptors: > > Use of uninitialized value in join or string at [MYPATH]/Bio/DB/GFF/ > Adaptor/memory/feature_serializer.pm line 17. > > This the line > return join $;,@a; > > $; is not defined in my scripts, if I defining it warnings vanish, > but did you mean ';' or do you want to provide a particular > customizeable separator? > Or perhaps the undef warnings are due to empty fields in @a, since features seem to be extracted out from the db later w/o incident. BTW - I think the 'memory' and 'berkeleydb' implementations effectively replace Bio::SeqFeature::Collection which also used a BDB Btree to store features/locations and make range queries, but not the full Bio::DB::GFF & DAS APIs. > Also I notice in the SYNOPSIS of berkeleydb you have this > # do queries > my $segment = $db->segment(Chromosome => '1R'); > my $subseg = $segment->subseq(5000..6000); > my @features = $subseg->features('gene'); > > > Is there a reason to introduce a new subseq API since we already > have $seq->subseq($start,$end) for Bio::PrimarySeqI? I didn't > check to see, but I assume this is a convention you are using > throughout Bio::DB::GFF? Hopefully start,end will work and I assume > your usual start => end works too. 
> > Thanks, > -jason > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From lstein at cshl.edu Mon Aug 22 10:59:42 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Aug 22 10:50:00 2005 Subject: [Bioperl-l] Bio::DB::GFF::Adaptor::berkeleydb In-Reply-To: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu> References: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu> Message-ID: <200508221059.44303.lstein@cshl.edu> On Sunday 21 August 2005 10:04 pm, Jason Stajich wrote: > Lincoln - > > I'm getting these warning when using the memory or berkeleydb adaptors: > > Use of uninitialized value in join or string at [MYPATH]/Bio/DB/GFF/ > Adaptor/memory/feature_serializer.pm line 17. > > This the line > return join $;,@a; $; is a legacy Perl global variable that was used to separate elements of multidimensional arrays in the perl 4 days. It contains an infrequently-used control character, and since nobody is likely to change it I adopted it for quick serialization (much faster than freeze/thaw I found in my benchmarks). Your warnings are probably coming from undefined values in the @a array, and I think the best thing to do is to localize $^W around this area. I'll do that. > $; is not defined in my scripts, if I defining it warnings vanish, > but did you mean ';' or do you want to provide a particular > customizeable separator? > > Also I notice in the SYNOPSIS of berkeleydb you have this > # do queries > my $segment = $db->segment(Chromosome => '1R'); > my $subseg = $segment->subseq(5000..6000); > my @features = $subseg->features('gene'); > > > Is there a reason to introduce a new subseq API since we already have > $seq->subseq($start,$end) for Bio::PrimarySeqI? 
I didn't check to > see, but I assume this is a convention you are using throughout > Bio::DB::GFF? Hopefully start,end will work and I assume your usual > start => end works too. This is badness on my part. I'll fix that. My old habits from AcePerl keep sneaking in. Lincoln > > Thanks, > -jason > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse@cshl.edu 
From lstein at cshl.edu Mon Aug 22 11:01:03 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Aug 22 10:51:37 2005 Subject: [Bioperl-l] Bio::DB::GFF::Adaptor::berkeleydb In-Reply-To: <866DB003-A797-4E88-968D-69C9CF511127@duke.edu> References: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu> <866DB003-A797-4E88-968D-69C9CF511127@duke.edu> Message-ID: <200508221101.04634.lstein@cshl.edu> On Monday 22 August 2005 08:31 am, Jason Stajich wrote: > On Aug 21, 2005, at 10:04 PM, Jason Stajich wrote: > > Lincoln - > > > > I'm getting these warning when using the memory or berkeleydb > > adaptors: > > > > Use of uninitialized value in join or string at [MYPATH]/Bio/DB/GFF/ > > Adaptor/memory/feature_serializer.pm line 17. 
> > > > This the line > > return join $;,@a; > > > > $; is not defined in my scripts, if I defining it warnings vanish, > > but did you mean ';' or do you want to provide a particular > > customizeable separator? > > Or perhaps the undef warnings are due to empty fields in @a, since > features seem to be extracted out from the db later w/o incident. > > BTW - I think the 'memory' and 'berkeleydb' implementations > effectively replace Bio::SeqFeature::Collection which also used a BDB > Btree to store features/locations and make range queries, but not the > full Bio::DB::GFF & DAS APIs. Ouch! I apologize if I stepped on your (and anyone else's) foot! Lincoln > > > Also I notice in the SYNOPSIS of berkeleydb you have this > > # do queries > > my $segment = $db->segment(Chromosome => '1R'); > > my $subseg = $segment->subseq(5000..6000); > > my @features = $subseg->features('gene'); > > > > > > Is there a reason to introduce a new subseq API since we already > > have $seq->subseq($start,$end) for Bio::PrimarySeqI? I didn't > > check to see, but I assume this is a convention you are using > > throughout Bio::DB::GFF? Hopefully start,end will work and I assume > > your usual start => end works too. > > > > Thanks, > > -jason > > -- > > Jason Stajich > > Duke University > > http://www.duke.edu/~jes12 > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse@cshl.edu From lstein at cshl.edu Mon Aug 22 11:04:37 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Aug 22 10:54:12 2005 Subject: [Bioperl-l] Bio::DB::GFF::Adaptor::berkeleydb In-Reply-To: <866DB003-A797-4E88-968D-69C9CF511127@duke.edu> References: <84B80671-86D1-4AD7-B771-98FEDD6E7F40@duke.edu> <866DB003-A797-4E88-968D-69C9CF511127@duke.edu> Message-ID: <200508221104.38633.lstein@cshl.edu> > > my $subseg = $segment->subseq(5000..6000); Actually this is just a typo. The .. should be a comma. Fixing. Lincoln > > my @features = $subseg->features('gene'); > > > > > > Is there a reason to introduce a new subseq API since we already > > have $seq->subseq($start,$end) for Bio::PrimarySeqI? I didn't > > check to see, but I assume this is a convention you are using > > throughout Bio::DB::GFF? Hopefully start,end will work and I assume > > your usual start => end works too. > > > > Thanks, > > -jason > > -- > > Jason Stajich > > Duke University > > http://www.duke.edu/~jes12 > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse@cshl.edu From tembe at bioanalysis.org Mon Aug 22 11:50:39 2005 From: tembe at bioanalysis.org (Waibhav Tembe) Date: Mon Aug 22 11:40:16 2005 Subject: [Bioperl-l] Multiline Query Name in Blast Output In-Reply-To: <0F83D319-1999-4541-89A0-200C894E70B5@duke.edu> References: <4304F63C.3040108@bioanalysis.org> <0F83D319-1999-4541-89A0-200C894E70B5@duke.edu> Message-ID: <4309F44F.9050701@bioanalysis.org> Thanks Jason. The rest showed up in $r->description. Is there any reason for this? Does BioPerl assume that first whit-space character separates the query name and description? Jason Stajich wrote: > does the rest show up in $r->query_description ? > > -jason > On Aug 18, 2005, at 4:57 PM, Waibhav Tembe wrote: > >> Hello List, >> >> Is there any way to read multi-line query name from BLAST output >> using SearchIO? >> >> E.g., for the first few lines of BLAST output (attached at the end) >> the code : >> >> while($result = $in->next_result ) { >> print $result->query_name, "\n"; >> } >> >> prints only : >> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4 >> >> How to print the entire name that is in multiple lines? i.e., >> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_41200:En >> d_41274_Section1To75_Start:12_Length:36 >> >> Thanks. >> >> ------------------------ >> >> BLASTN 2.2.10 [Oct-19-2004] >> >> >> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >> Schaffer, >> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >> "Gapped BLAST and PSI-BLAST: a new generation of protein database >> search >> programs", Nucleic Acids Res. 25:3389-3402. 
>> >> Query= Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4 >> 1200:End_41274_Section1To75_Start:12_Length:36 >> (36 letters) >> >> Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, >> GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) >> 2,718,617 sequences; 12,254,801,043 total letters >> >> Searching..................................................done >> >> >> Score E >> Sequences producing significant alignments: >> (bits) Value >> >> gi|21956769|gb|AE013611.1| Yersinia pestis KIM section 11 of >> 415... 59 1e-07 >> gi|45434720|gb|AE017127.1| Yersinia pestis biovar Medievalis >> str... 59 1e-07 >> gi|15978115|emb|AJ414141.1| Yersinia pestis strain CO92 >> complete... 59 1e-07 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > From jason.stajich at duke.edu Mon Aug 22 11:57:20 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Aug 22 11:46:59 2005 Subject: [Bioperl-l] Multiline Query Name in Blast Output In-Reply-To: <4309F44F.9050701@bioanalysis.org> References: <4304F63C.3040108@bioanalysis.org> <0F83D319-1999-4541-89A0-200C894E70B5@duke.edu> <4309F44F.9050701@bioanalysis.org> Message-ID: <4C522C85-E177-43BE-B38B-76ABA4A5D5F0@duke.edu> exactly. The line-wrap (\n) is considered whitespace as well so hence the separation. On Aug 22, 2005, at 11:50 AM, Waibhav Tembe wrote: > Thanks Jason. The rest showed up in $r->description. Is there any > reason for this? > > Does BioPerl assume that first whit-space character separates the > query name and description? > > > > Jason Stajich wrote: > > >> does the rest show up in $r->query_description ? 
>> >> -jason >> On Aug 18, 2005, at 4:57 PM, Waibhav Tembe wrote: >> >> >>> Hello List, >>> >>> Is there any way to read multi-line query name from BLAST output >>> using SearchIO? >>> >>> E.g., for the first few lines of BLAST output (attached at the >>> end) the code : >>> >>> while($result = $in->next_result ) { >>> print $result->query_name, "\n"; >>> } >>> >>> prints only : >>> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4 >>> >>> How to print the entire name that is in multiple lines? i.e., >>> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_41200: >>> En d_41274_Section1To75_Start:12_Length:36 >>> >>> Thanks. >>> >>> ------------------------ >>> >>> BLASTN 2.2.10 [Oct-19-2004] >>> >>> >>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>> Schaffer, >>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>> "Gapped BLAST and PSI-BLAST: a new generation of protein >>> database search >>> programs", Nucleic Acids Res. 25:3389-3402. >>> >>> Query= >>> Cand_Start_41225:End_41249:Length_25:Extended_Length_75:Start_4 >>> 1200:End_41274_Section1To75_Start:12_Length:36 >>> (36 letters) >>> >>> Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, >>> GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) >>> 2,718,617 sequences; 12,254,801,043 total letters >>> >>> Searching..................................................done >>> >>> >>> Score E >>> Sequences producing significant alignments: >>> (bits) Value >>> >>> gi|21956769|gb|AE013611.1| Yersinia pestis KIM section 11 of >>> 415... 59 1e-07 >>> gi|45434720|gb|AE017127.1| Yersinia pestis biovar Medievalis >>> str... 59 1e-07 >>> gi|15978115|emb|AJ414141.1| Yersinia pestis strain CO92 >>> complete... 
59 1e-07 >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> >> > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From hlapp at gnf.org Mon Aug 22 14:18:30 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Aug 22 14:09:37 2005 Subject: [Bioperl-l] Re: [BioSQL-l] loading fasta records with load_seqdatabase.pl - correct fasta headers In-Reply-To: <3cfaa40405082207574597e9f9@mail.gmail.com> References: <3cfaa40405082207574597e9f9@mail.gmail.com> Message-ID: Amit, this is a problem inherent with the fasta format as there is no precise definition of what to put as identifier and/or accession. The Bioperl fasta parser doesn't set the accession and so it defaults to "unknown" (it cannot be undef). Since fasta format also doesn't have the version in a defined place, the version will be undef (i.e., zero for biosql) for every entry, so that all your sequences will have the same unique key of (accession,version,namespace) which violates the constraint after the first sequence was stored. The easiest way to deal with this is to write your own SequenceProcessor (see Bio::Factory::SequenceProcessorI and Bio::Seq::BaseSeqProcessor) and then pipeline it using the --pipeline argument to load_seqdatabase.pl. Simple examples for how to write your own SeqProcessor have been posted before, e.g., by Marc Logghe: http://portal.open-bio.org/pipermail/bioperl-l/2005-February/018158.html and by myself http://portal.open-bio.org/pipermail/bioperl-l/2003-June/012369.html -hilmar On Aug 22, 2005, at 7:57 AM, Amit Indap wrote: > Hi, > > I am new to using the biosql. I am trying to load fasta formatted > RefSeq records into the biosql schema. 
When I try to use the > load_seqdatabase.pl script I get the following error > > load_seqdatabase.pl --host 127.0.0.1 --port 2022 --dbname testbiosql > --namespace refseq --format fasta refseq.fa > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > were > ("gi|51459331|ref|XM_498785.1|","gi|51459331|ref|XM_498785.1|","unknown > ","PREDICTED: > Homo sapiens LOC440641 (LOC440641), mRNA","0","") FKs (1,) > Duplicate entry 'unknown-1-0' for key 2 > --------------------------------------------------- > Could not store unknown: > ------------- EXCEPTION ------------- > MSG: You're trying to lie about the length: is 1316 but you say 6474 > STACK Bio::PrimarySeq::length > /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:418 > STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: > 553 > STACK Bio::Seq::length /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:612 > STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: > 553 > STACK Bio::DB::BioSQL::BiosequenceAdaptor::populate_from_row > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BiosequenceAdaptor.pm:236 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:1310 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:976 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:855 > STACK Bio::DB::BioSQL::PrimarySeqAdaptor::attach_children > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:284 > STACK Bio::DB::BioSQL::SeqAdaptor::attach_children > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/SeqAdaptor.pm:279 > STACK 
Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:1341 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:976 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:855 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:205 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:254 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: > 272 > STACK (eval) ./load_seqdatabase.pl:542 > STACK toplevel ./load_seqdatabase.pl:525 > > -------------------------------------- > at ./load_seqdatabase.pl line 555 > > I think my fasta headers are incorrect since it says it cannot store > unknown. The first fasta record in my refseq.fa is this: > >> gi|6912649|ref|NM_012431.1| Homo sapiens sema domain, immunoglobulin > domain (Ig), short basic domain, secreted, (semaphorin) 3E (SEMA3E), > mRNA > > Do I need to reformat that header? I downloaded the NM series of > Refseqs in fasta form from NCBI's ftp site and wanted to load them > into the biosql schema. > > Thanks, > > Amit Indap > Dept. of Biological Statistics and Computational Biology > Cornell University > > > (error message) > Loading refseq.fa ... > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From kynn at panix.com Mon Aug 22 15:30:10 2005 From: kynn at panix.com (kynn@panix.com) Date: Mon Aug 22 15:20:18 2005 Subject: [Bioperl-l] [OT] ISO guide to human genomics DBs Message-ID: <200508221930.j7MJUAd02770@panix3.panix.com> Hi! Does anyone know of a high-level guide to the various Human genomics databases? Basically, I want to make my software as flexible as possible regarding the kinds of human gene identifiers it will recognize, but I'm having a hard time figuring out the various naming schemes. I'm (vaguely) aware of Entrez IPI Ensembl SwissProt/UniProt RefSeq LocusLink "gene symbols" (e.g. GHC1_HUMAN) "gi numbers" though I'm not sure these are strictly comparable. (Am I missing any major ones?) From what I've seen at sites like http://harvester.embl.de it looks like a huge mess, but maybe this is only a reflection of my ignorance. Any pointers would be much appreciated! kj From skirov at utk.edu Mon Aug 22 16:28:19 2005 From: skirov at utk.edu (Stefan Kirov) Date: Mon Aug 22 16:19:19 2005 Subject: [Bioperl-l] [OT] ISO guide to human genomics DBs In-Reply-To: <200508221930.j7MJUAd02770@panix3.panix.com> References: <200508221930.j7MJUAd02770@panix3.panix.com> Message-ID: <430A3563.6010601@utk.edu> You certainly have missed HGNC (http://www.gene.ucl.ac.uk/nomenclature/) and HUGO in particular. By the way, EntrezGene replaced LocusLink a few months ago... Affy mapping is also useful for some things. Aceview (http://www.ncbi.nih.gov/IEB/Research/Acembly/index.html?human), MIM (Mendelian Inheritance in Man, though it is not just human anymore...) and GRIF (Gene Reference Into Function) are also of some relevance I guess, though you can access these through EntrezGene. There are a bunch of other, more specialized DBs out there as well, depending on what you want to do. 
By the way, you can search for most of the relationships you have mentioned through EnsMART and BioMART or through GeneKeyDB (genereg.ornl.gov/gkdb), the last being developed by me so this is not unbiased :-) ... Stefan kynn@panix.com wrote: >Hi! Does anyone know of a high-level guide to the various Human >genomics databases? Basically, I want to make my software as flexible >as possible regarding the kinds of human gene identifiers it will >recognize, but I'm having a hard time figuring out the various naming >schemes. I'm (vaguely) aware of > > Entrez > IPI > Ensembl > SwissProt/UniProt > RefSeq > LocusLink > "gene symbols" (e.g. GHC1_HUMAN) > "gi numbers" > >though I'm not sure these are strictly comparable. (Am I missing any >major ones?) > >From what I've seen at sites like http://harvester.embl.de it looks >like a huge mess, but maybe this is only a reflection of my ignorance. >Any pointers would be much appreciated! > >kj > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From horvathm at niehs.nih.gov Mon Aug 22 17:32:53 2005 From: horvathm at niehs.nih.gov (Horvath, Monica (NIH/NIEHS)) Date: Mon Aug 22 17:23:07 2005 Subject: [Bioperl-l] [OT] ISO guide to human genomics DBs Message-ID: <8ECE8333845072439B4CBE35D1814ACC04F19DA3@nihexchange22.nih.gov> If I were you, I would find a copy of the database issue of NAR-- usually in January of each year. Also, I would check out all of the gene symbol/mappings provided as options (e.g. ensmart) or tables within the ensembl and ucsc genome browser systems-- this would give you an idea of the type of mappings typically desired by biologists. I would make your application as flexible as possible to accept additional code portions to accommodate new identification schemes because this stuff changes constantly. Monica M. Horvath, Ph.D. Laboratory of Molecular Genetics Environmental Genomics Group 111 T.W. 
Alexander Drive P.O. Box 12233 MD C3-03 Research Triangle Park, NC 27709-2333 +1 919-541-3266 -----Original Message----- From: kynn@panix.com [mailto:kynn@panix.com] Sent: Monday, August 22, 2005 3:30 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] [OT] ISO guide to human genomics DBs Hi! Does anyone know of a high-level guide to the various Human genomics databases? Basically, I want to make my software as flexible as possible regarding the kinds of human gene identifiers it will recognize, but I'm having a hard time figuring out the various naming schemes. I'm (vaguely) aware of Entrez IPI Ensembl SwissProt/UniProt RefSeq LocusLink "gene symbols" (e.g. GHC1_HUMAN) "gi numbers" though I'm not sure these are strictly comparable. (Am I missing any major ones?) From what I've seen at sites like http://harvester.embl.de it looks like a huge mess, but maybe this is only a reflection of my ignorance. Any pointers would be much appreciated! kj _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Mon Aug 22 18:18:07 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Aug 22 18:07:45 2005 Subject: [Bioperl-l] Windows bug in Bio::DB::Fasta? In-Reply-To: <1124126549.2868.2.camel@localhost.localdomain> References: <1124116511.2891.9.camel@localhost.localdomain> <1124126549.2868.2.camel@localhost.localdomain> Message-ID: <200508221818.08032.lstein@cshl.edu> I've just looked into this. The bug occurs when Windows opens the FASTA file in text mode rather than binary mode; when in text mode the "\r\n" sequence is invisibly mapped to "\n" during readline operations, so Bio::DB::Fasta thinks that it is dealing with a Unix-format file; then when the module tries to seek() to the proper line number, Windows doesn't do the line end mapping, so it seeks to the wrong offset. 
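The text-mode versus binary-mode mismatch described above can be reproduced on any platform. What follows is a minimal sketch and not the code in Bio::DB::Fasta: it assumes that Perl's ":crlf" I/O layer is a fair stand-in for Windows text mode, and the helper name total_line_length is made up for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a two-line, DOS-formatted file byte for byte (no newline translation).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "ACGT\r\nTTGA\r\n";
close($out) or die "close failed: $!";

# Hypothetical helper: sum the lengths of the lines as readline sees them
# through the given PerlIO layer.
sub total_line_length {
    my ($file, $layer) = @_;
    open(my $in, "<$layer", $file) or die "Cannot open $file: $!";
    my $total = 0;
    while (my $line = <$in>) {
        $total += length $line;
    }
    close($in);
    return $total;
}

my $on_disk   = -s $path;                           # real size in bytes
my $text_mode = total_line_length($path, ':crlf');  # "\r\n" read back as "\n"
my $binary    = total_line_length($path, ':raw');   # what binmode() would show

printf "on disk: %d, text mode: %d, binary mode: %d\n",
    $on_disk, $text_mode, $binary;
```

The text-mode total comes up one byte short per line, so any offset computed from those line lengths and then handed to seek() lands too early in the file. Reading in binary mode keeps the lengths and the byte offsets in agreement.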
(sound of hairs being pulled) I've fixed the problem by explicitly calling binmode() on all filehandles that Bio::DB::Fasta calls. The new version of Fasta.pm is in both bioperl CVS and the gbrowse 1.63 CVS version. It ought to fix Chris' GC content weirdness. Lincoln On Monday 15 August 2005 01:22 pm, Scott Cain wrote: > Just to follow up on my own email with a little more information: in > Fasta.pm, line 697: > > $termination_length ||= /\r\n$/ ? 2 : 1; # account for crlf-terminated > Windows files > > The pattern match is failing on DOS formatted files; I don't know why. > Does anyone else? > > On Mon, 2005-08-15 at 10:35 -0400, Scott Cain wrote: > > Hello all, > > > > I am investigating a bug in GBrowse that seems to only surface when > > people are using the memory (ie, file) adaptor on Windows systems. > > Here's the bug report: > > > > https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169&grou > >p_id=27707 > > > > I've tracked the problem down to Bio::DB::Fasta when the file is dos > > formatted (that is, it has both line feeds and carriage returns), BDF > > returns the wrong string when a subsequence is requested, but when the > > file is unix formatted (ie only CR (or is it only LF?)), it returns the > > right string. I wrote the very simple test script below and stepped it > > through the perl debugger. It looks like the bug is in the caloffset > > method, as it returns the same offsets regardless of the file type, > > which then makes the subsequent seek into the file go to the wrong > > coordinates of dos formatted files. > > > > Unfortunately, I don't really know what is going on caloffset, so I > > don't know how to fix it, but it presumably has to check the format of > > the file somewhere and take that into account. > > > > Thanks, > > Scott -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse@cshl.edu From ureddi at emich.edu Mon Aug 22 21:32:32 2005 From: ureddi at emich.edu (Usha Rani Reddi) Date: Mon Aug 22 21:22:02 2005 Subject: [Bioperl-l] bl2seq Message-ID: Hi, I am trying to compare two hundred thousand probes(each one of them) to another genome. Format of the file containing probes is like this SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG NC_004116 7 37 GAAAACTTTATTTTTTTATGGTACAATATAAC NC_004116 9 49 TTTTTATGGTACAATATAACAATAATTATCCA NC_004116 11 61 AATATAACAATAATTATCCACAAGACAATAAG NC_004116 13 73 ATTATCCACAAGACAATAAGGAAGAAGCTATG NC_004116 15 85 ACAATAAGGAAGAAGCTATGACGGAAAACGAA What I am trying to do is compare PROBE_SEQUENCE to fasta file of Streptococcus agalactiae. I am trying to loop through the probes but not sure how to proceed. My program is working fine for single sequence. One more thing is I am not interested in matches, I want to display only mismatches. I am new to Bioperl, some one please help me with this. Thanks Usha From james.wasmuth at ed.ac.uk Tue Aug 23 04:13:56 2005 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Tue Aug 23 04:22:26 2005 Subject: [Bioperl-l] bl2seq In-Reply-To: References: Message-ID: <430ADAC4.6090601@ed.ac.uk> Hi Usha, How new are you to Perl? I would turn these probe sequences into a fasta file using Bio::SeqIO. Use this as the input file for a normal blast search. Then search the blast output using Bio::SearchIO. The best way to learn is with the HOWTOs: http://bioperl.org/HOWTOs/html/SeqIO.html http://bioperl.org/HOWTOs/html/SearchIO.html any problems? Post back to the list. 
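As a rough sketch of the first step suggested here (probe table to FASTA), the conversion can be done in plain Perl without Bio::SeqIO; swapping in Bio::SeqIO for the output is straightforward once it is installed. The sub name probes_to_fasta and the ID scheme SEQID_probeN_posM are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Convert whitespace-delimited probe rows into FASTA records.
# The record ID layout "<SEQ_ID>_probe<PROBE_ID>_pos<POSITION>" is an
# assumption, not something the probe file format prescribes.
sub probes_to_fasta {
    my (@rows) = @_;
    my $fasta = '';
    for my $row (@rows) {
        my ($seq_id, $probe_id, $position, $sequence) = split ' ', $row;
        # Skip the header row, blank lines, and anything that is not sequence.
        next unless defined $sequence && $sequence =~ /^[ACGTN]+$/i;
        $fasta .= ">${seq_id}_probe${probe_id}_pos${position}\n$sequence\n";
    }
    return $fasta;
}

my @rows = (
    "SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE",
    "NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC",
    "NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT",
);
print probes_to_fasta(@rows);
```

The resulting file can be used as the query set for a single blast run against the genome, which avoids starting one bl2seq process per probe.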
hope this helps james Usha Rani Reddi wrote: >Hi, >I am trying to compare two hundred thousand probes(each one of them) to >another genome. Format of the file containing probes is like this >SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE >NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC >NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT >NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG >NC_004116 7 37 GAAAACTTTATTTTTTTATGGTACAATATAAC >NC_004116 9 49 TTTTTATGGTACAATATAACAATAATTATCCA >NC_004116 11 61 AATATAACAATAATTATCCACAAGACAATAAG >NC_004116 13 73 ATTATCCACAAGACAATAAGGAAGAAGCTATG >NC_004116 15 85 ACAATAAGGAAGAAGCTATGACGGAAAACGAA >What I am trying to do is compare PROBE_SEQUENCE to fasta file of >Streptococcus agalactiae. I am trying to loop through the probes but not >sure how to proceed. My program is working fine for single sequence. One >more thing is I am not interested in matches, I want to display only >mismatches. I am new to Bioperl, some one please help me with this. >Thanks >Usha >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- "You have made your way from worm to man, and much in you is still worm." 
Friedrich Nietzsche, Thus Spoke Zarathustra

Blaxter Nematode Genomics Group |
Institute of Evolutionary Biology |
Ashworth Laboratories, KB | tel: +44 131 650 7403
University of Edinburgh | web: www.nematodes.org/~james
Edinburgh |
EH9 3JT |
UK |

From palmeida at igc.gulbenkian.pt Tue Aug 23 04:41:09 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Aug 23 04:33:27 2005
Subject: [Bioperl-l] bl2seq
In-Reply-To:
References:
Message-ID: <200508230941.09888.palmeida@igc.gulbenkian.pt>

Hi Usha,

Perhaps something like this:

my @seqs;
open IN, "yourfile" or die "Cannot open yourfile: $!";
while (<IN>) {
    chomp; #get rid of newline character
    my @line = split(/\s+/);
    #keep only rows whose fourth column is a sequence (skips the header row)
    push @seqs, $line[3] if defined $line[3] && $line[3] =~ /^[A-Z]+$/;
}
close IN;
foreach my $seq (@seqs) {
    #Do whatever you are doing successfully for a single sequence
}

I'm not sure about the syntax, because I haven't been using Perl, but that's
the general idea.

-Paulo

On Tuesday 23 August 2005 02:32, Usha Rani Reddi wrote:
> Hi,
> I am trying to compare two hundred thousand probes(each one of them) to
> another genome. Format of the file containing probes is like this
> SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE
> NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC
> NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT
> NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG
> NC_004116 7 37 GAAAACTTTATTTTTTTATGGTACAATATAAC
> NC_004116 9 49 TTTTTATGGTACAATATAACAATAATTATCCA
> NC_004116 11 61 AATATAACAATAATTATCCACAAGACAATAAG
> NC_004116 13 73 ATTATCCACAAGACAATAAGGAAGAAGCTATG
> NC_004116 15 85 ACAATAAGGAAGAAGCTATGACGGAAAACGAA
> What I am trying to do is compare PROBE_SEQUENCE to fasta file of
> Streptococcus agalactiae. I am trying to loop through the probes but not
> sure how to proceed. My program is working fine for single sequence. One
> more thing is I am not interested in matches, I want to display only
> mismatches. I am new to Bioperl, some one please help me with this.
> Thanks
> Usha
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Paulo Almeida
Tel: +351 21 4464635, Fax: +351 21 4407970
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
P-2780-156 Oeiras
Portugal
http://www.igc.gulbenkian.pt

From mark.schreiber at novartis.com Tue Aug 23 04:53:21 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Aug 23 04:42:59 2005
Subject: [Bioperl-l] Re: [BioSQL-l] loading fasta records with load_seqdatabase.pl - correct fasta headers
Message-ID:

The NCBI 'standard' is to format the header like this:

>gi|{identifier}|{namespace}|{accession}.{version}|{accession} description

eg

>gi|123456|gb|AE657483.3|AE657483.3 Gene of interest from Flying Spaghetti Monster.

Biojava is going to be adopting this approach when the appropriate
information is available.

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

Hilmar Lapp
Sent by: biosql-l-bounces@portal.open-bio.org
08/23/2005 02:18 AM

To: Amit Indap
cc: Bioperl , Biosql , (bcc: Mark Schreiber/GP/Novartis)
Subject: Re: [BioSQL-l] loading fasta records with load_seqdatabase.pl - correct fasta headers

Amit,

this is a problem inherent with the fasta format as there is no precise
definition of what to put as identifier and/or accession. The Bioperl
fasta parser doesn't set the accession and so it defaults to "unknown"
(it cannot be undef). Since fasta format also doesn't have the version
in a defined place, the version will be undef (i.e., zero for biosql)
for every entry, so that all your sequences will have the same unique
key of (accession,version,namespace) which violates the constraint
after the first sequence was stored.
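To make the collision concrete: every record ends up with the key ("unknown", 0, namespace) unless the accession and version are pulled out of the identifier first. The sketch below shows only that string-parsing step in plain Perl; it is not a SeqProcessor, and the sub name parse_ncbi_id and the returned field names are hypothetical.

```perl
use strict;
use warnings;

# Split an NCBI-style FASTA identifier such as "gi|6912649|ref|NM_012431.1|"
# into its parts. Returns an empty list if the ID does not follow that layout.
# (Hypothetical helper; the field names are chosen for illustration.)
sub parse_ncbi_id {
    my ($id) = @_;
    my ($gi, $db, $accession, $version) =
        $id =~ /^gi\|(\d+)\|(\w+)\|([^.|]+)(?:\.(\d+))?\|/
        or return;
    return (
        gi        => $gi,
        db        => $db,
        accession => $accession,
        version   => defined $version ? $version : 0,
    );
}

for my $id ('gi|51459331|ref|XM_498785.1|', 'gi|6912649|ref|NM_012431.1|') {
    my %parts = parse_ncbi_id($id);
    print "$parts{accession} version $parts{version}\n";
}
```

With distinct accessions extracted this way, two RefSeq entries no longer share the same (accession, version, namespace) key.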
The easiest way to deal with this is to write your own SequenceProcessor (see Bio::Factory::SequenceProcessorI and Bio::Seq::BaseSeqProcessor) and then pipeline it using the --pipeline argument to load_seqdatabase.pl. Simple examples for how to write your own SeqProcessor have been posted before, e.g., by Marc Logghe: http://portal.open-bio.org/pipermail/bioperl-l/2005-February/018158.html and by myself http://portal.open-bio.org/pipermail/bioperl-l/2003-June/012369.html -hilmar On Aug 22, 2005, at 7:57 AM, Amit Indap wrote: > Hi, > > I am new to using the biosql. I am trying to load fasta formatted > RefSeq records into the biosql schema. When I try to use the > load_seqdatabase.pl script I get the following error > > load_seqdatabase.pl --host 127.0.0.1 --port 2022 --dbname testbiosql > --namespace refseq --format fasta refseq.fa > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > were > ("gi|51459331|ref|XM_498785.1|","gi|51459331|ref|XM_498785.1|","unknown > ","PREDICTED: > Homo sapiens LOC440641 (LOC440641), mRNA","0","") FKs (1,) > Duplicate entry 'unknown-1-0' for key 2 > --------------------------------------------------- > Could not store unknown: > ------------- EXCEPTION ------------- > MSG: You're trying to lie about the length: is 1316 but you say 6474 > STACK Bio::PrimarySeq::length > /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:418 > STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: > 553 > STACK Bio::Seq::length /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:612 > STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: > 553 > STACK Bio::DB::BioSQL::BiosequenceAdaptor::populate_from_row > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BiosequenceAdaptor.pm:236 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object > 
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:1310 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:976 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:855 > STACK Bio::DB::BioSQL::PrimarySeqAdaptor::attach_children > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:284 > STACK Bio::DB::BioSQL::SeqAdaptor::attach_children > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/SeqAdaptor.pm:279 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:1341 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:976 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:855 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:205 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:254 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: > 272 > STACK (eval) ./load_seqdatabase.pl:542 > STACK toplevel ./load_seqdatabase.pl:525 > > -------------------------------------- > at ./load_seqdatabase.pl line 555 > > I think my fasta headers are incorrect since it says it cannot store > unknown. The first fasta record in my refseq.fa is this: > >> gi|6912649|ref|NM_012431.1| Homo sapiens sema domain, immunoglobulin > domain (Ig), short basic domain, secreted, (semaphorin) 3E (SEMA3E), > mRNA > > Do I need to reformat that header? 
I downloaded the NM series of > Refseqs in fasta form from NCBI's ftp site and wanted to load them > into the biosql schema. > > Thanks, > > Amit Indap > Dept. of Biological Statistics and Computational Biology > Cornell University > > > (error message) > Loading refseq.fa ... > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- _______________________________________________ BioSQL-l mailing list BioSQL-l@open-bio.org http://open-bio.org/mailman/listinfo/biosql-l From ureddi at emich.edu Tue Aug 23 07:50:36 2005 From: ureddi at emich.edu (Usha Rani Reddi) Date: Tue Aug 23 07:40:05 2005 Subject: [Bioperl-l] Local bl2seq Message-ID: Hi, I am trying to use BLAST to compare the sequences. I did the program in Bioperl. Below is my piece of code use Bio::SeqIO; use Bio::Tools::Run::StandAloneBlast; use Bio::Seq; $seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta", -format => "fasta" ); $seq_obj = $seqio_obj->next_seq; $input2 = Bio::Seq->new(-id=>"testquery2", -seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca"); $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn','outfile' => 'bl2seq1.out'); $blast_report = $factory->bl2seq($seq_obj, $input2); I need help for looping input2. I want to extract this part of sequence from a file containing 200000 records. Using perl I am extracting the sequence part for file of format given below. 
SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG code for extracting PROBE_SEQUENCE looks like this $NemSeq =; chomp $NemSeq; unless (open(seqfile, $NemSeq)) { print "Cannot open file \n"; exit; } @NemSeq = ; close seqfile; for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) { #print $k, $NemSeq[$k]; @Nem =split(/\t/,$NemSeq[$k]); $input= $Nem[3]; #print scalar(@Nem); #print $Nem[3], "\n"; } @Nem =split(/\t/,$NemSeq) $input2 = substr(@NemSeq,4,32); So far I could successfully use bioperl(bl2seq) to compare whole genome with single probe. I want to compare all the 200000 thousand probes. I am interested only in mismatches, for this particular scenario my assumption is that more than 90% of them will match. I want to send only the mismatches to output file and discard the matches. I would like to classify the mismatches based on the percentage dissimilarity, is there a way in Bioperl for this? Thanks a lot for the reply. Please help me with this. Thanks Usha ----- Original Message ----- From: Barry Moore Date: Monday, August 22, 2005 11:45 pm Subject: Re: [Bioperl-l] bl2seq > Usha, > > The best advice I can give you is that you need to focus your > question a > bit more. What method are you using to compare your probe to your > fasta? Regex, BLAST, Needle, RNAHybrid...? You say your sequence > is > working fine for single sequence. Are you using Bioperl for that? > Can > you tell us exactly what isn't working for you or what questions > you > have about working with multiple sequences? Are you already using > Bioperl with your single sequence comparison? Can you show us some > code? > Barry > > Usha Rani Reddi wrote: > > >Hi, > >I am trying to compare two hundred thousand probes(each one of > them) to > >another genome. 
Format of the file containing probes is like this
> >SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE
> >NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC
> >NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT
> >NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG
> >NC_004116 7 37 GAAAACTTTATTTTTTTATGGTACAATATAAC
> >NC_004116 9 49 TTTTTATGGTACAATATAACAATAATTATCCA
> >NC_004116 11 61 AATATAACAATAATTATCCACAAGACAATAAG
> >NC_004116 13 73 ATTATCCACAAGACAATAAGGAAGAAGCTATG
> >NC_004116 15 85 ACAATAAGGAAGAAGCTATGACGGAAAACGAA
> >What I am trying to do is compare PROBE_SEQUENCE to fasta file of
> >Streptococcus agalactiae. I am trying to loop through the probes
> but not
> >sure how to proceed. My program is working fine for single
> sequence. One
> >more thing is I am not interested in matches, I want to display only
> >mismatches. I am new to Bioperl, some one please help me with this.
> >Thanks
> >Usha
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l@portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> --
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT
>
>
>

From bmoore at genetics.utah.edu Tue Aug 23 14:40:51 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Aug 23 14:28:10 2005
Subject: [Bioperl-l] RE: Local bl2seq
Message-ID:

Usha-

I think the code below will wrap your existing code in the loop you need.
You will want to get a copy of a good perl programming book like Programming
Perl from O'Reilly. It will help you out with all those little perl details
like loop structures etc.

Barry

#!/usr/bin/perl

use strict;
use warnings;

use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;

my $seqio_obj = Bio::SeqIO->new(-file   => "sequences.fasta",
                                -format => "fasta" );
my $seq_obj = $seqio_obj->next_seq;

open (IN, "location/of/your/probe/file") or die "Can't open IN: $!";

while (my $row = <IN>) {

    chomp $row;

    #Skip the column-header row and blank lines.
    next if $row =~ /^SEQ_ID/ or $row !~ /\S/;

    #Assuming your file is tab delimited...
    my ($seq_id, $probe_id, $position, $probe_sequence) = split /\t/, $row;

    my $input2 = Bio::Seq->new(-id  => "testquery2",
                               -seq => $probe_sequence );

    my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn',
                                                        'outfile' => 'bl2seq1.out');
    my $blast_report = $factory->bl2seq($seq_obj, $input2);

    #Here is where you want to throw out good matches. You'll need to determine
    #what method you want to do that. Maybe since you want there to be no good
    #hits you would just call $blast_report->max_significance and make sure its
    #value is too high to be significant.
    if ($blast_report->max_significance > 0.01) {
        print "$row\n";
    }
}

close IN;

-----Original Message-----
From: Usha Rani Reddi [mailto:ureddi@emich.edu]
Sent: Tuesday, August 23, 2005 5:51 AM
To: Barry Moore
Cc: bioperl-l@portal.open-bio.org
Subject: Local bl2seq

Hi,
I am trying to use BLAST to compare the sequences. I did the program in
Bioperl. Below is my piece of code

use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
$seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta", -format => "fasta" );
$seq_obj = $seqio_obj->next_seq;
$input2 = Bio::Seq->new(-id=>"testquery2",
-seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca");
$factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
'blastn','outfile' => 'bl2seq1.out');
$blast_report = $factory->bl2seq($seq_obj, $input2);

I need help for looping input2. I want to extract this part of sequence
from a file containing 200000 records.
Using perl I am extracting the sequence part for file of format given below.
SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG code for extracting PROBE_SEQUENCE looks like this $NemSeq =; chomp $NemSeq; unless (open(seqfile, $NemSeq)) { print "Cannot open file \n"; exit; } @NemSeq = ; close seqfile; for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) { #print $k, $NemSeq[$k]; @Nem =split(/\t/,$NemSeq[$k]); $input= $Nem[3]; #print scalar(@Nem); #print $Nem[3], "\n"; } @Nem =split(/\t/,$NemSeq) $input2 = substr(@NemSeq,4,32); So far I could successfully use bioperl(bl2seq) to compare whole genome with single probe. I want to compare all the 200000 thousand probes. I am interested only in mismatches, for this particular scenario my assumption is that more than 90% of them will match. I want to send only the mismatches to output file and discard the matches. I would like to classify the mismatches based on the percentage dissimilarity, is there a way in Bioperl for this? Thanks a lot for the reply. Please help me with this. Thanks Usha ----- Original Message ----- From: Barry Moore Date: Monday, August 22, 2005 11:45 pm Subject: Re: [Bioperl-l] bl2seq > Usha, > > The best advice I can give you is that you need to focus your > question a > bit more. What method are you using to compare your probe to your > fasta? Regex, BLAST, Needle, RNAHybrid...? You say your sequence > is > working fine for single sequence. Are you using Bioperl for that? > Can > you tell us exactly what isn't working for you or what questions > you > have about working with multiple sequences? Are you already using > Bioperl with your single sequence comparison? Can you show us some > code? > Barry > > Usha Rani Reddi wrote: > > >Hi, > >I am trying to compare two hundred thousand probes(each one of > them) to > >another genome. 
Format of the file containing probes is like this > >SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE > >NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC > >NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT > >NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG > >NC_004116 7 37 GAAAACTTTATTTTTTTATGGTACAATATAAC > >NC_004116 9 49 TTTTTATGGTACAATATAACAATAATTATCCA > >NC_004116 11 61 AATATAACAATAATTATCCACAAGACAATAAG > >NC_004116 13 73 ATTATCCACAAGACAATAAGGAAGAAGCTATG > >NC_004116 15 85 ACAATAAGGAAGAAGCTATGACGGAAAACGAA > >What I am trying to do is compare PROBE_SEQUENCE to fasta file of > >Streptococcus agalactiae. I am trying to loop through the probes > but not > >sure how to proceed. My program is working fine for single > sequence. One > >more thing is I am not interested in matches, I want to display only > >mismatches. I am new to Bioperl, some one please help me with this. > >Thanks > >Usha > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l@portal.open-bio.org > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > Barry Moore > Dept. of Human Genetics > University of Utah > Salt Lake City, UT > > > > From agathman at semo.edu Tue Aug 23 14:38:42 2005 From: agathman at semo.edu (Gathman, Allen) Date: Tue Aug 23 14:29:16 2005 Subject: [Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq Message-ID: <33580922CBEEC846B473BAE124985DE00514DB69@xchgnt.semo.edu> Hi, BioPerl gurus: Although this question involves a Gbrowse database, I think it's actually a BioPerl question at heart - and in any case it appears that there's a lot of overlap between the people who answer questions in both lists, so I'm guessing this is a good place for this question. I've written a script that finds particular pfam hits in a GBROWSE database, then uses "overlapping_features" to find predicted gene features of type "transcript:GLEAN" that overlap those pfams. I've set the aggregator "transcript" as {CDS/mRNA} already. 
I select features using a regular expression to choose particular names, then I use spliced_seq to return the spliced CDS of each feature - but I'm only getting back the CDS that actually overlap the pfam hit, not the full predicted gene. So my question is, what do I need to do in order to get ALL the CDS of each predicted gene feature spliced together, instead of only the ones that actually overlap the pfam hit I used to select that predicted gene? Thanks in advance for any help you can give... Here's the code: #!/usr/bin/perl use strict; use Bio::DB::GFF; use Bio::Seq; use Bio::SeqIO; use Getopt::Long; my $outfile; GetOptions( 'o|outfile=s' => \$outfile, ); my $outfa= Bio::SeqIO -> new (-file => ">$outfile", -format => 'Fasta' ); my $db = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql', -dsn => 'dbi:mysql:database=cc;host=localhost', -fasta => '/gbrowse/databases/cc' ); $db->add_aggregator('transcript{CDS/mRNA}'); for (my $i =1; $i<=20; $i++){ my $pfamname="Peptidase_C$i"; my @pfams = $db->get_feature_by_name( Domain => $pfamname); foreach my $pfamhit (@pfams){ my $desc = $pfamname; my $score=$pfamhit->score; my $name = $pfamhit->name; $desc.= " $score "; $desc.= $pfamhit->location->seq_id(); $desc.= ": "; # # Here's where I'm selecting predicted genes that overlap the Pfam hit # my @genes = $db -> overlapping_features( -refseq => $pfamhit->location->seq_id, -start => $pfamhit->start, -stop => $pfamhit->stop, -types =>'transcript:GLEAN' ); # # Now I'm choosing the ones with names I want out of the selected genes # foreach my $gene (@genes){ my $gid=$gene->display_id(); if ($gid =~/aug_GLEAN/){ $desc.=$gene->start; $desc.=" - "; $desc.=$gene->stop; # # Here I'm splicing the gene, tacking on a description, and outputting it. 
# my $splseq = $gene->spliced_seq(); $splseq->desc($desc); $splseq->display_id($gid); $outfa->write_seq($splseq); }# end if aug_GLEAN }# end foreach gene }# end foreach pfamhit } # end for numbers close OUT; Allen Gathman http://cstl-csm.semo.edu/gathman From hlapp at gnf.org Tue Aug 23 15:43:56 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Aug 23 15:32:35 2005 Subject: [Bioperl-l] Re: [BioSQL-l] loading fasta records with load_seqdatabase.pl - correct fasta headers In-Reply-To: References: Message-ID: <965b72e4c118ef3b937c7d3464b2ecd1@gnf.org> I guess it may be worth to deposit a suitable SeqProcessor for this type of ID in the repository as probably many people may find it useful. On Aug 23, 2005, at 1:53 AM, mark.schreiber@novartis.com wrote: > The NCBI 'standard' is to format the header like this: > >> gi|{identifier}|{namespace}|{accession}.{version}|{accession} >> description > > eg > >> gi|123456|gb|AE657483.3|AE657483.3 Gene of interest from Flying >> Spaghetti > Monster. > > Biojava is going to be adopting this approach when the appropriate > information is available. > > - Mark > > Mark Schreiber > Principal Scientist (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > > > > > Hilmar Lapp > Sent by: biosql-l-bounces@portal.open-bio.org > 08/23/2005 02:18 AM > > > To: Amit Indap > cc: Bioperl , Biosql > , (bcc: > Mark Schreiber/GP/Novartis) > Subject: Re: [BioSQL-l] loading fasta records with > load_seqdatabase.pl - correct > fasta headers > > > Amit, > > this is a problem inherent with the fasta format as there is no precise > definition of what to put as identifier and/or accession. The Bioperl > fasta parser doesn't set the accession and so it defaults to "unknown" > (it cannot be undef). 
Since fasta format also doesn't have the version > in a defined place, the version will be undef (i.e., zero for biosql) > for every entry, so that all your sequences will have the same unique > key of (accession,version,namespace) which violates the constraint > after the first sequence was stored. > > The easiest way to deal with this is to write your own > SequenceProcessor (see Bio::Factory::SequenceProcessorI and > Bio::Seq::BaseSeqProcessor) and then pipeline it using the --pipeline > argument to load_seqdatabase.pl. > > Simple examples for how to write your own SeqProcessor have been posted > before, e.g., by Marc Logghe: > > http://portal.open-bio.org/pipermail/bioperl-l/2005-February/ > 018158.html > > and by myself > > http://portal.open-bio.org/pipermail/bioperl-l/2003-June/012369.html > > -hilmar > > On Aug 22, 2005, at 7:57 AM, Amit Indap wrote: > >> Hi, >> >> I am new to using the biosql. I am trying to load fasta formatted >> RefSeq records into the biosql schema. When I try to use the >> load_seqdatabase.pl script I get the following error >> >> load_seqdatabase.pl --host 127.0.0.1 --port 2022 --dbname testbiosql >> --namespace refseq --format fasta refseq.fa >> >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >> were >> ("gi|51459331|ref|XM_498785.1|","gi|51459331|ref|XM_498785.1|","unknow >> n >> ","PREDICTED: >> Homo sapiens LOC440641 (LOC440641), mRNA","0","") FKs (1,) >> Duplicate entry 'unknown-1-0' for key 2 >> --------------------------------------------------- >> Could not store unknown: >> ------------- EXCEPTION ------------- >> MSG: You're trying to lie about the length: is 1316 but you say 6474 >> STACK Bio::PrimarySeq::length >> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:418 >> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: >> 553 >> STACK Bio::Seq::length 
/usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:612 >> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: >> 553 >> STACK Bio::DB::BioSQL::BiosequenceAdaptor::populate_from_row >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BiosequenceAdaptor.pm:236 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:1310 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:976 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:855 >> STACK Bio::DB::BioSQL::PrimarySeqAdaptor::attach_children >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:284 >> STACK Bio::DB::BioSQL::SeqAdaptor::attach_children >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/SeqAdaptor.pm:279 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:1341 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:976 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:855 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:205 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ >> BasePersistenceAdaptor.pm:254 >> STACK Bio::DB::Persistent::PersistentObject::store >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm: >> 272 >> STACK (eval) ./load_seqdatabase.pl:542 >> STACK toplevel ./load_seqdatabase.pl:525 >> >> 
-------------------------------------- >> at ./load_seqdatabase.pl line 555 >> >> I think my fasta headers are incorrect since it says it cannot store >> unknown. The first fasta record in my refseq.fa is this: >> >>> gi|6912649|ref|NM_012431.1| Homo sapiens sema domain, immunoglobulin >> domain (Ig), short basic domain, secreted, (semaphorin) 3E (SEMA3E), >> mRNA >> >> Do I need to reformat that header? I downloaded the NM series of >> Refseqs in fasta form from NCBI's ftp site and wanted to load them >> into the biosql schema. >> >> Thanks, >> >> Amit Indap >> Dept. of Biological Statistics and Computational Biology >> Cornell University >> >> >> (error message) >> Loading refseq.fa ... >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l@open-bio.org >> http://open-bio.org/mailman/listinfo/biosql-l >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From cjfields at uiuc.edu Tue Aug 23 16:03:56 2005 From: cjfields at uiuc.edu (Chris Fields) Date: Tue Aug 23 15:53:23 2005 Subject: [Bioperl-l] Windows bug in Bio::DB::Fasta? In-Reply-To: <200508221818.08032.lstein@cshl.edu> References: <1124116511.2891.9.camel@localhost.localdomain> <1124126549.2868.2.camel@localhost.localdomain> <200508221818.08032.lstein@cshl.edu> Message-ID: <6.2.1.2.2.20050823150328.04bba998@express.cites.uiuc.edu> That did the trick! Everything looks fine now. Thanks Lincoln! 
Chris

At 05:18 PM 8/22/2005, Lincoln Stein wrote:
>I've just looked into this. The bug occurs when Windows opens the FASTA file
>in text mode rather than binary mode; when in text mode the "\r\n" sequence
>is invisibly mapped to "\n" during readline operations, so Bio::DB::Fasta
>thinks that it is dealing with a Unix-format file; then when the module tries
>to seek() to the proper line number, Windows doesn't do the line end mapping,
>so it seeks to the wrong offset. (sound of hairs being pulled)
>
>I've fixed the problem by explicitly calling binmode() on all filehandles that
>Bio::DB::Fasta calls. The new version of Fasta.pm is in both bioperl CVS and
>the gbrowse 1.63 CVS version. It ought to fix Chris' GC content weirdness.
>
>Lincoln
>
>On Monday 15 August 2005 01:22 pm, Scott Cain wrote:
> > Just to follow up on my own email with a little more information: in
> > Fasta.pm, line 697:
> >
> > $termination_length ||= /\r\n$/ ? 2 : 1; # account for crlf-terminated Windows files
> >
> > The pattern match is failing on DOS formatted files; I don't know why.
> > Does anyone else?
> >
> > On Mon, 2005-08-15 at 10:35 -0400, Scott Cain wrote:
> > > Hello all,
> > >
> > > I am investigating a bug in GBrowse that seems to only surface when
> > > people are using the memory (ie, file) adaptor on Windows systems.
> > > Here's the bug report:
> > >
> > > https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169&group_id=27707
> > >
> > > I've tracked the problem down to Bio::DB::Fasta when the file is dos
> > > formatted (that is, it has both line feeds and carriage returns), BDF
> > > returns the wrong string when a subsequence is requested, but when the
> > > file is unix formatted (ie only CR (or is it only LF?)), it returns the
> > > right string. I wrote the very simple test script below and stepped it
> > > through the perl debugger.
It looks like the bug is in the caloffset > > > method, as it returns the same offsets regardless of the file type, > > > which then makes the subsequent seek into the file go to the wrong > > > coordinates of dos formatted files. > > > > > > Unfortunately, I don't really know what is going on caloffset, so I > > > don't know how to fix it, but it presumably has to check the format of > > > the file somewhere and take that into account. > > > > > > Thanks, > > > Scott > >-- >Lincoln D. Stein >Cold Spring Harbor Laboratory >1 Bungtown Road >Cold Spring Harbor, NY 11724 >FOR URGENT MESSAGES & SCHEDULING, >PLEASE CONTACT MY ASSISTANT, >SANDRA MICHELSEN, AT michelse@cshl.edu Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From thechans at citiz.net Wed Aug 24 05:48:19 2005 From: thechans at citiz.net (thechans@citiz.net) Date: Wed Aug 24 06:06:50 2005 Subject: [Bioperl-l] Strange result Message-ID: <1124876899286.1765.app2.Naesasoft.Q4QUAW7S> Hello, I am new to Bioperl. I want to just copy some sequences out of an existing genbank file(multiple seq included). The selected sequences is still gb format and put to a new file. I received no warning and it worked well but the new file showed something strange. e.g, /country="Bio::Annotation::SimpleValue=HASH(0x1bdc7a4)" /db_xref="Bio::Annotation::SimpleValue=HASH(0x1bdc9cc)" /mol_type="Bio::Annotation::SimpleValue=HASH(0x1bdca68)" /isolate="Bio::Annotation::SimpleValue=HASH(0x1bdc8ac)" /organism="Bio::Annotation::SimpleValue=HASH(0x1bdc954)" why it happened? 
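
A generic Perl illustration of the symptom reported above, offered only as a minimal sketch: text of the form Bio::Annotation::SimpleValue=HASH(0x...) is what Perl prints when an object reference is used as a string instead of the text stored inside the object. The Toy::SimpleValue package, its value() accessor and the 'Example' value are hypothetical stand-ins and not BioPerl code; the sketch does not claim to show where BioPerl itself goes wrong.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object (NOT the BioPerl class).
package Toy::SimpleValue;

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{-value} }, $class;
}

# Accessor returning the text stored in the object.
sub value { return $_[0]->{value} }

package main;

my $country = Toy::SimpleValue->new(-value => 'Example');

# Interpolating the reference itself yields Perl's default
# stringification: ClassName=HASH(0xADDRESS).
my $as_reference = qq{/country="$country"};

# Calling the accessor yields the stored text.
my $as_value = sprintf '/country="%s"', $country->value;

print "$as_reference\n";
print "$as_value\n";
```

The first printed line has the same shape as the qualifiers in the new GenBank file; the second is what a writer produces when it asks the object for its value first.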
From barry.moore at genetics.utah.edu Wed Aug 24 08:30:00 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed Aug 24 08:21:48 2005
Subject: [Bioperl-l] Re: Thanks
In-Reply-To: <1072d891078847.10788471072d89@emich.edu>
References: <1072d891078847.10788471072d89@emich.edu>
Message-ID: <430C6848.4010905@genetics.utah.edu>

Usha-

It is important that you keep Bioperl-related discussions on the bioperl list; that way others can benefit from the discussion in the future by searching the archives. Having said that, I am constantly accidentally replying directly to people and not to the list because I hit reply instead of reply all, so I'm not really a good one to talk.

It seems from the nature of your questions to this list that you might be quite new to perl programming. Your participation on the list is still welcome, but this is the kind of problem that you want to learn how to solve yourself. You want to avoid at all cost giving the impression that you are asking the list to do your debugging for you - otherwise people will just stop replying to your messages. A very valuable article about asking questions to forums like this one is found at http://www.catb.org/~esr/faqs/smart-questions.html#answers . Read it. Live it. O.K. enough preaching...

If you have Programming Perl read the first half of chapter 20. If you don't have the Perl book, you can get the same info from http://www.perldoc.com/perl5.8.4/pod/perldebug.html. After you understand the perl debugger (or if you already do) look at the error message that you got. In the first line it reports "seq doesn't validate". So something is wrong with some sequence that you are trying to use in your script. Since perl itself isn't aware of sequences, and since the stack trace below shows that the exception occurred while in Bio::PrimarySeq, you can conclude that some sequence that you are sending to bioperl is bad. Now fire up perl running your script with the debugger like this: perl -d yourscript.pl.
Use 'n' to step through your code to the open command on line 14. No errors yet? You've just loaded your sequences.fasta sequence, so that sequence must be OK. Continue stepping through your code until you get the error again. Where exactly did the error occur? When you try to set $input2 as a new Bio::Seq object? Run the debugger again, and step through to the line just before where the error occurred. Use the debugger's x command to see what the values of $seq_id, $probe_id, $position, $probe_sequence are. This should give you a clue as to what your problem is. One more clue comes from the error message. It says "Attempting to set the sequence to [PROBE_SEQUENCE] which does not look healthy". The error message says that you are trying to set the sequence to PROBE_SEQUENCE. Try to figure out why this error is occurring and how to solve it. If you're still stuck let us know what you've tried and ask us again.

Barry

Usha Rani Reddi wrote:
>Hi,
>Thanks a lot for your help. When I tried to run the given code I got the
>following message.
>
>MSG: seq doesn't validate, mismatch is 1
>---------------------------------------------------
>
>------------- EXCEPTION -------------
>MSG: Attempting to set the sequence to [PROBE_SEQUENCE] which does not
>look healthy
>STACK Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:268
>STACK Bio::PrimarySeq::new /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:217
>STACK Bio::Seq::new /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:498
>STACK toplevel barr:23
>
>What should I do next? Please help me.
>Thanks
>Usha.
>
>----- Original Message -----
>From: Barry Moore
>Date: Tuesday, August 23, 2005 2:40 pm
>Subject: [Bioperl-l] RE: Local bl2seq
>
>>Usha-
>>
>>I think the code below will wrap your existing code in the loop you
>>need. You will want to get a copy of a good perl programming book
>>like Programming Perl from O'Reilly. It will help you out with all
>>those little perl details like loop structures etc.
>>
>>Barry
>>
>>#!/usr/bin/perl
>>
>>use strict;
>>use warnings;
>>use Bio::SeqIO;
>>use Bio::Tools::Run::StandAloneBlast;
>>use Bio::Seq;
>>
>>my $seqio_obj = Bio::SeqIO->new(-file   => " sequences.fasta",
>>                                -format => "fasta" );
>>
>>my $seq_obj = $seqio_obj->next_seq;
>>
>>open (IN, " location/of/your/probe/file") or die "Can't open IN";
>>
>>while (my $row = <IN>) {
>>    chomp $row;
>>    #Assuming your file is tab delimited...
>>    my ($seq_id, $probe_id, $position, $probe_sequence) = split /\t/, $row;
>>
>>    my $input2 = Bio::Seq->new(-id  => "testquery2",
>>                               -seq => $probe_sequence
>>                              );
>>
>>    my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn',
>>                                                        'outfile' => 'bl2seq1.out');
>>
>>    my $blast_report = $factory->bl2seq($seq_obj, $input2);
>>
>>    #Here is where you want to throw out good matches. You'll need to determine
>>    #what method you want to do that. Maybe since you want there to be no good
>>    #hits you would just call $blast_report->max_significance and make sure its
>>    #value is too high to be significant.
>>    if ($blast_report->max_significance > 0.01) {
>>        print "$row\n";
>>    }
>>}
>>
>>-----Original Message-----
>>From: Usha Rani Reddi [mailto:ureddi@emich.edu]
>>Sent: Tuesday, August 23, 2005 5:51 AM
>>To: Barry Moore
>>Cc: bioperl-l@portal.open-bio.org
>>Subject: Local bl2seq
>>
>>Hi,
>>I am trying to use BLAST to compare the sequences. I did the program in
>>Bioperl. Below is my piece of code
>>use Bio::SeqIO;
>>use Bio::Tools::Run::StandAloneBlast;
>>use Bio::Seq;
>>$seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta",
>>                             -format => "fasta" );
>>$seq_obj = $seqio_obj->next_seq;
>>$input2 = Bio::Seq->new(-id=>"testquery2",
>>                        -seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca");
>>
>>$factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn','outfile' => 'bl2seq1.out');
>>$blast_report = $factory->bl2seq($seq_obj, $input2);
>>
>>I need help for looping input2.
I want to extract this part of >>sequencefrom a file containing 200000 records. Using perl I am >>extracting the >>sequence part for file of format given below. >>SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE >>NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC >>NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT >>NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG >> >>code for extracting PROBE_SEQUENCE looks like this >> >>$NemSeq =; >> >>chomp $NemSeq; >> >>unless (open(seqfile, $NemSeq)) { >>print "Cannot open file \n"; >>exit; >>} >>@NemSeq = ; >> >>close seqfile; >> >>for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) { >> #print $k, $NemSeq[$k]; >> @Nem =split(/\t/,$NemSeq[$k]); >> $input= $Nem[3]; >> >> #print scalar(@Nem); >> #print $Nem[3], "\n"; >> >>} >> >> >>@Nem =split(/\t/,$NemSeq) >> >>$input2 = substr(@NemSeq,4,32); >> >>So far I could successfully use bioperl(bl2seq) to compare whole >>genomewith single probe. >>I want to compare all the 200000 thousand probes. I am interested only >>in mismatches, for this particular scenario my assumption is that more >>than 90% of them will match. I want to send only the mismatches to >>output file and discard the matches. I would like to classify the >>mismatches based on the percentage dissimilarity, is there a way in >>Bioperl for this? Thanks a lot for the reply. Please help me with >>this.Thanks >>Usha >> >> >>----- Original Message ----- >>From: Barry Moore >>Date: Monday, August 22, 2005 11:45 pm >>Subject: Re: [Bioperl-l] bl2seq >> >> >> >>>Usha, >>> >>>The best advice I can give you is that you need to focus your >>>question a >>>bit more. What method are you using to compare your probe to >>> >>> >>your >> >> >>>fasta? Regex, BLAST, Needle, RNAHybrid...? You say your >>> >>> >>sequence >> >> >>>is >>>working fine for single sequence. Are you using Bioperl for >>> >>> >>that? >> >> >>>Can >>>you tell us exactly what isn't working for you or what questions >>>you >>>have about working with multiple sequences? 
Are you already >>> >>> >>using >> >> >>>Bioperl with your single sequence comparison? Can you show us >>> >>> >>some >> >> >>>code? >>>Barry >>> >>>Usha Rani Reddi wrote: >>> >>> >>> >>>>Hi, >>>>I am trying to compare two hundred thousand probes(each one of >>>> >>>> >>>them) to >>> >>> >>>>another genome. Format of the file containing probes is like this >>>>SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE >>>>NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC >>>>NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT >>>>NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG >>>>NC_004116 7 37 GAAAACTTTATTTTTTTATGGTACAATATAAC >>>>NC_004116 9 49 TTTTTATGGTACAATATAACAATAATTATCCA >>>>NC_004116 11 61 AATATAACAATAATTATCCACAAGACAATAAG >>>>NC_004116 13 73 ATTATCCACAAGACAATAAGGAAGAAGCTATG >>>>NC_004116 15 85 ACAATAAGGAAGAAGCTATGACGGAAAACGAA >>>>What I am trying to do is compare PROBE_SEQUENCE to fasta file of >>>>Streptococcus agalactiae. I am trying to loop through the probes >>>> >>>> >>>but not >>> >>> >>>>sure how to proceed. My program is working fine for single >>>> >>>> >>>sequence. One >>> >>> >>>>more thing is I am not interested in matches, I want to display >>>> >>>> >>only> >mismatches. I am new to Bioperl, some one please help me >>with this. >> >> >>>>Thanks >>>>Usha >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l@portal.open-bio.org >>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> >>>-- >>>Barry Moore >>>Dept. of Human Genetics >>>University of Utah >>>Salt Lake City, UT >>> >>> >>> >>> >>> >>> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> -- Barry Moore Dept. 
of Human Genetics University of Utah Salt Lake City, UT From barry.moore at genetics.utah.edu Wed Aug 24 08:47:10 2005 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed Aug 24 08:36:38 2005 Subject: [Bioperl-l] Strange result In-Reply-To: <1124876899286.1765.app2.Naesasoft.Q4QUAW7S> References: <1124876899286.1765.app2.Naesasoft.Q4QUAW7S> Message-ID: <430C6C4E.2090504@genetics.utah.edu> Show us the code that produced this result. Barry thechans@citiz.net wrote: >Hello, >I am new to Bioperl. I want to just copy some sequences out of an existing genbank file(multiple seq included). >The selected sequences is still gb format and put to a new file. >I received no warning and it worked well but the new file showed something strange. >e.g, > /country="Bio::Annotation::SimpleValue=HASH(0x1bdc7a4)" > /db_xref="Bio::Annotation::SimpleValue=HASH(0x1bdc9cc)" > /mol_type="Bio::Annotation::SimpleValue=HASH(0x1bdca68)" > /isolate="Bio::Annotation::SimpleValue=HASH(0x1bdc8ac)" > /organism="Bio::Annotation::SimpleValue=HASH(0x1bdc954)" >why it happened? > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From birney at ebi.ac.uk Wed Aug 24 08:55:11 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Wed Aug 24 08:45:02 2005 Subject: [Bioperl-l] Strange result In-Reply-To: <430C6C4E.2090504@genetics.utah.edu> References: <1124876899286.1765.app2.Naesasoft.Q4QUAW7S> <430C6C4E.2090504@genetics.utah.edu> Message-ID: <430C6E2F.7000302@ebi.ac.uk> Barry Moore wrote: > Show us the code that produced this result. > > Barry > > thechans@citiz.net wrote: > >> Hello, >> I am new to Bioperl. I want to just copy some sequences out of an >> existing genbank file(multiple seq included). >> The selected sequences is still gb format and put to a new file. 
>> I received no warning and it worked well but the new file showed >> something strange. >> e.g, >> >> /country="Bio::Annotation::SimpleValue=HASH(0x1bdc7a4)" >> >> /db_xref="Bio::Annotation::SimpleValue=HASH(0x1bdc9cc)" >> >> /mol_type="Bio::Annotation::SimpleValue=HASH(0x1bdca68)" >> >> /isolate="Bio::Annotation::SimpleValue=HASH(0x1bdc8ac)" >> >> /organism="Bio::Annotation::SimpleValue=HASH(0x1bdc954)" >> why it happened? >> And the version. This is a 1.5x bug due to the skew that happened in the ontology/embl/genbank thingy. I think if you went back to an earlier version of bioperl you'd be fine. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > From agathman at semo.edu Wed Aug 24 09:44:47 2005 From: agathman at semo.edu (Allen Gathman) Date: Wed Aug 24 09:34:50 2005 Subject: [Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq In-Reply-To: <33580922CBEEC846B473BAE124985DE0051ADE4B@xchgnt.semo.edu> Message-ID: <33580922CBEEC846B473BAE124985DE0030BD156@xchgnt.semo.edu> Well, I appear to have fixed it myself, although in a kind of inelegant way -- I pulled the display_id of the predicted gene I wanted, then used it in a get_feature_by_name call to pull the feature out again. @new_genes=$db->get_feature_by_name( Sequence => $gid); $ngene = shift (@new_genes); etc. That "re-captured" feature ("$ngene" above) splices correctly when I use spliced_seq on it. I'm still a bit puzzled why the original code doesn't get me the whole gene. 
Allen Gathman http://cstl-csm.semo.edu/gathman > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > bounces@portal.open-bio.org] On Behalf Of Gathman, Allen > Sent: Tuesday, August 23, 2005 1:39 PM > To: 'bioperl-l@portal.open-bio.org' > Subject: [Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq > > Hi, BioPerl gurus: > > > > Although this question involves a Gbrowse database, I think it's actually > a > BioPerl question at heart - and in any case it appears that there's a lot > of > overlap between the people who answer questions in both lists, so I'm > guessing this is a good place for this question. > > > > I've written a script that finds particular pfam hits in a GBROWSE > database, > then uses "overlapping_features" to find predicted gene features of type > "transcript:GLEAN" that overlap those pfams. I've set the aggregator > "transcript" as {CDS/mRNA} already. I select features using a regular > expression to choose particular names, then I use spliced_seq to return > the > spliced CDS of each feature - but I'm only getting back the CDS that > actually overlap the pfam hit, not the full predicted gene. So my > question > is, what do I need to do in order to get ALL the CDS of each predicted > gene > feature spliced together, instead of only the ones that actually overlap > the > pfam hit I used to select that predicted gene? > > > > Thanks in advance for any help you can give... 
> > > > Here's the code: > > > > #!/usr/bin/perl > > > > use strict; > > use Bio::DB::GFF; > > use Bio::Seq; > > use Bio::SeqIO; > > use Getopt::Long; > > > > my $outfile; > > GetOptions( > > 'o|outfile=s' => \$outfile, > > ); > > > > my $outfa= Bio::SeqIO -> new (-file => ">$outfile", > > -format => 'Fasta' > > ); > > > > my $db = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql', > > -dsn => > 'dbi:mysql:database=cc;host=localhost', > > -fasta => '/gbrowse/databases/cc' > > ); > > > > $db->add_aggregator('transcript{CDS/mRNA}'); > > > > for (my $i =1; $i<=20; $i++){ > > my $pfamname="Peptidase_C$i"; > > my @pfams = $db->get_feature_by_name( Domain => $pfamname); > > foreach my $pfamhit (@pfams){ > > my $desc = $pfamname; > > my $score=$pfamhit->score; > > my $name = $pfamhit->name; > > $desc.= " $score "; > > $desc.= $pfamhit->location->seq_id(); > > $desc.= ": "; > > # > > # Here's where I'm selecting predicted genes that overlap the Pfam hit > > # > > my @genes = $db -> overlapping_features( > > -refseq => $pfamhit->location->seq_id, > > -start => $pfamhit->start, > > -stop => $pfamhit->stop, > > -types =>'transcript:GLEAN' > > ); > > # > > # Now I'm choosing the ones with names I want out of the selected genes > > # > > foreach my $gene (@genes){ > > my $gid=$gene->display_id(); > > if ($gid =~/aug_GLEAN/){ > > $desc.=$gene->start; > > $desc.=" - "; > > $desc.=$gene->stop; > > # > > # Here I'm splicing the gene, tacking on a description, and outputting it. 
> > # > > > > my $splseq = $gene->spliced_seq(); > > $splseq->desc($desc); > > $splseq->display_id($gid); > > $outfa->write_seq($splseq); > > > > }# end if aug_GLEAN > > }# end foreach gene > > }# end foreach pfamhit > > } # end for numbers > > close OUT; > > > > > > Allen Gathman > > http://cstl-csm.semo.edu/gathman > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Aug 24 12:16:02 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Aug 24 12:05:55 2005 Subject: [Bioperl-l] Strange result In-Reply-To: <430C6E2F.7000302@ebi.ac.uk> References: <1124876899286.1765.app2.Naesasoft.Q4QUAW7S> <430C6C4E.2090504@genetics.utah.edu> <430C6E2F.7000302@ebi.ac.uk> Message-ID: <2d060be7d860518ef3b7aca0dffebfb8@gmx.net> Right. Either downgrade to 1.4 or upgrade to a CVS snapshot of the main trunk. -hilmar On Aug 24, 2005, at 5:55 AM, Ewan Birney wrote: > > > Barry Moore wrote: >> Show us the code that produced this result. >> Barry >> thechans@citiz.net wrote: >>> Hello, >>> I am new to Bioperl. I want to just copy some sequences out of an >>> existing genbank file(multiple seq included). >>> The selected sequences is still gb format and put to a new file. >>> I received no warning and it worked well but the new file showed >>> something strange. >>> e.g, >>> >>> /country="Bio::Annotation::SimpleValue=HASH(0x1bdc7a4)" >>> >>> /db_xref="Bio::Annotation::SimpleValue=HASH(0x1bdc9cc)" >>> >>> /mol_type="Bio::Annotation::SimpleValue=HASH(0x1bdca68)" >>> >>> /isolate="Bio::Annotation::SimpleValue=HASH(0x1bdc8ac)" >>> >>> /organism="Bio::Annotation::SimpleValue=HASH(0x1bdc954)" >>> why it happened? >>> > > And the version. This is a 1.5x bug due to the skew that > happened in the ontology/embl/genbank thingy. I think if you > went back to an earlier version of bioperl you'd be fine. 
> > > >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Wed Aug 24 12:39:21 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Aug 24 12:29:49 2005 Subject: [Bioperl-l] Re: Thanks In-Reply-To: <430C6848.4010905@genetics.utah.edu> References: <1072d891078847.10788471072d89@emich.edu> <430C6848.4010905@genetics.utah.edu> Message-ID: <013d943156f0fc995505c405201b11fd@gmx.net> Thanks Barry for this excellent answer. It couldn't have been written better. -hilmar On Aug 24, 2005, at 5:30 AM, Barry Moore wrote: > Usha- > > It is important that you keep Bioperl related discussions on the > bioperl list, that way others can benefit from the discussion in the > future by searching the archives. Having said that, I am constantly > accidentally replying directly to people and not to the list because I > hit reply instead of reply all, so I'm not really a good one to talk. > > It seems from nature of your questions to this list that you might be > quite new to perl programming. You participation on the list is still > welcome, but this is the kind of problem that you want to learn how to > solve yourself. You want to to avoid at all cost giving the > impression that you are asking the list to do your debugging for you > - otherwise people will just stop replying to your messages. An very > valuable article about asking questions to forums like this one is > found at http://www.catb.org/~esr/faqs/smart-questions.html#answers > . Read > it. Live it. O.K. 
enough preaching... > > If you have Programming Perl > qid=1124883861/sr=8-1/ref=pd_bbs_1/102-6339742-6061723? > v=glance&s=books&n=507846> read the first half of chapter 20. If you > don't have the Perl book, you can get the same info from > http://www.perldoc.com/perl5.8.4/pod/perldebug.html. After you > understand the perl debugger (or if you already do) look at the error > message that you got. In the first line it reports "seq doesn't > validate". So something is wrong with some sequence that you are > trying to use in your script. Since perl itself isn't' aware of > sequences, and since the stack trace below shows that the exception > occurred while in Bio::PrimarySeq you can conclude that some sequence > that your are sending to bioperl is bad. Now fire up perl running > your script with the debugger like this perl -d yourscript.pl. Use > 'n' to step through your code to the open command on line 14. No > errors yet? You've just loaded your sequences.fasta sequence, so that > sequence must be OK. Continue stepping through your code until you > get the error again. Where exactly did the error occur? When you try > to set $input2 as a new Bio::Seq object? Run the debugger again, and > step through to the line just before where the error occurred. Use > the debugger's x command to see what the values of $seq_id, > $probe_id, $position, $probe_sequence are. This should give you a > clue s to what your problem is. One more clue comes from the error > message. It says "Attempting to set the sequence to [PROBE_SEQUENCE] > which does not look healthy". The error message says that you are > trying to set the sequence to PROBE_SEQUENCE. Try to figure out why > this error is occurring and how to solve it. If you're still stuck > let us know what you've tried and ask us again. > Barry > > > Usha Rani Reddi wrote: > >> Hi, >> Thanks a lot for your help. When I tried to run the given code I got >> the >> following message. 
>> >> MSG: seq doesn't validate, mismatch is 1 >> --------------------------------------------------- >> >> ------------- EXCEPTION ------------- >> MSG: Attempting to set the sequence to [PROBE_SEQUENCE] which does not >> look healthy >> STACK Bio::PrimarySeq::seq >> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:268 >> STACK Bio::PrimarySeq::new >> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:217 >> STACK Bio::Seq::new /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:498 >> STACK toplevel barr:23 >> >> What should I do next? Please help me. >> Thanks >> Usha. >> >> ----- Original Message ----- >> From: Barry Moore >> Date: Tuesday, August 23, 2005 2:40 pm >> Subject: [Bioperl-l] RE: Local bl2seq >> >> >>> Usha- >>> >>> I think the code below will wrap your existing code in the loop you >>> need. You will want to get a copy of a good perl programming book >>> likeProgramming Perl from O'Reilly. It will help you out with all >>> thoselittle perl details like loop structures etc. >>> >>> Barry >>> >>> #!/usr/bin/perl >>> >>> use strict; >>> use warnings; >>> use Bio::SeqIO; >>> use Bio::Tools::Run::StandAloneBlast; >>> use Bio::Seq; >>> >>> my $seqio_obj = Bio::SeqIO->new(-file => " sequences.fasta", >>> -format => "fasta" ); >>> >>> my $seq_obj = $seqio_obj->next_seq; >>> >>> open (IN, " location/of/your/probe/file") or die "Can't open IN"; >>> >>> while (my $row = ) { >>> chomp $row; >>> #Assuming your file is tab delimited... >>> my ($seq_id, $probe_id, $position, $probe_sequence) = split /\t/, >>> $row; >>> >>> my $input2 = Bio::Seq->new(-id=>"testquery2", >>> -seq=> $probe_sequence >>> ); >>> >>> my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => >>> 'blastn', >>> 'outfile' => >>> 'bl2seq1.out'); >>> >>> my $blast_report = $factory->bl2seq($seq_obj, $input2); >>> >>> #Here is where you want to throw out good matches. You'll need to >>> determine >>> #what method you want to do that. 
Maybe since you want there to be >>> no good >>> #hits you would just call $blast_report->max_significance and make >>> sure it's >>> #value is too high to be significant. >>> if ($blast_report->max_significance > 0.01) { >>> print "$row\n"; >>> } >>> } >>> >>> -----Original Message----- >>> From: Usha Rani Reddi [mailto:ureddi@emich.edu] Sent: Tuesday, >>> August 23, 2005 5:51 AM >>> To: Barry Moore >>> Cc: bioperl-l@portal.open-bio.org >>> Subject: Local bl2seq >>> >>> Hi, >>> I am trying to use BLAST to compare the sequences. I did the program >>> in >>> Bioperl. Below is my piece of code >>> use Bio::SeqIO; >>> use Bio::Tools::Run::StandAloneBlast; >>> use Bio::Seq; >>> $seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta", >>> -format => "fasta" ); >>> $seq_obj = $seqio_obj->next_seq; >>> $input2 = Bio::Seq->new(-id=>"testquery2", >>> >>> -seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca"); >>> $factory = Bio::Tools::Run::StandAloneBlast->new('program' => >>> 'blastn','outfile' => 'bl2seq1.out'); >>> $blast_report = $factory->bl2seq($seq_obj, $input2); >>> >>> I need help for looping input2. I want to extract this part of >>> sequencefrom a file containing 200000 records. Using perl I am >>> extracting the >>> sequence part for file of format given below. 
>>> SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE >>> NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC >>> NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT >>> NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG >>> >>> code for extracting PROBE_SEQUENCE looks like this >>> >>> $NemSeq =; >>> >>> chomp $NemSeq; >>> >>> unless (open(seqfile, $NemSeq)) { >>> print "Cannot open file \n"; >>> exit; >>> } >>> @NemSeq = ; >>> >>> close seqfile; >>> >>> for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) { >>> #print $k, $NemSeq[$k]; >>> @Nem =split(/\t/,$NemSeq[$k]); >>> $input= $Nem[3]; >>> >>> #print scalar(@Nem); >>> #print $Nem[3], "\n"; >>> } >>> >>> >>> @Nem =split(/\t/,$NemSeq) >>> >>> $input2 = substr(@NemSeq,4,32); >>> >>> So far I could successfully use bioperl(bl2seq) to compare whole >>> genomewith single probe. I want to compare all the 200000 thousand >>> probes. I am interested only >>> in mismatches, for this particular scenario my assumption is that >>> more >>> than 90% of them will match. I want to send only the mismatches to >>> output file and discard the matches. I would like to classify the >>> mismatches based on the percentage dissimilarity, is there a way in >>> Bioperl for this? Thanks a lot for the reply. Please help me with >>> this.Thanks >>> Usha >>> >>> >>> ----- Original Message ----- >>> From: Barry Moore >>> Date: Monday, August 22, 2005 11:45 pm >>> Subject: Re: [Bioperl-l] bl2seq >>> >>> >>>> Usha, >>>> >>>> The best advice I can give you is that you need to focus your >>>> question a bit more. What method are you using to compare your >>>> probe to >>> your >>>> fasta? Regex, BLAST, Needle, RNAHybrid...? You say your >>> sequence >>>> is working fine for single sequence. Are you using Bioperl for >>> that? >>>> Can you tell us exactly what isn't working for you or what >>>> questions you have about working with multiple sequences? Are you >>>> already >>> using >>>> Bioperl with your single sequence comparison? Can you show us >>> some >>>> code? 
>>>> Barry >>>> >>>> Usha Rani Reddi wrote: >>>> >>>> >>>>> Hi, >>>>> I am trying to compare two hundred thousand probes(each one of >>>> them) to >>>> >>>>> another genome. Format of the file containing probes is like this >>>>> SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE >>>>> NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC >>>>> NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT >>>>> NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG >>>>> NC_004116 7 37 GAAAACTTTATTTTTTTATGGTACAATATAAC >>>>> NC_004116 9 49 TTTTTATGGTACAATATAACAATAATTATCCA >>>>> NC_004116 11 61 AATATAACAATAATTATCCACAAGACAATAAG >>>>> NC_004116 13 73 ATTATCCACAAGACAATAAGGAAGAAGCTATG >>>>> NC_004116 15 85 ACAATAAGGAAGAAGCTATGACGGAAAACGAA >>>>> What I am trying to do is compare PROBE_SEQUENCE to fasta file of >>>>> Streptococcus agalactiae. I am trying to loop through the probes >>>> but not >>>> >>>>> sure how to proceed. My program is working fine for single >>>> sequence. One >>>> >>>>> more thing is I am not interested in matches, I want to display >>> only> >mismatches. I am new to Bioperl, some one please help me with >>> this. >>> >>>>> Thanks >>>>> Usha >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> -- >>>> Barry Moore >>>> Dept. of Human Genetics >>>> University of Utah >>>> Salt Lake City, UT >>>> >>>> >>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > > -- > Barry Moore > Dept. 
of Human Genetics > University of Utah > Salt Lake City, UT > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jason.stajich at duke.edu Wed Aug 24 13:11:29 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Aug 24 13:01:14 2005 Subject: [Bioperl-l] 1.5.1 todo list Message-ID: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu> So after those exchanges on what version of Bioperl needs to be run, and the various floating bugs in the release soup.... If we had to make a list of showstoppers for 1.5.1 release what would they be? To explicitly state, I think the purpose of 1.5.1 release should a) clean up know showstopper bugs in 1.5.0 b) allow new modules/functionality to be introduced since 1.4 and 1.5.0 c) be preparing the way for 1.6 release by putting code out in the wilds. I am fairly adamant API changes that are not backwards compatible need to be CAREFULLY thought out before being allowed in. Since the code base is so big at this point, there need to be good tests in place to confirm this, and a responsibility from the developers to make sure this is the case. My hope is that Gbrowse (live) could be successfully run on a 1.5.1 as I feel that is largest 'external' consumer of Bioperl, with BioSQL and of course everyone's scripts which use a handful of modules. What is the status of bioperl code for: Ontology work BioSQL support (from the Core code at least, how much in sync would 1.5.1 be with biosql-perl release?) Bio::FeatureIO stuff + Bio::SeqFeature changes? Bio::DB::GFF work? the GFF3 schema would be way past 1.5.1, but is that something we'd want to shoot for in 1.6? Other things? Please report in. 
Times like this sort of make me want a Wiki so we can keep track but I'll at least volunteer to collate the results into a summary email. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From arne.nolte at uni-koeln.de Wed Aug 24 09:32:16 2005 From: arne.nolte at uni-koeln.de (Arne Nolte) Date: Wed Aug 24 16:21:13 2005 Subject: [Bioperl-l] maximum likelihood estimation Message-ID: <000501c5a8b0$3eac31f0$8db35f86@tautzarne> Dear all, I would like to calculate maximum likelihood estimators given a likelihood function and some parameters. Are there tools available to do this using perl? thanks, Arne Arne Nolte Institute for Genetics Evolutionary Genetics Zülpicher Str. 47 50674 Cologne Germany Tel.: 0221/470-4034 Fax.: 0221/470-5975 From birney at ebi.ac.uk Thu Aug 25 04:59:23 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Thu Aug 25 04:48:55 2005 Subject: [Bioperl-l] 1.5.1 todo list In-Reply-To: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu> References: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu> Message-ID: <430D886B.2050603@ebi.ac.uk> Jason Stajich wrote: > So after those exchanges on what version of Bioperl needs to be run, > and the various floating bugs in the release soup.... > > If we had to make a list of showstoppers for 1.5.1 release what would > they be? > > To explicitly state, I think the purpose of 1.5.1 release should > a) clean up know showstopper bugs in 1.5.0 > b) allow new modules/functionality to be introduced since 1.4 and 1.5.0 > c) be preparing the way for 1.6 release by putting code out in the wilds. > > I am fairly adamant API changes that are not backwards compatible need > to be CAREFULLY thought out before being allowed in. Since the code > base is so big at this point, there need to be good tests in place to > confirm this, and a responsibility from the developers to make sure > this is the case. 
> > > My hope is that Gbrowse (live) could be successfully run on a 1.5.1 as > I feel that is largest 'external' consumer of Bioperl, with BioSQL and > of course everyone's scripts which use a handful of modules. > > What is the status of bioperl code for: > Ontology work > BioSQL support (from the Core code at least, how much in sync would > 1.5.1 be with biosql-perl release?) > Bio::FeatureIO stuff + Bio::SeqFeature changes? I've written my interface class for TypedSeqFeature but not done an implementation yet. I'll commit the interface and work on an implementation. > Bio::DB::GFF work? the GFF3 schema would be way past 1.5.1, but is > that something we'd want to shoot for in 1.6? > Other things? > > Please report in. Times like this sort of make me want a Wiki so we > can keep track but I'll at least volunteer to collate the results into > a summary email. > > > -jason > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From ed at compbio.berkeley.edu Thu Aug 25 05:41:37 2005 From: ed at compbio.berkeley.edu (Ed Green) Date: Thu Aug 25 05:31:12 2005 Subject: [Bioperl-l] maximum likelihood estimation In-Reply-To: <000501c5a8b0$3eac31f0$8db35f86@tautzarne> References: <000501c5a8b0$3eac31f0$8db35f86@tautzarne> Message-ID: <430D9251.1050303@compbio.berkeley.edu> Arne- I quickly searched CPAN for "maximum likelihood" and "MLE" and found nothing relevant. 
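For a likelihood with a single parameter, a few lines of plain Perl may already be enough. What follows is only a minimal sketch, not an existing CPAN module: it assumes the log-likelihood is unimodal on a known interval and maximizes it by golden-section search. The names `golden_max` and `binom_loglik`, and the example data (30 successes in 100 trials), are illustrative assumptions.

```perl
use strict;
use warnings;

# Golden-section search for the maximum of a unimodal function on [$lo, $hi].
# $f is a code reference taking one numeric argument.
sub golden_max {
    my ($f, $lo, $hi, $tol) = @_;
    $tol = 1e-8 unless defined $tol;
    my $r  = (sqrt(5) - 1) / 2;              # inverse golden ratio, about 0.618
    my $x1 = $hi - $r * ($hi - $lo);
    my $x2 = $lo + $r * ($hi - $lo);
    my ($f1, $f2) = ($f->($x1), $f->($x2));
    while ($hi - $lo > $tol) {
        if ($f1 < $f2) {                     # maximum lies in [$x1, $hi]
            $lo = $x1;
            ($x1, $f1) = ($x2, $f2);
            $x2 = $lo + $r * ($hi - $lo);
            $f2 = $f->($x2);
        }
        else {                               # maximum lies in [$lo, $x2]
            $hi = $x2;
            ($x2, $f2) = ($x1, $f1);
            $x1 = $hi - $r * ($hi - $lo);
            $f1 = $f->($x1);
        }
    }
    return ($lo + $hi) / 2;
}

# Binomial log-likelihood (up to an additive constant) for $k successes
# in $n trials with success probability $p.
sub binom_loglik {
    my ($p, $k, $n) = @_;
    return $k * log($p) + ($n - $k) * log(1 - $p);
}

# Illustrative data only: 30 successes out of 100 trials.
my ($k, $n) = (30, 100);
my $mle = golden_max(sub { binom_loglik($_[0], $k, $n) }, 1e-6, 1 - 1e-6);
printf "MLE of p: %.4f (closed form k/n = %.4f)\n", $mle, $k / $n;
```

The binomial case has the closed-form answer k/n, which makes it a convenient check on the search. Likelihoods with several parameters need a multidimensional method instead.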
If you are also a C programmer, you may be interested in the GNU Scientific Library (GSL) http://www.gnu.org/software/gsl/ GSL has nicely written and documented code for multidimensional minimization that may be useful for you: http://www.gnu.org/software/gsl/manual/gsl-ref_35.html#SEC460 I have used these functions to solve a problem like you've described, so I could provide more information (off this list since it wouldn't really be bioperl-y or even perl-y) if you're interested. Regards, Ed Green Max Planck Institute for Evolutionary Anthropology Deutscher Platz 6 04103 Leipzig Germany Arne Nolte wrote: >Dear all, > >I would like to calculate maximum likelihood estimators given a >likelihood function and some parameters. > >are there tools available to do this using perl? > >thanks, > >Arne > > >Arne Nolte >Institute for Genetics >Evolutionary Genetics >Zülpicher Str. 47 >50674 Cologne >Germany > >Tel.: 0221/470-4034 >Fax.: 0221/470-5975 > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From lstein at cshl.edu Wed Aug 24 17:30:16 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Thu Aug 25 10:40:08 2005 Subject: [Gmod-gbrowse] Re: [Bioperl-l] Windows bug in Bio::DB::Fasta? In-Reply-To: <6.2.1.2.2.20050823150328.04bba998@express.cites.uiuc.edu> References: <1124116511.2891.9.camel@localhost.localdomain> <200508221818.08032.lstein@cshl.edu> <6.2.1.2.2.20050823150328.04bba998@express.cites.uiuc.edu> Message-ID: <200508241730.18892.lstein@cshl.edu> Glad it fixed the problem. Much thanks to Scott who correctly diagnosed the problem. Lincoln On Tuesday 23 August 2005 04:03 pm, Chris Fields wrote: > That did the trick! Everything looks fine now. Thanks Lincoln! > > Chris > > At 05:18 PM 8/22/2005, Lincoln Stein wrote: > >I've just looked into this. 
The bug occurs when Windows opens the FASTA > > file in text mode rather than binary mode; when in text mode the "\r\n" > > sequence is invisibly mapped to "\n" during readline operations, so > > Bio::DB::Fasta thinks that it is dealing with a Unix-format file; then > > when the module tries to seek() to the proper line number, Windows > > doesn't do the line end mapping, so it seeks to the wrong offset. (sound > > of hairs being pulled) > > > >I've fixed the problem by explicitly calling binmode() on all filehandles > >that > >Bio::DB::Fasta calls. The new version of Fasta.pm is in both bioperl CVS > > and the gbrowse 1.63 CVS version. It ought to fix Chris' GC content > > weirdness. > > > >Lincoln > > > >On Monday 15 August 2005 01:22 pm, Scott Cain wrote: > > > Just to follow up on my own email with a little more information: in > > > Fasta.pm, line 697: > > > > > > $termination_length ||= /\r\n$/ ? 2 : 1; # account for > > > crlf-terminated Windows files > > > > > > The pattern match is failing on DOS formatted files; I don't know why. > > > Does anyone else? > > > > > > On Mon, 2005-08-15 at 10:35 -0400, Scott Cain wrote: > > > > Hello all, > > > > > > > > I am investigating a bug in GBrowse that seems to only surface when > > > > people are using the memory (ie, file) adaptor on Windows systems. > > > > Here's the bug report: > > > > > > > > https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169& > > > >grou p_id=27707 > > > > > > > > I've tracked the problem down to Bio::DB::Fasta when the file is dos > > > > formatted (that is, it has both line feeds and carriage returns), BDF > > > > returns the wrong string when a subsequence is requested, but when > > > > the file is unix formatted (ie only CR (or is it only LF?)), it > > > > returns the right string. I wrote the very simple test script below > > > > and stepped it through the perl debugger. 
It looks like the bug is > > > > in the caloffset method, as it returns the same offsets regardless of > > > > the file type, which then makes the subsequent seek into the file go > > > > to the wrong coordinates of dos formatted files. > > > > > > > > Unfortunately, I don't really know what is going on caloffset, so I > > > > don't know how to fix it, but it presumably has to check the format > > > > of the file somewhere and take that into account. > > > > > > > > Thanks, > > > > Scott > > > >-- > >Lincoln D. Stein > >Cold Spring Harbor Laboratory > >1 Bungtown Road > >Cold Spring Harbor, NY 11724 > >FOR URGENT MESSAGES & SCHEDULING, > >PLEASE CONTACT MY ASSISTANT, > >SANDRA MICHELSEN, AT michelse@cshl.edu > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > > ------------------------------------------------------- > SF.Net email is Sponsored by the Better Software Conference & EXPO > September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices > Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA > Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse@cshl.edu From lstein at cshl.edu Thu Aug 25 11:24:12 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Thu Aug 25 11:13:51 2005 Subject: [Bioperl-l] 1.5.1 todo list In-Reply-To: <430D886B.2050603@ebi.ac.uk> References: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu> <430D886B.2050603@ebi.ac.uk> Message-ID: <200508251124.13683.lstein@cshl.edu> > > Bio::DB::GFF work? 
the GFF3 schema would be way past 1.5.1, but is > > that something we'd want to shoot for in 1.6? > > Other things? I think Bio::DB::GFF3 will going in in the November/December time frame - probably December. Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse@cshl.edu From birney at ebi.ac.uk Thu Aug 25 11:56:53 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Thu Aug 25 11:46:15 2005 Subject: [Bioperl-l] TypedSeqFeatureI is in; tests which fail Message-ID: <430DEA45.1060308@ebi.ac.uk> TypedSeqFeatureI is in. Implementation coming. Some tests are failing for me (not related to sequence features) Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------------- t/Index.t 79 20224 47 0 0.00% ?? t/PAML.t 166 14 8.43% 153-166 t/RestrictionIO.t 14 1 7.14% 10 t/SearchIO.t 1227 4 0.33% 1224-1227 145 subtests skipped. Failed 4/201 test scripts, 98.01% okay. 19/9625 subtests failed, 99.80% okay. I'll dig around on these, but can't promise to sort them out. From jason.stajich at duke.edu Thu Aug 25 12:16:34 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Aug 25 12:06:10 2005 Subject: [Bioperl-l] TypedSeqFeatureI is in; tests which fail In-Reply-To: <430DEA45.1060308@ebi.ac.uk> References: <430DEA45.1060308@ebi.ac.uk> Message-ID: <372DA67D-B453-429E-B200-16E33805A141@duke.edu> They all pass for me on OSX, and linux. What version of perl? Do you have IO::String installed? I believe the last tests in PAML are parsing trees and assigning rates and parameters to branches. more details there if you can't track it down. I found that SearchIO was just a count getting set wrong for test count in when necessary XML modules are not installed. fixed that. -jason On Aug 25, 2005, at 11:56 AM, Ewan Birney wrote: > > TypedSeqFeatureI is in. 
Implementation coming. > > > Some tests are failing for me (not related to sequence > features) > > > Failed Test Stat Wstat Total Fail Failed List of Failed > ---------------------------------------------------------------------- > --------- > t/Index.t 79 20224 47 0 0.00% ?? > t/PAML.t 166 14 8.43% 153-166 > t/RestrictionIO.t 14 1 7.14% 10 > t/SearchIO.t 1227 4 0.33% 1224-1227 > 145 subtests skipped. > Failed 4/201 test scripts, 98.01% okay. 19/9625 subtests failed, > 99.80% okay. > > > I'll dig around on these, but can't promise to sort them out. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From kynn at panix.com Thu Aug 25 12:18:59 2005 From: kynn at panix.com (kynn@panix.com) Date: Thu Aug 25 12:08:23 2005 Subject: [Bioperl-l] [OT] General bioinformatics forums/lists? Message-ID: <200508251618.j7PGIxZ09385@panix3.panix.com> I have many questions that are about bioinformatics in general, not BioPerl. Is there a good bioinformatics list where I could post them? I've Googled for one, but the lists I've found are specialized (e.g. focusing on specific software), or don't seem to get much traffic at all (or both). Thanks! kj From bmoore at genetics.utah.edu Thu Aug 25 12:35:14 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Thu Aug 25 12:22:49 2005 Subject: [Bioperl-l] DocBook Question Message-ID: Brian- I've been working on setting up a DocBook working environment so I can take a stab at writing a bioperl HOWTO. I'm using the e-novative environment (http://www.e-novative.info/index.php) for transforming the xml to pdf|html that the bioperl docs link to as the stylesheet source for those documents. My pdf output is all aligned against the left edge of the document with no left margin. Did you modify the e-novative stylesheet to correct this. 
I can't seem to fix that. Also, what do you use to create the plain old text files. I see that you are using RenderX for transformation. Is that better than the e-novative tools? I'm new to all this xml/xsl/css etc. etc. any suggestions for how to work best with the processing chain would be greatly appreciated. Barry Barry Moore Department of Human Genetics University of Utah Salt Lake City, UT 84112 From birney at ebi.ac.uk Thu Aug 25 12:34:45 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Thu Aug 25 12:24:27 2005 Subject: [Bioperl-l] TypedSeqFeatureI is in; tests which fail In-Reply-To: <372DA67D-B453-429E-B200-16E33805A141@duke.edu> References: <430DEA45.1060308@ebi.ac.uk> <372DA67D-B453-429E-B200-16E33805A141@duke.edu> Message-ID: <430DF325.3020306@ebi.ac.uk> Jason Stajich wrote: > They all pass for me on OSX, and linux. What version of perl? > [Ewan-Birneys-Computer:wise2/src/network] birney% perl -v This is perl, v5.6.0 built for darwin Copyright 1987-2000, Larry Wall > Do you have IO::String installed? I believe the last tests in PAML are > parsing trees and assigning rates and parameters to branches. > more details there if you can't track it down. > I have got IO::String installed. I'll dig. > I found that SearchIO was just a count getting set wrong for test count > in when necessary XML modules are not installed. fixed that. > Great. > -jason > On Aug 25, 2005, at 11:56 AM, Ewan Birney wrote: > >> >> TypedSeqFeatureI is in. Implementation coming. >> >> >> Some tests are failing for me (not related to sequence >> features) >> >> >> Failed Test Stat Wstat Total Fail Failed List of Failed >> ------------------------------------------------------------------------------- >> t/Index.t 79 20224 47 0 0.00% ?? >> t/PAML.t 166 14 8.43% 153-166 >> t/RestrictionIO.t 14 1 7.14% 10 >> t/SearchIO.t 1227 4 0.33% 1224-1227 >> 145 subtests skipped. >> Failed 4/201 test scripts, 98.01% okay. 19/9625 subtests failed, >> 99.80% okay. 
>> >> >> I'll dig around on these, but can't promise to sort them out. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > > Jason Stajich > > jason.stajich at duke.edu > > http://www.duke.edu/~jes12/ > > > From hlapp at gmx.net Thu Aug 25 12:49:37 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Aug 25 12:39:19 2005 Subject: [Bioperl-l] TypedSeqFeatureI is in; tests which fail In-Reply-To: <430DF325.3020306@ebi.ac.uk> References: <430DEA45.1060308@ebi.ac.uk> <372DA67D-B453-429E-B200-16E33805A141@duke.edu> <430DF325.3020306@ebi.ac.uk> Message-ID: <590c379d723a9438a5c760e0048e85a1@gmx.net> On Aug 25, 2005, at 9:34 AM, Ewan Birney wrote: > Jason Stajich wrote: >> They all pass for me on OSX, and linux. What version of perl? > > [Ewan-Birneys-Computer:wise2/src/network] birney% perl -v > > This is perl, v5.6.0 built for darwin > > Copyright 1987-2000, Larry Wall > I do suggest you upgrade perl. I know 5.6.0 is the one that comes with Jaguar, but it has bugs in some features bioperl is taking advantage of (nested regex in FTLocationFactory being just one example). I've had so much trouble to get Bioperl pass all tests with errors nobody else was getting that I finally gave up and upgraded (to Panther actually, but upgrading perl supposedly suffices). Once I did that most failures went away (and some new ones came up but that's another story and they are fixed meanwhile). I brought this up a while ago in spring and I can dig up the list thread if you're interested. The conclusion was that essentially we'll have to require perl 5.6.1 with the next release. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jason.stajich at duke.edu Thu Aug 25 13:00:34 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Aug 25 12:50:04 2005 Subject: [Bioperl-l] Bio::DB::GFF start/end coordinates Message-ID: Lincoln - One bug I'm still seeing in Bio::DB::GFF::Feature objects is start/ end are still returning start > end when strand < 0. I know this is different expectation for Bioperl / Gbrowse but this causes a little problems, especially when you get an aggregated feature out from Bio::DB:GFF and then write it to a genbank file. The locations looks like this: complement(join(1031..975,676..501)) My workaround is just to create new Location objects and features from the Bio::DB::GFF obtained objects (some of these aren't allowing write-back to overwrite the values). Note on a slightly separate topic: I have patched my Bio::Location::Split to_FTstring to simplify the string, current behavior would be to output the location like this: join(complement(1031..975),complement(676..501),)) I'm seeing about how applying the patch, I'm not sure whether or not it perfectly works. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From sm_middha at yahoo.com Thu Aug 25 13:27:05 2005 From: sm_middha at yahoo.com (sumit middha) Date: Thu Aug 25 13:17:45 2005 Subject: [Bioperl-l] [OT] General bioinformatics forums/lists? In-Reply-To: <200508251618.j7PGIxZ09385@panix3.panix.com> Message-ID: <20050825172706.46755.qmail@web30709.mail.mud.yahoo.com> Even I want info. on a good bioinfo discussion forum, where people discuss their doubts about some software, tool .. or their research question and figuring out best way to plan things, etc ... Thanks. --- kynn@panix.com wrote: > > > > I have many questions that are about bioinformatics > in general, not > BioPerl. Is there a good bioinformatics list where > I could post them? 
> I've Googled for one, but the lists I've found are > specialized > (e.g. focusing on specific software), or don't seem > to get much > traffic at all (or both). > > Thanks! > > kj > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > ____________________________________________________ Start your day with Yahoo! - make it your home page http://www.yahoo.com/r/hs From MAG at Stowers-Institute.org Thu Aug 25 13:45:34 2005 From: MAG at Stowers-Institute.org (Goel, Manisha) Date: Thu Aug 25 13:35:06 2005 Subject: [Bioperl-l] [OT] General bioinformatics forums/lists? Message-ID: <200508251734.j7PHYvTv002473@portal.open-bio.org> How about https://bioinformatics.org/mailman/listinfo/bio_bulletin_board ? Or https://bioinformatics.org/mailman/listinfo/ssml-general .. Maybe you could look at their archives to see if the topics discussed there suit your purpose. -Manisha Post-doc Associate, Stowers Institute for Medical Research Kansas City, MO -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of sumit middha Sent: Thursday, August 25, 2005 12:27 PM To: kynn@panix.com; bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] [OT] General bioinformatics forums/lists? Even I want info. on a good bioinfo discussion forum, where people discuss their doubts about some software, tool .. or their research question and figuring out best way to plan things, etc ... Thanks. --- kynn@panix.com wrote: > > > > I have many questions that are about bioinformatics > in general, not > BioPerl. Is there a good bioinformatics list where > I could post them? > I've Googled for one, but the lists I've found are specialized > (e.g. focusing on specific software), or don't seem > to get much > traffic at all (or both). > > Thanks! 
> > kj > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > ____________________________________________________ Start your day with Yahoo! - make it your home page http://www.yahoo.com/r/hs _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From astew at wam.umd.edu Thu Aug 25 17:08:32 2005 From: astew at wam.umd.edu (Andrew Stewart) Date: Thu Aug 25 16:58:00 2005 Subject: [Bioperl-l] ->add_tag_value() Message-ID: <430E3350.50604@wam.umd.edu> I'm trying to create genbank files from a sequence file and features retreived from glimmer output. When creating the new features and writing them to the genbank (richseq) object, though, I get the following output (for example)... CDS 21..596 /translation="Bio::Annotation::SimpleValue=HASH(0x987b78)" CDS complement(1713..2903) /translation="Bio::Annotation::SimpleValue=HASH(0x987944)" CDS complement(3236..4258) /translation="Bio::Annotation::SimpleValue=HASH(0x9be8e4)" CDS 4350..5936 /translation="Bio::Annotation::SimpleValue=HASH(0x9bead0)" CDS 6181..6819 /translation="Bio::Annotation::SimpleValue=HASH(0x9bebd8)" The translation tag I added is for some reason being shown as a hash. The code in question is here... my $translation = $seqo->subseq($start, $stop); $feat->add_tag_value("translation",$translation); Everything there is ok, as far as I can tell. I was able to spit the 'translation' tag back out with some test code just fine. Near as I can tell, either $feat->add_tag_value() is setting the tag value as a reference, or the tag value is being retreived as such when the feature is written to the seq object (or somewhere else in the process). Anyone have any idea what might be going on here? 
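The output above is what Perl prints whenever an object, rather than the string it wraps, is interpolated into text. A minimal sketch of that mechanism follows. It assumes nothing about BioPerl internals: `My::SimpleValue` is a hypothetical stand-in class, not Bio::Annotation::SimpleValue itself, the sequence string is a placeholder, and whether the 1.5.0 writer behaves exactly this way is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object that wraps a plain string.
package My::SimpleValue;
sub new   { my ($class, %args) = @_; return bless { value => $args{-value} }, $class; }
sub value { return $_[0]->{value}; }

package main;

# Placeholder sequence, for illustration only.
my $tag = My::SimpleValue->new(-value => 'ATGAAAACC');

# Interpolating the object itself yields "Class=HASH(0x...)", not the sequence.
my $wrong = qq{/translation="$tag"};

# Asking the object for its value yields the intended qualifier text.
my $right = sprintf '/translation="%s"', $tag->value;

print "$wrong\n$right\n";
```

If the tag values are stored as wrapper objects, the writer has to call the accessor (or the class has to overload stringification) before the qualifier is printed.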
-Andrew Stewart BDRD From jason.stajich at duke.edu Thu Aug 25 17:32:17 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Aug 25 17:23:33 2005 Subject: [Bioperl-l] ->add_tag_value() In-Reply-To: <430E3350.50604@wam.umd.edu> References: <430E3350.50604@wam.umd.edu> Message-ID: It has been discussed several times on the mailing list. A deficiency in the code released as 1.5.0 but should be fixed in CVS now. If it isn't can you please yell loudly so it gets fixed by the people who broke it... =) http://portal.open-bio.org/pipermail/bioperl-l/2005-April/018749.html -jason On Aug 25, 2005, at 5:08 PM, Andrew Stewart wrote: > I'm trying to create genbank files from a sequence file and > features retreived from glimmer output. When creating the new > features and writing them to the genbank (richseq) object, though, > I get the following output (for example)... > > CDS 21..596 > /translation="Bio::Annotation::SimpleValue=HASH > (0x987b78)" > CDS complement(1713..2903) > /translation="Bio::Annotation::SimpleValue=HASH > (0x987944)" > CDS complement(3236..4258) > /translation="Bio::Annotation::SimpleValue=HASH > (0x9be8e4)" > CDS 4350..5936 > /translation="Bio::Annotation::SimpleValue=HASH > (0x9bead0)" > CDS 6181..6819 > /translation="Bio::Annotation::SimpleValue=HASH > (0x9bebd8)" > > The translation tag I added is for some reason being shown as a > hash. The code in question is here... > > my $translation = $seqo->subseq($start, $stop); > $feat->add_tag_value("translation",$translation); > > Everything there is ok, as far as I can tell. I was able to spit > the 'translation' tag back out with some test code just fine. Near > as I can tell, either $feat->add_tag_value() is setting the tag > value as a reference, or the tag value is being retreived as such > when the feature is written to the seq object (or somewhere else in > the process). > > Anyone have any idea what might be going on here? 
> > > -Andrew Stewart > BDRD > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From indapa at gmail.com Thu Aug 25 17:36:10 2005 From: indapa at gmail.com (Amit Indap) Date: Thu Aug 25 17:26:02 2005 Subject: [Bioperl-l] DBI connection parameters and BasePersistenceAdaptor.pm Message-ID: <3cfaa40405082514364c4b5835@mail.gmail.com> Hi, Thanks for the response on the bioperl-db API, Hilmar. I have a much better understanding now. I have a script that adds features to bioentries in a biosql database (namely where bioentries align to the human genome via parsing a blat file). But it's having trouble connecting to my mysql db when I call my $dbseq= $adp->find_by_unique_key($seq); (where $seq holds my Bio::Seq object to which I want to add features) The stack is listed at the end of the msg. I would like to add features to this sequence and then store them in my biosql database while encapsulating this process using bioperl-db API. Clearly, it can't connect to my mysql server. The particular line in BasePersistenceAdaptor.pm it flames out on is: $dbh=$dbc->dbi()->get_connection($dbc,$dbc->dbi()->conn_params($self)) Elsewhere in my code I have a low-level query for my biosql db using DBI in which I connect to mysql reading a .my.cnf file: my $conn = DBI->connect("DBI:mysql:amit" . ";mysql_read_default_file=/home/amit/.my.cnf", $user, $passwd); Is there a way to tell bioperl to read this .my.cnf file when it makes its database connection? For some reason to open a mysql connection on my machine I need to open up a ssh -L connection to the machine where the mysql server lives with some funky parameters. 
(If this is more appropriate for biosql mailiing list, apologies but I didn't want to cross post :) Amit Indap Cornell University ------------- EXCEPTION ------------- MSG: failed to open connection: Access denied for user 'amit'@'132.236.170.104' (using password: NO) STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/5.8.5/Bio/DB/DBI/base.pm:253 STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/5.8.5/Bio/DB/DBI/base.pm:213 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1477 STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BaseDriver.pm:515 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 STACK Bio::DB::BioSQL::PrimarySeqAdaptor::get_unique_key_query /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:395 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:845 STACK toplevel /home/amit/bin/HCG-blatparser.pl:181 From allenday at ucla.edu Thu Aug 25 17:43:46 2005 From: allenday at ucla.edu (Allen Day) Date: Thu Aug 25 17:33:08 2005 Subject: [Bioperl-l] ->add_tag_value() In-Reply-To: <430E3350.50604@wam.umd.edu> References: <430E3350.50604@wam.umd.edu> Message-ID: This is fixed in CVS. -Allen On Thu, 25 Aug 2005, Andrew Stewart wrote: > I'm trying to create genbank files from a sequence file and features > retreived from glimmer output. When creating the new features and > writing them to the genbank (richseq) object, though, I get the > following output (for example)... 
> > CDS 21..596 > > /translation="Bio::Annotation::SimpleValue=HASH(0x987b78)" > CDS complement(1713..2903) > > /translation="Bio::Annotation::SimpleValue=HASH(0x987944)" > CDS complement(3236..4258) > > /translation="Bio::Annotation::SimpleValue=HASH(0x9be8e4)" > CDS 4350..5936 > > /translation="Bio::Annotation::SimpleValue=HASH(0x9bead0)" > CDS 6181..6819 > > /translation="Bio::Annotation::SimpleValue=HASH(0x9bebd8)" > > The translation tag I added is for some reason being shown as a hash. > The code in question is here... > > my $translation = $seqo->subseq($start, $stop); > $feat->add_tag_value("translation",$translation); > > Everything there is ok, as far as I can tell. I was able to spit the > 'translation' tag back out with some test code just fine. Near as I can > tell, either $feat->add_tag_value() is setting the tag value as a > reference, or the tag value is being retreived as such when the feature > is written to the seq object (or somewhere else in the process). > > Anyone have any idea what might be going on here? > > > -Andrew Stewart > BDRD > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From mayagao1999 at yahoo.com Thu Aug 25 22:53:05 2005 From: mayagao1999 at yahoo.com (Alex Zhang) Date: Thu Aug 25 22:42:28 2005 Subject: [Bioperl-l] How to generate negative pair for motif binding sites in Perl? Message-ID: <20050826025305.44078.qmail@web53504.mail.yahoo.com> Hi, all! I have a problem in using Perl to make 100 negative pair for motif binding sites. Would anybody give me some suggestions? Thank you very much ahead of time. Alex The description of the problem: To generate negative dependent pair binding sites for motif 1. We can get 16 combinations of any 2 nucleotides. They are: AA, AT, AC, AG, TT, TC, TG, TA, CC, CT, CG, CA, GG, GC, GT and GA For example, if we say pair ??AA?? is a positive dependent pair, which means that ??A?? 
always comes with another "A" across many sequences with probability x%. In other words, it looks like:

..............
....AA.......
....AA.......
....AA.......
....AA.......
....AA.......
....AA.......
..............

In contrast to the positive pair, the negative pair "AG" looks like this in some sequences:

...............
....A.........
....A........
....A.........
......G.......
......G.......
......G.......
...............

Which means that "A" is less likely to be with "G" across these sequences than other nucleotides G, T, C. But if we count the frequency of each nucleotide along the column, we can find that "A" and "G" have the highest frequencies in their columns. By generating 4 negative pairs, we can end up with motif binding sites of length 8. Finally we are going to make 100 binding sites.

2. (1) Randomly pick 4 pairs from the 16 combinations which will be used as "negative pairs" in the sequences. For example, we get pairs AG, CT, CT, GG.
   (2) Suppose the probability for each negative pair is 70%. In the 100 binding sites, we let all the 1st nucleotides be A with probability 70%. In other words, there are 70 As in the 100 binding sites at the 1st position. If the 1st position is A, then the 2nd position will be G with probability 57% and A or C or T with probability (1-0.57)/3; if the 1st position is not A, then let the 2nd position be G automatically;
   (3) Repeat this for the other three negative pairs.

3. Generally speaking, we have negative pair XY.
   (a) let the 1st nucleotide in 100 sites be X with probability 70% and each other nucleotide with probability 10%
   (b) if 1st nucleotide = X, then let the 2nd nucleotide in 100 sites be Y with probability 57% and each other nucleotide with probability (1-57%)/3;
   (c) Else, let the 2nd nucleotide in 100 sites be Y automatically;
   (d) Repeat (a) (b) (c) for the other three pairs.

__________________________________________________
Do You Yahoo!? Tired of spam? Yahoo!
Mail has the best spam protection around
http://mail.yahoo.com

From bmoore at genetics.utah.edu Fri Aug 26 01:15:04 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Fri Aug 26 01:03:09 2005
Subject: [Bioperl-l] RE: Bioperl-l Digest, Vol 28, Issue 10
Message-ID:

Ping,

I am sorry, I don't understand your question. It looks like your e-mail might have contained a graphic or an attachment that I couldn't view. Do you want to try your question again, and send it to the bioperl list in case someone else understands it better than I do?

Barry

-----Original Message-----
From: Ping Yao [mailto:sdshlxh@gmail.com]
Sent: Wednesday, August 24, 2005 4:59 PM
To: Barry Moore
Subject: Re: Bioperl-l Digest, Vol 28, Issue 10

Hi, Barry:

You gave a very good suggestion. I am also a new user of Perl. I tried your code myself and met the following problem: Stack toplevel at the following line

  my $blast_report = $factory->bl2seq($seq_obj, $input2);

In fact I met the same Stack toplevel in other bioperl programs. Could you explain it to me? How do I fix Stack toplevel?

Ping YAO
Univ. of Missouri-Columbia

From hlapp at gmx.net Fri Aug 26 03:38:22 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Aug 26 03:35:40 2005
Subject: [Bioperl-l] 1.5.1 todo list
In-Reply-To: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu>
References: <4878D01B-50AC-4D20-B012-226C738D464C@duke.edu>
Message-ID:

On Aug 24, 2005, at 10:11 AM, Jason Stajich wrote:

> [...]
> What is the status of bioperl code for:
> Ontology work

The issues with the goflat parser and Ontology.pm loading Graph.pm should be fixed. Due to lack of time I haven't yet been able to iron out the last wrinkles with the goperl-bridge I wrote, so .obo format is not yet supported. I'd really like this to be in 1.6.x though.

> BioSQL support (from the Core code at least, how much in sync would
> 1.5.1 be with biosql-perl release?)

I guess you mean bioperl-db?
Bioperl-db works with the CVS main trunk of bioperl, all tests pass when run against bioperl-live. > Bio::FeatureIO stuff + Bio::SeqFeature changes? The overloads seem to work currently but generally make me feel uneasy because they can lead to very subtle and hard to track down bugs and should be earmarked for roll back. One could choose to ignore this though for 1.5.1 (as opposed to 1.6.x). > Bio::DB::GFF work? the GFF3 schema would be way past 1.5.1, but is > that something we'd want to shoot for in 1.6? > Other things? A comprehensive and authoritative set of tests for the SeqFeatureI API still needs to be written so that any future f*ups in this area are readily and immediately detected. This would then also be the set of tests that blesses (or holds up) the 1.6.0 release code. Again, although I gave it priority one could choose to ignore it for 1.5.1. -hilmar > > Please report in. Times like this sort of make me want a Wiki so we > can keep track but I'll at least volunteer to collate the results into > a summary email. > > > -jason > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From birney at ebi.ac.uk Fri Aug 26 04:48:58 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Fri Aug 26 05:58:19 2005 Subject: [Bioperl-l] TypedSeqFeatureI is in; tests which fail In-Reply-To: <590c379d723a9438a5c760e0048e85a1@gmx.net> References: <430DEA45.1060308@ebi.ac.uk> <372DA67D-B453-429E-B200-16E33805A141@duke.edu> <430DF325.3020306@ebi.ac.uk> <590c379d723a9438a5c760e0048e85a1@gmx.net> Message-ID: <430ED77A.4090403@ebi.ac.uk> Hilmar Lapp wrote: > > On Aug 25, 2005, at 9:34 AM, Ewan Birney wrote: > >> Jason Stajich wrote: >> >>> They all pass for me on OSX, and linux. What version of perl? >> >> >> [Ewan-Birneys-Computer:wise2/src/network] birney% perl -v >> >> This is perl, v5.6.0 built for darwin >> >> Copyright 1987-2000, Larry Wall >> > > I do suggest you upgrade perl. I know 5.6.0 is the one that comes with > Jaguar, but it has bugs in some features bioperl is taking advantage of > (nested regex in FTLocationFactory being just one example). I've had so > much trouble to get Bioperl pass all tests with errors nobody else was > getting that I finally gave up and upgraded (to Panther actually, but > upgrading perl supposedly suffices). Once I did that most failures went > away (and some new ones came up but that's another story and they are > fixed meanwhile). > > I brought this up a while ago in spring and I can dig up the list thread > if you're interested. The conclusion was that essentially we'll have to > require perl 5.6.1 with the next release. > Ok. I will still dig to find out how many we can fix in 5.6.0 --- I am sure I wont be the last person to use it with Bioperl. 
> -hilmar From jason.stajich at duke.edu Fri Aug 26 11:52:30 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Aug 26 11:42:12 2005 Subject: [Bioperl-l] ->add_tag_value() In-Reply-To: <430F374B.7060408@wam.umd.edu> References: <430E3350.50604@wam.umd.edu> <430F1E81.6080400@wam.umd.edu> <44E35FE4-D25D-4D96-A195-12D1DCF85638@duke.edu> <430F374B.7060408@wam.umd.edu> Message-ID: <6F0A7CDE-DFFA-499C-A3EE-CECE62FE9A2A@duke.edu> It is several files if I remember correctly not just one. I don't know exactly which ones. Better ask on the list as the folks who made the changes can say better. You CAN empirically figure out what has changed via some CVS trickery. CHECK OUT BIOPERL-LIVE FROM CVS $ cvs -d:pserver:cvs@cvs.open-bio.org:/home/repository/bioperl co bioperl-live $ cd bioperl-live SEE THE CHANGES THAT WERE MADE IN THE DIRECTORIES I THINK HAVE CHANGED $ cvs diff -r bioperl-release-1-5-0 Bio/SeqFeature $ cvs diff -r bioperl-release-1-5-0 Bio/Annotation $ cvs diff -r bioperl-release-1-5-0 Bio/SeqFeatureI.pm $ cvs diff -r bioperl-release-1-5-0 Bio/SeqIO You can use these commands to generate a patch file you can then apply to bioperl-1.5.0 -jason On Aug 26, 2005, at 11:37 AM, Andrew Stewart wrote: > Would it be possible to simply update the module which contains the > error (or are there multiple files?) rather than downgrade to 1.4 > or upgrade to the HEAD branch? 
> -Andrew
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From hlapp at gmx.net Fri Aug 26 12:24:37 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Aug 26 12:16:10 2005
Subject: [Bioperl-l] ->add_tag_value()
In-Reply-To: <6F0A7CDE-DFFA-499C-A3EE-CECE62FE9A2A@duke.edu>
References: <430E3350.50604@wam.umd.edu> <430F1E81.6080400@wam.umd.edu> <44E35FE4-D25D-4D96-A195-12D1DCF85638@duke.edu> <430F374B.7060408@wam.umd.edu> <6F0A7CDE-DFFA-499C-A3EE-CECE62FE9A2A@duke.edu>
Message-ID: <8c31cb4c665092173967567f87b35a34@gmx.net>

> On Aug 26, 2005, at 11:37 AM, Andrew Stewart wrote:
>
>> Would it be possible to simply update the module which contains the
>> error (or are there multiple files?) rather than downgrade to 1.4 or
>> upgrade to the HEAD branch?
>> -Andrew

You could, e.g. using Jason's suggestion, but I don't know why you wouldn't just want to upgrade to the main trunk. Currently, this is as close as you can get to upgrading to 1.5.1, which is what you will want to do anyway immediately once it's out.

-hilmar

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From qfdong at iastate.edu Fri Aug 26 15:26:30 2005
From: qfdong at iastate.edu (Qunfeng)
Date: Fri Aug 26 15:18:47 2005
Subject: [Bioperl-l] bug report - SeqIO::genbank.pm
In-Reply-To: <4171441D.6030502@utk.edu>
References: <4171441D.6030502@utk.edu>
Message-ID: <6.1.2.0.2.20050826140610.03f6f138@qfdong.mail.iastate.edu>

Hi there,

Sorry, I am not sure where to report a bioperl bug or whether this bug has been reported before, so I am just going to send it to the bioperl list.

The "_read_GenBank_Species" function in SeqIO::genbank.pm generates an exception when parsing GenBank record GI#66271013, which has an unusual ORGANISM name "(Populus tomentosa x P. bolleana) x P. tomentosa var. truncata".
Notice there is a "(" in the beginning. That "(" will be treated as an opening "(" for regular expression (see line 8 below) and can be fixed by a simple escaping (see line 7 below). ================================================= sub _read_GenBank_Species{ ... elsif (/^\s{2}ORGANISM/o) { my @spflds = split(' ', $_); ($ns_name) = $_ =~ /\w+\s+(.*)/o; shift(@spflds); # ORGANISM $spflds[0] =~ s/\(/\\\(/; #(7) escape the ( by \( if(grep { $_ =~ /^$spflds[0]/i; } @organell_names) { #(8) it causes exception with "(Populus" $organelle = shift(@spflds); } ... } ================================================= Qunfeng From brian_osborne at cognia.com Fri Aug 26 15:55:34 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Aug 26 15:45:02 2005 Subject: [Bioperl-l] bug report - SeqIO::genbank.pm In-Reply-To: <6.1.2.0.2.20050826140610.03f6f138@qfdong.mail.iastate.edu> Message-ID: Qunfeng, In the future please submit bugs at http://bugzilla.bioperl.org/. Right now I'll just take a look at this without a formal bug report, thanks for the submission. Brian O. On 8/26/05 3:26 PM, "Qunfeng" wrote: > Hi there, > > Sorry I am not sure where to report bioperl bug and whether this bug has > been reported before. So I am just going to send it to the bioperl list. > > The "_read_GenBank_Species" function in SeqID::genbank.pm generates an > exception when parsing GenBank record GI#66271013, which has an unusual > ORGANISM name "(Populus tomentosa x P. bolleana) x P. tomentosa var. > truncata". Notice there is a "(" in the beginning. That "(" will be > treated as an opening "(" for regular expression (see line 8 below) and can > be fixed by a simple escaping (see line 7 below). > > > ================================================= > sub _read_GenBank_Species{ ... 
> elsif (/^\s{2}ORGANISM/o) { > my @spflds = split(' ', $_); > ($ns_name) = $_ =~ /\w+\s+(.*)/o; > shift(@spflds); # ORGANISM > $spflds[0] =~ > s/\(/\\\(/; #(7) escape the ( by \( > if(grep { $_ =~ /^$spflds[0]/i; } @organell_names) { #(8) it > causes exception with "(Populus" > $organelle = shift(@spflds); > } > ... > } > ================================================= > > Qunfeng > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From astew at wam.umd.edu Fri Aug 26 16:08:23 2005 From: astew at wam.umd.edu (Andrew Stewart) Date: Fri Aug 26 15:58:15 2005 Subject: [Bioperl-l] ->add_tag_value() In-Reply-To: <8c31cb4c665092173967567f87b35a34@gmx.net> References: <430E3350.50604@wam.umd.edu> <430F1E81.6080400@wam.umd.edu> <44E35FE4-D25D-4D96-A195-12D1DCF85638@duke.edu> <430F374B.7060408@wam.umd.edu> <6F0A7CDE-DFFA-499C-A3EE-CECE62FE9A2A@duke.edu> <8c31cb4c665092173967567f87b35a34@gmx.net> Message-ID: <430F76B7.7010803@wam.umd.edu> Do many of you use bioperl-live as your primary (or exclusive) BioPerl distribution, or do you keep a stable version as well? I'd like to use bioperl-live for instances such as this (see message history), but not necessarily for when I'm developing scripts that are going to be used by others in my lab who do not necessarily have bioperl-live installed. What I'm thinking is that I should maybe install a copy of bioperl-live somewhere in my personal space, and then 'use' it in certain scripts when needed. I just have a few questions (these are probably more 'perl' questions than 'bio-perl' questions)... 1. Once I obtain bioperl-live via 'cvs -d :pserver etc...', do I need to actually go through the install routine or can I just access the modules from where they are downloaded? and 2. Would I then place code at the header of my script such as... 
use lib "/path/to/bioperl-live"; use MODULE; and the updated module will (temporarily) override the other bioperl modules in my @INC? I tried this, actually, without any noticable change in my previous problem (_print_GenBank... still prints the feature tabs as /tab="Bio::Annotation::SimpleValue=HASH(0x87f0914)" instead of /tab="value"), but I don't know for certain if perl was using the modules from my bioperl-live installation or the older ones. -Andrew Stewart Hilmar Lapp wrote: > >> On Aug 26, 2005, at 11:37 AM, Andrew Stewart wrote: >> >>> Would it be possible to simply update the module which contains the >>> error (or are there multiple files?) rather than downgrade to 1.4 or >>> upgrade to the HEAD branch? >>> -Andrew >> > > You could, e.g. using Jason's suggestion, but I don't know why you > wouldn't just want to upgrade to the main trunk. Currently, this is as > close as you can get to upgrading to 1.5.1., which is what you will > want to do anyway immediately once it's out. > > -hilmar > From fiedler at cshl.edu Thu Aug 25 18:18:27 2005 From: fiedler at cshl.edu (Tristan Fiedler) Date: Fri Aug 26 16:26:16 2005 Subject: [Bioperl-l] DocBook Question In-Reply-To: <200508252133.j7PLXlTu005861@portal.open-bio.org> References: <200508252133.j7PLXlTu005861@portal.open-bio.org> Message-ID: <4176cd6f03d312ccfa4ba37508c71ee2@cshl.edu> Hi Barry, I am using DocBook for a project (http://www.WormBook.org ) based on the following software pipeline : DocBook to HTML : Saxon 6.5.3 FO to PDF : FOP 0.20.5 XML : DocBook XML 4.4CR2 XSL : DocBook XSL 1.67.2 These all work quite well together, although FOP is getting outdated. I would recommend trying the new RenderX instead of FOP. For page margins, check out : http://www.sagehill.net/docbookxsl/PrintOutput.html#LeftRightMargins http://www.sagehill.net/docbookxsl/PageDesign.html Cheers, Tristan --- Tristan J. 
Fiedler Postdoctoral Fellow - Stein Lab Cold Spring Harbor Laboratory From hlapp at gmx.net Fri Aug 26 16:37:26 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Aug 26 16:27:32 2005 Subject: [Bioperl-l] DBI connection parameters In-Reply-To: <3cfaa40405082514364c4b5835@mail.gmail.com> References: <3cfaa40405082514364c4b5835@mail.gmail.com> Message-ID: On Aug 25, 2005, at 2:36 PM, Amit Indap wrote: > [...] But its having trouble connecting to my mysql db when I call > my $dbseq= $adp->find_by_unique_key($seq); > (where $seq holds my Bio::Seq object to which I want to add features > to) The stack is listed at the end of the msg. > The stack says: > MSG: failed to open connection: Access denied for user > 'amit'@'132.236.170.104' (using password: NO) Can you connect using the mysql shell as the above user from machine 132.236.170.104 without using a password? You supply the password to BioDB->new() using the -pass option. > [...] > Elswhere in my code I have a low-level query for my biosql db using > DBI in which I connect to mysql reading a .my.cnf file: > > my $conn = DBI->connect("DBI:mysql:amit" . > ";mysql_read_default_file=/home/amit/.my.cnf", $user, $passwd); > > Is there a way for to tell bioperl to read this .my.cnf file when it > makes its database connection? No, not until now. I added an option (-dsn) that lets you specify the dsn to be used verbatim for connecting. It should propagate to the anonymous cvs server over the next 1-2 hours. You can now also specify this option (--dsn) to load_{seqdatabase,ontology}.pl. There is also an option -initrc that lets you specify a file that evaluates to a hash ref with all the parameters as keys. Check out the POD for Bio::DB::BioDB->new(). I also exposed this option (--initrc) now in load_{seqdatabase,ontology}.pl, apparently I had forgotten to do this before. 
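As a rough illustration of what "a file that evaluates to a hash ref" can look like in practice, here is a minimal sketch in core Perl. It is not the code behind Bio::DB::BioDB->new() or the load scripts: the load_initrc helper, the temporary file, and the way explicit options are merged over the file values are all assumptions made for this example, and the parameter values are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of bioperl-db): read a parameter file
# whose contents evaluate to a hash reference and return the pairs.
sub load_initrc {
    my ($path) = @_;
    open my $in, '<', $path or die "cannot read $path: $!";
    my $text = do { local $/; <$in> };   # slurp the whole file
    close $in;
    # The leading '+' makes Perl parse the braces as an anonymous hash
    # rather than as a code block.
    my $params = eval "+$text";
    die "cannot parse $path: $@" if $@;
    die "$path must evaluate to a hash reference\n"
        unless ref $params eq 'HASH';
    return %$params;
}

# Write an example parameter file (values are made up).
my ($fh, $file) = tempfile(SUFFIX => '.bioperldb', UNLINK => 1);
print {$fh} <<'EOF';
{
    -dbname => 'mybiosql',
    -host   => 'foo.bar.edu',
    -user   => 'cleo'
}
EOF
close $fh or die "close failed: $!";

# Options given explicitly (e.g. on the command line) win over the file.
my %from_file = load_initrc($file);
my %explicit  = (-user => 'caesar');
my %params    = (%from_file, %explicit);

print "$_ => $params{$_}\n" for sort keys %params;
```

Because the explicit options come last in the merged list, -user ends up as caesar while -dbname and -host keep the values from the file.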
-hilmar From the respective POD section I wrote on --initrc: --initrc paramfile Instead of, or in addition to, specifying every individual database connection parameter you may put them into a file that when read by perl evaluates to an array or hash reference. This option specifies the file to read; the special value DEFAULT (or no value) will use a file ./.bioperldb or $HOME/.bioperldb, whichever is found first in that order. Constructing a file that evaluates to a hash reference is very sim- ple. The first non-space character needs to be an open curly brace, and the last non-space character a closing curly brace. In between the curly braces, write option name, followed by => (equal to or greater than), followed by the value in single quotes. Separate each such option/value pair by comma. Here is an example: { -dbname => 'mybiosql', -host => 'foo.bar.edu', -user => 'cleo' } Line breaks and white space don't matter (except if in the value itself). Also note that options only have a single dash as prefix, and they need to be those accepted by Bio::DB::BioDB->new() (Bio::DB::BioDB) or Bio::DB::SimpleDBContext->new() (Bio::DB::Sim- pleDBContext). Those sometimes differ slightly from the option names used by this script, e.g., --dbuser corresponds to -user. Note also that using the above example, you can use it for --initrc and still connect as user caesar by also supplying --dbuser caesar on the command line. I.e., command line arguments override any parame- ters also found in the initrc file. Finally, note that if using this option with default file name and the default file is not found at any of the default locations, the option will be ignored; it is not considered an error. > For some reason to open a mysql > connection on my machine i need to open up a ssh -L connection to the > machine where the mysql server lives with some funky parameters. 
(If > this is more appropriate for biosql mailiing list, apologies but I > didn't want to cross post :) > > Amit Indap > Cornell University > > ------------- EXCEPTION ------------- > MSG: failed to open connection: Access denied for user > 'amit'@'132.236.170.104' (using password: NO) > STACK Bio::DB::DBI::base::new_connection > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/DBI/base.pm:253 > STACK Bio::DB::DBI::base::get_connection > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/DBI/base.pm:213 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:1477 > STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BaseDriver.pm:515 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:927 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:855 > STACK Bio::DB::BioSQL::PrimarySeqAdaptor::get_unique_key_query > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:395 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:845 > STACK toplevel /home/amit/bin/HCG-blatparser.pl:181 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From wes.barris at csiro.au Fri Aug 26 00:37:42 2005 From: wes.barris at csiro.au (Wes Barris) Date: Fri Aug 26 16:30:17 2005 Subject: [Bioperl-l] How do I tell what version of bioperl is installed? 
Message-ID: <430E9C96.3080601@csiro.au> Hi, I am trying to install gbrowse which requires bioperl-1.5. I am getting a warning from the gbrowse installation that says: Warning: prerequisite Bio::Perl 1.5 not found. We have unknown version. The thing is that I have bioperl-1.5 installed. How do I verify this? Normally, I use this script to list installed modules and their versions but it does not report a version for bioperl: #!/usr/bin/perl use ExtUtils::Installed; my $instmod = ExtUtils::Installed->new(); foreach my $module ($instmod->modules()) { my $version = $instmod->version($module) || "???"; print "$module -- $version\n"; } wes@bioweb> ~/proj/perl/installed.pl Authen::Krb5::Simple -- 0.31 Bio -- ??? GD -- 2.19 GD::SVG -- 0.25 Generic-Genome-Browser -- ??? HTTPD-User-Manage -- ??? IO::String -- 1.06 MD5 -- 2.03 Perl -- 5.8.5 SVG -- 2.32 SynBrowse -- ??? Text::Shellwords -- 1.07 mod_perl -- 1.29 -- Wes Barris E-Mail: Wes.Barris@csiro.au From jason.stajich at duke.edu Fri Aug 26 17:02:12 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Aug 26 16:51:39 2005 Subject: [Bioperl-l] How do I tell what version of bioperl is installed? In-Reply-To: <430E9C96.3080601@csiro.au> References: <430E9C96.3080601@csiro.au> Message-ID: <6D9F04B3-1D30-4BFA-93FF-2122DAD08D70@duke.edu> I think the warning is extraneous and is something lincoln later fixed in what gbrowse is parsing. It wasn't properly detecting the version. you can tell by doing this for any module ( Bio::SeqIO for example) $ perl -MBio::SeqIO -e 'print "$Bio::SeqIO::VERSION\n";' But this is a runtime thing while MakeMaker is actually parsing the file to try and figure out the version which doesn't quite work. I believe he posted about a workaround and was updating the gbrowse code to be able to handle it. 
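If you want the same runtime check for several modules at once, something along these lines should work. This is only a sketch using core Perl; the module_version helper is a made-up name, and like the one-liner it reports what the module says about itself once loaded, which is not necessarily what MakeMaker works out by parsing the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the $VERSION a module reports at runtime,
# or undef if the module cannot be loaded or defines no version.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::SeqIO -> Bio/SeqIO.pm
    return unless eval { require $file; 1 };
    return eval { $module->VERSION };
}

# Check the modules named on the command line, or Bio::SeqIO by default.
for my $module (@ARGV ? @ARGV : ('Bio::SeqIO')) {
    my $version = module_version($module);
    printf "%s -- %s\n", $module, defined $version ? $version : '???';
}
```

Run it as, for example, `perl modversion.pl Bio::SeqIO Bio::Perl` (the script name is arbitrary). A module that loads but sets no $VERSION shows up as "???", the same as one that is not installed.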
The thread starts here: http://portal.open-bio.org/pipermail/bioperl-l/2005-August/019495.html -jason On Aug 26, 2005, at 12:37 AM, Wes Barris wrote: > Hi, > > I am trying to install gbrowse which requires bioperl-1.5. I am > getting > a warning from the gbrowse installation that says: > > Warning: prerequisite Bio::Perl 1.5 not found. We have unknown > version. > > The thing is that I have bioperl-1.5 installed. How do I verify this? > Normally, I use this script to list installed modules and their > versions > but it does not report a version for bioperl: > > #!/usr/bin/perl > use ExtUtils::Installed; > my $instmod = ExtUtils::Installed->new(); > foreach my $module ($instmod->modules()) { > my $version = $instmod->version($module) || "???"; > print "$module -- $version\n"; > } > > wes@bioweb> ~/proj/perl/installed.pl > Authen::Krb5::Simple -- 0.31 > Bio -- ??? > GD -- 2.19 > GD::SVG -- 0.25 > Generic-Genome-Browser -- ??? > HTTPD-User-Manage -- ??? > IO::String -- 1.06 > MD5 -- 2.03 > Perl -- 5.8.5 > SVG -- 2.32 > SynBrowse -- ??? > Text::Shellwords -- 1.07 > mod_perl -- 1.29 > > -- > Wes Barris > E-Mail: Wes.Barris@csiro.au > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From hlapp at gnf.org Fri Aug 26 17:21:46 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Aug 26 17:13:22 2005 Subject: [Bioperl-l] ontology paths in Bioperl-DB / Biosql Message-ID: <7fdc960cb1f74a59d2014129a38bceb6@gnf.org> One thing I forgot to report to the list is that last Friday I fixed the Bioperl-db adaptor and driver module for ontology paths in Biosql to include the distance zero paths when computing the transitive closure over an ontology. There are now also tests in t/12ontology.t that check for those distance zero paths. They pass on all three supported platforms (mysql, Pg, Oracle). 
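For anyone unsure what the distance zero paths are: in the transitive closure every term also gets a path to itself, with distance 0. A small self-contained sketch over a toy hierarchy shows the idea. The term names and the in-memory hash are invented for illustration, it keeps only the shortest distance per pair of terms, and it is not the adaptor code or the Biosql schema.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy ontology: child term => list of direct parents (names are made up).
my %parents = (
    'nucleus'   => ['organelle'],
    'organelle' => ['cell part'],
    'cell part' => [],
);

# Compute subject -> object -> shortest distance for every reachable
# pair, including the distance-zero path from each term to itself.
sub transitive_closure {
    my ($graph) = @_;
    my %closure;
    for my $term (keys %$graph) {
        $closure{$term}{$term} = 0;            # distance-zero path
        my @queue = ([$term, 0]);              # breadth-first walk upwards
        while (my $item = shift @queue) {
            my ($node, $dist) = @$item;
            for my $parent (@{ $graph->{$node} || [] }) {
                # Skip ancestors already reached by a path at least as short.
                next if exists $closure{$term}{$parent}
                     && $closure{$term}{$parent} <= $dist + 1;
                $closure{$term}{$parent} = $dist + 1;
                push @queue, [$parent, $dist + 1];
            }
        }
    }
    return \%closure;
}

my $tc = transitive_closure(\%parents);
for my $subject (sort keys %$tc) {
    for my $object (sort keys %{ $tc->{$subject} }) {
        print "$subject -> $object (distance $tc->{$subject}{$object})\n";
    }
}
```

With the zero-distance rows present, a query for "everything that is a cell part" also returns the term cell part itself, without a special case for the term being searched.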
(load_ontology.pl in bioperl-db/scripts/biosql has an option --computetc that, if supplied, will automatically recompute the transitive closure over the just loaded ontology.)

-hilmar

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From limericksean at gmail.com Mon Aug 29 11:03:01 2005
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Mon Aug 29 10:52:17 2005
Subject: [Bioperl-l] Bio::Search results
Message-ID: <46278464050829080368d707be@mail.gmail.com>

Hi,

The following code snippet is something I use to extract information from hmmer result files:

  use Bio::SearchIO;
  my $in = new Bio::SearchIO( -format => 'hmmer', -file => $ARGV[0] );
  while(my $result = $in->next_result) {
      print $result->query_name(), "\n",$result->query_description(),"\n";
      while (my $hit = $result->next_hit) {
          while(my $hsp = $hit->next_domain) {
              next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
              print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
          }
      }
  }

The input file is generated by hmmpfam and is given at the command line. I use it to scan for specific domain names, e.g. ig, fn3, lrr etc. This code works for the first loop and then ends, so I get the name and description (no hsp values, as there are none for this result):

  ENSMUSP00000065602
  pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059 transcript:ENSMUST00000066517

My question is why the loop ends after one instance. Incidentally, the name and description printed above are the last ones in the hmmer file (maybe the file is read from the back? I don't know if this means anything).

Any thoughts would be appreciated.

Thanks,
Sean.
From jason.stajich at duke.edu Mon Aug 29 12:18:03 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Aug 29 12:07:45 2005 Subject: [Bioperl-l] If you use RemoteBlast References: <69BA0F938FAC6A4CBEF49461720696F20A6B6200@nihexchange16.nih.gov> Message-ID: <78ADB5F4-31DB-4FEA-8515-CBC403B6A1FC@duke.edu> So those of you who use the Tools::RemoteBlast module, please read the following email. We need some people to test out how much the parser breaks with the "new" formatter. Tests seem to pass right now, but I don't know if that is because the 'old' format is being requested still. Could someone please take a little time to see what's going on and report back. Thanks, -jason Begin forwarded message: > From: "Mcginnis, Scott (NIH/NLM/NCBI)" > Date: August 29, 2005 12:06:52 PM EDT > To: "'jason@bioperl.org'" > Subject: New BLAST Formatter. > > > Hello. > > The new BLAST formatter has been a default for a months now. But > we'd like > to shut off the old one. > > Will this pose a problem? > > Thanks, > > Sincerely, > Scott D. McGinnis, M.S. > National Center for Biotechnology Information > > > > Blast-announce: New BLAST formatter at the NCBI > > A new version of the BLAST formatter has been the default on the > NCBI BLAST > web pages for the past XX months. On September 6, 2005 we will > remove the > checkbox allowing users to select the old formatter and support for > the old > formatter will be discontinued. > > This formatter has been rewritten from scratch using the NCBI C++ > toolkit > and includes many new features (see list below) as well as the > ability to > fetch parts of genomic sequences when needed, making it much faster > than the > old formatter for many queries. > > Please send questions or comments to blast-help@ncbi.nlm.nih.gov > > > New features: > -------------- > > 1.) The new formatter will present the masked residues or bases as > lower-case letters. Additionally the masked letters can be shown > in color. 
> To use this feature change the "Masking Character" to "Lower case" > on the > formatting page and select a "Masking Color". > Example: > > http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi? > CMD=Get&RID=1098448824-15725-370 > 06897750.BLASTQ4&NEW_FORMATTER=on&MASK_CHAR=2&MASK_COLOR=2&DESCRIPTION > S=0 > > > 2.) The "pairwise with identities" option allows easy > identification of a > few mismatches among highly similar sequences. In this (pair-wise) > view > mismatches, as well as "Sbjct" (on the line containing the > mismatch) are > shown in red. > To use this feature change the "Alignment view" to "Pairwise with > identities" on the formatting page. > Example: > > http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? > CMD=Get&NCBI_GI=yes&SHOW_OVERVIE > W=on&ALIGNMENT_VIEW=PairwiseWithIdentities&NEW_FORMATTER=on&RID=111089 > 2196-1 > 6209-7903412953.BLASTQ4#28302128 > > > 3.) For database sequences longer than 200,000 bases each alignment > has a > header entitled "Features in this part of the subject sequence" > listing CDS features on the database sequence within the alignment > range or > at the 5' or 3' end if not features are within the range itself. > This gives a quick description of what you are looking at as many long > sequences have a standard defline such as "chromosome 16". > Example: > > http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? > CMD=Get&NCBI_GI=yes&SHOW_OVERVIE > W=on&NEW_FORMATTER=on&RID=1098455471-18771-167762343145.BLASTQ4#514656 > 96 > > > Rewrites/bug fixes: > ------------------- > > 1.) The graphic overview has been rewritten; it now uses an HTML > implementation. > > 2.) Query-anchored views now work with blastx/tblastn/tblastx, they > didn't > before. > > 3.) phi-BLAST patterns are now also shown in the query-anchored view. 
> -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cain at cshl.edu Mon Aug 29 12:34:58 2005 From: cain at cshl.edu (Scott Cain) Date: Mon Aug 29 12:24:39 2005 Subject: [Bioperl-l] Re: [Gmod-gbrowse] GTF-->GFF3 converter In-Reply-To: <3a8da45805082614141e8fc27@mail.gmail.com> References: <3a8da45805082614141e8fc27@mail.gmail.com> Message-ID: <1125333298.2882.33.camel@localhost.localdomain> Hi Etienne, Probably the best mailing list to ask this question on is the bioperl mailing list (cc'ed here). As far as I know, there is no script specifically to do that. Because GFF3 is more strict than GTF (aka GFF 2.5), it can be difficult to move from GTF to GFF3. If the Bio::FeatureIO::gff module were a little more fleshed out, it would probably be able to do it, but currently, while it will write GTF, it doesn't yet read it. If you wanted to contribute code to do that, that would be great. The other possibility in the absence of Bio::FeatureIO::gff is Bio::Tools::GFF, which should be able to parse GTF and then write something resembling GFF3. I wrote 'resembling' because you may need to massage the output to actually get something that is GFF3. Scott On Fri, 2005-08-26 at 15:14 -0600, Etienne Noumen wrote: > Hi, > In our projects, our data are in GTF format. I wrote a script to > convert it to GFF3 but there are tags like Feature ID, ProteinID that > i don't know how to deal with. I am also concerned about grouping > exons and CDS into mRNA and Genes. Is there any converter that does it > well? > > This is how my files look like: > ............ > scaffold_10034 src exon 7360 8354 . - . name > "fgenesh1_pg.C_scaffold_10034000001"; transcriptId 58482 > scaffold_10034 src CDS 7360 8352 . - 0 name > "fgenesh1_pg.C_scaffold_10034000001"; proteinId 58482; exonNumber 1 > scaffold_10034 src stop_codon 7360 7362 . - 0 name > "fgenesh1_pg.C_scaffold_10034000001" > scaffold_10309 src exon 5822 6042 . + . 
name > "fgenesh1_pg.C_scaffold_10309000001"; transcriptId 58526 > scaffold_10309 src CDS 5822 6042 . + 0 name > "fgenesh1_pg.C_scaffold_10309000001"; proteinId 58526; exonNumber 1 > scaffold_10309 src exon 7270 7612 . + . name > "fgenesh1_pg.C_scaffold_10309000001"; transcriptId 58526 > scaffold_10309 src CDS 7270 7612 . + 2 name > "fgenesh1_pg.C_scaffold_10309000001"; proteinId 58526; exonNumber 2 > scaffold_10309 src stop_codon 7610 7612 . + 0 name > "fgenesh1_pg.C_scaffold_10309000001" > ........... > Thank you. > noumen > > > ------------------------------------------------------- > SF.Net email is Sponsored by the Better Software Conference & EXPO > September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices > Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA > Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From aleunpkc at gmail.com Sun Aug 28 20:58:27 2005 From: aleunpkc at gmail.com (hong kong pm) Date: Mon Aug 29 14:07:24 2005 Subject: [Bioperl-l] [OT] General bioinformatics forums/lists? Message-ID: <7e4bd6e6050828175846474aa3@mail.gmail.com> Do we need a bioinformatics forum though bioperl mailing list is already a good one? If we want to establish one, we need volunteers to work as forum moderators. I am willing to sponsor h/w, software license, and hosting if there is enough interest. 
Andy From palmeida at igc.gulbenkian.pt Mon Aug 29 14:31:53 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Mon Aug 29 14:22:34 2005 Subject: [Bioperl-l] If you use RemoteBlast In-Reply-To: <78ADB5F4-31DB-4FEA-8515-CBC403B6A1FC@duke.edu> References: <69BA0F938FAC6A4CBEF49461720696F20A6B6200@nihexchange16.nih.gov> <78ADB5F4-31DB-4FEA-8515-CBC403B6A1FC@duke.edu> Message-ID: <200508291931.54147.palmeida@igc.gulbenkian.pt> I think it is indeed the new formatter that is being requested by Tools::RemoteBlast, since it is the default and Tools::RemoteBlast doesn't seem to change it (I'm using BioPerl 1.4). There is this checkbox in Blast.cgi that controls this: I don't know if there are more complex ways in which the new formatter may break the parser, but I've been using Tools::RemoteBlast and didn't notice anything weird (my code is pretty much the same as the Synopsis). -- Paulo On Monday 29 August 2005 17:18, Jason Stajich wrote: > So those of you who use the Tools::RemoteBlast module, please read > the following email. We need some people to test out how much the > parser breaks with the "new" formatter. Tests seem to pass right now, > but I don't know if that is because the 'old' format is being > requested still. Could someone please take a little time to see > what's going on and report back. > > Thanks, > -jason > > Begin forwarded message: > > From: "Mcginnis, Scott (NIH/NLM/NCBI)" > > Date: August 29, 2005 12:06:52 PM EDT > > To: "'jason@bioperl.org'" > > Subject: New BLAST Formatter. > > > > > > Hello. > > > > The new BLAST formatter has been a default for a months now. But > > we'd like > > to shut off the old one. > > > > Will this pose a problem? > > > > Thanks, > > > > Sincerely, > > Scott D. McGinnis, M.S. 
> > National Center for Biotechnology Information > > > > > > > > Blast-announce: New BLAST formatter at the NCBI > > > > A new version of the BLAST formatter has been the default on the > > NCBI BLAST > > web pages for the past XX months. On September 6, 2005 we will > > remove the > > checkbox allowing users to select the old formatter and support for > > the old > > formatter will be discontinued. > > > > This formatter has been rewritten from scratch using the NCBI C++ > > toolkit > > and includes many new features (see list below) as well as the > > ability to > > fetch parts of genomic sequences when needed, making it much faster > > than the > > old formatter for many queries. > > > > Please send questions or comments to blast-help@ncbi.nlm.nih.gov > > > > > > New features: > > -------------- > > > > 1.) The new formatter will present the masked residues or bases as > > lower-case letters. Additionally the masked letters can be shown > > in color. > > To use this feature change the "Masking Character" to "Lower case" > > on the > > formatting page and select a "Masking Color". > > Example: > > > > http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi? > > CMD=Get&RID=1098448824-15725-370 > > 06897750.BLASTQ4&NEW_FORMATTER=on&MASK_CHAR=2&MASK_COLOR=2&DESCRIPTION > > S=0 > > > > > > 2.) The "pairwise with identities" option allows easy > > identification of a > > few mismatches among highly similar sequences. In this (pair-wise) > > view > > mismatches, as well as "Sbjct" (on the line containing the > > mismatch) are > > shown in red. > > To use this feature change the "Alignment view" to "Pairwise with > > identities" on the formatting page. > > Example: > > > > http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? > > CMD=Get&NCBI_GI=yes&SHOW_OVERVIE > > W=on&ALIGNMENT_VIEW=PairwiseWithIdentities&NEW_FORMATTER=on&RID=111089 > > 2196-1 > > 6209-7903412953.BLASTQ4#28302128 > > > > > > 3.) 
For database sequences longer than 200,000 bases each alignment > > has a > > header entitled "Features in this part of the subject sequence" > > listing CDS features on the database sequence within the alignment > > range or > > at the 5' or 3' end if not features are within the range itself. > > This gives a quick description of what you are looking at as many long > > sequences have a standard defline such as "chromosome 16". > > Example: > > > > http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? > > CMD=Get&NCBI_GI=yes&SHOW_OVERVIE > > W=on&NEW_FORMATTER=on&RID=1098455471-18771-167762343145.BLASTQ4#514656 > > 96 > > > > > > Rewrites/bug fixes: > > ------------------- > > > > 1.) The graphic overview has been rewritten; it now uses an HTML > > implementation. > > > > 2.) Query-anchored views now work with blastx/tblastn/tblastx, they > > didn't > > before. > > > > 3.) phi-BLAST patterns are now also shown in the query-anchored view. > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Paulo Almeida Tel: +351 21 4464635, Fax: +351 21 4407970 Instituto Gulbenkian de Ci?ncia Rua da Quinta Grande, 6 P-2780-156 Oeiras Portugal http://www.igc.gulbenkian.pt From boris.steipe at utoronto.ca Mon Aug 29 14:35:56 2005 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Mon Aug 29 14:27:46 2005 Subject: [Bioperl-l] [OT] General bioinformatics forums/lists? In-Reply-To: <7e4bd6e6050828175846474aa3@mail.gmail.com> References: <7e4bd6e6050828175846474aa3@mail.gmail.com> Message-ID: Would the Bio_Bulletin_Board not work for you? see: http://bioinformatics.org/mailman/listinfo/ B. On 28 Aug 2005, at 20:58, hong kong pm wrote: > Do we need a bioinformatics forum though bioperl mailing list is > already a > good one? If we want to establish one, we need volunteers to work > as forum > moderators. 
I am willing to sponsor h/w, software license, and > hosting if > there is enough interest. > Andy > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From lstein at cshl.edu Mon Aug 29 15:20:50 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Aug 29 15:12:45 2005 Subject: [Bioperl-l] ->add_tag_value() In-Reply-To: <430F76B7.7010803@wam.umd.edu> References: <430E3350.50604@wam.umd.edu> <8c31cb4c665092173967567f87b35a34@gmx.net> <430F76B7.7010803@wam.umd.edu> Message-ID: <200508291520.50822.lstein@cshl.edu> Hi, I have an environment variable in my .cshrc as follows: setenv PERL5LIB $HOME/projects/bioperl-live Lincoln On Friday 26 August 2005 04:08 pm, Andrew Stewart wrote: > Do many of you use bioperl-live as your primary (or exclusive) BioPerl > distribution, or do you keep a stable version as well? > > I'd like to use bioperl-live for instances such as this (see message > history), but not necessarily for when I'm developing scripts that are > going to be used by others in my lab who do not necessarily have > bioperl-live installed. > > What I'm thinking is that I should maybe install a copy of bioperl-live > somewhere in my personal space, and then 'use' it in certain scripts > when needed. I just have a few questions (these are probably more > 'perl' questions than 'bio-perl' questions)... > > 1. Once I obtain bioperl-live via 'cvs -d :pserver etc...', do I need to > actually go through the install routine or can I just access the modules > from where they are downloaded? > > and > > 2. Would I then place code at the header of my script such as... > > use lib "/path/to/bioperl-live"; > use MODULE; > > and the updated module will (temporarily) override the other bioperl > modules in my @INC? > > I tried this, actually, without any noticable change in my previous problem > (_print_GenBank... 
still prints the feature tabs as > /tab="Bio::Annotation::SimpleValue=HASH(0x87f0914)" instead of > /tab="value"), but I don't know for certain if perl was using the > modules from my bioperl-live installation or the older ones. > > > -Andrew Stewart > > Hilmar Lapp wrote: > >> On Aug 26, 2005, at 11:37 AM, Andrew Stewart wrote: > >>> Would it be possible to simply update the module which contains the > >>> error (or are there multiple files?) rather than downgrade to 1.4 or > >>> upgrade to the HEAD branch? > >>> -Andrew > > > > You could, e.g. using Jason's suggestion, but I don't know why you > > wouldn't just want to upgrade to the main trunk. Currently, this is as > > close as you can get to upgrading to 1.5.1., which is what you will > > want to do anyway immediately once it's out. > > > > -hilmar > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse@cshl.edu From hlapp at gmx.net Mon Aug 29 15:48:30 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon Aug 29 15:37:46 2005 Subject: [Bioperl-l] ->add_tag_value() In-Reply-To: <430F76B7.7010803@wam.umd.edu> References: <430E3350.50604@wam.umd.edu> <430F1E81.6080400@wam.umd.edu> <44E35FE4-D25D-4D96-A195-12D1DCF85638@duke.edu> <430F374B.7060408@wam.umd.edu> <6F0A7CDE-DFFA-499C-A3EE-CECE62FE9A2A@duke.edu> <8c31cb4c665092173967567f87b35a34@gmx.net> <430F76B7.7010803@wam.umd.edu> Message-ID: <9185c308932848f92b4a2f905222b1cb@gmx.net> On Aug 26, 2005, at 1:08 PM, Andrew Stewart wrote: > Do many of you use bioperl-live as your primary (or exclusive) BioPerl > distribution, or do you keep a stable version as well? 
> I used to live off of bioperl-live and update regularly but I stopped updating about a year ago, so technically for production I'm on something close to 1.4.1. I do keep a cvs head version too, for developing / fixing. > I'd like to use bioperl-live for instances such as this (see message > history), but not necessarily for when I'm developing scripts that are > going to be used by others in my lab who do not necessarily have > bioperl-live installed. > > What I'm thinking is that I should maybe install a copy of > bioperl-live somewhere in my personal space, and then 'use' it in > certain scripts when needed. What I do is I don't install Bioperl in order to avoid any precedence order mistakes for library paths. Since there is no compiled code (unless you also use bioperl-ext) you can just point PERL5LIB at the root of the Bioperl installation you want to work with, and if there is none in the standard @INC you can be sure which modules will be loaded. > I just have a few questions (these are probably more 'perl' questions > than 'bio-perl' questions)... > > 1. Once I obtain bioperl-live via 'cvs -d :pserver etc...', do I need > to actually go through the install routine or can I just access the > modules from where they are downloaded? No and yes, respectively. See above. > > and > > 2. Would I then place code at the header of my script such as... > > use lib "/path/to/bioperl-live"; > use MODULE; > > and the updated module will (temporarily) override the other bioperl > modules in my @INC? > > I tried this, actually, without any noticable change in my previous > problem > (_print_GenBank... still prints the feature tabs as > /tab="Bio::Annotation::SimpleValue=HASH(0x87f0914)" instead of > /tab="value"), but I don't know for certain if perl was using the > modules from my bioperl-live installation or the older ones. > I'm not sure about the search order. 
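One way to check the search order empirically rather than guess is sketched below. It assumes the placeholder path from the question and uses a core module, File::Spec, as a stand-in for a Bio:: module; `loaded_from` is a hypothetical helper name, while `@INC` and `%INC` are Perl built-ins.

```perl
use strict;
use warnings;
use lib '/path/to/bioperl-live';   # placeholder path from the question
use File::Spec;                    # core module, standing in for a Bio:: module

# Hypothetical helper: report the file a loaded module came from, using
# Perl's built-in %INC hash (its keys look like 'File/Spec.pm').
sub loaded_from {
    my ($module) = @_;
    (my $key = "$module.pm") =~ s{::}{/}g;
    return $INC{$key};
}

# 'use lib' puts its arguments at the front of @INC.
print "First search directory: $INC[0]\n";
print "File::Spec loaded from: ", loaded_from('File::Spec'), "\n";
```

Swapping File::Spec for the Bio:: module in question should show whether the copy under bioperl-live or the installed one was picked up.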
The POD for lib says: It is typically used to add extra directories to perl's search path so that later "use" or "require" statements will find modules which are not located on perl's default search path. but also: The parameters to "use lib" are added to the start of the perl search path. Saying use lib LIST; is almost the same as saying BEGIN { unshift(@INC, LIST) } Note you can easily test which version is loaded by using either the debugger or add some garbage to the module. -hilmar > > -Andrew Stewart > > > Hilmar Lapp wrote: > >> >>> On Aug 26, 2005, at 11:37 AM, Andrew Stewart wrote: >>> >>>> Would it be possible to simply update the module which contains the >>>> error (or are there multiple files?) rather than downgrade to 1.4 >>>> or upgrade to the HEAD branch? >>>> -Andrew >>> >> >> You could, e.g. using Jason's suggestion, but I don't know why you >> wouldn't just want to upgrade to the main trunk. Currently, this is >> as close as you can get to upgrading to 1.5.1., which is what you >> will want to do anyway immediately once it's out. >> >> -hilmar >> > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From lstein at cshl.edu Mon Aug 29 17:43:30 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Aug 29 17:33:34 2005 Subject: [Bioperl-l] Re: Bio::DB::GFF start/end coordinates In-Reply-To: References: Message-ID: <200508291743.30620.lstein@cshl.edu> Hi Jason, You've got to set $db->absolute(1) to get true Bioperl-compliant coordinates. The reason for this is because of Bio::DB::GFF's (perhaps regrettable) use of relative coordinate addressing by default. This is explicitly mentioned in the documentation under a section named (something like) "BioPerl compliance." 
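As a rough illustration of what "Bioperl-compliant" means here (start <= end, with orientation carried only by the strand), the sketch below normalizes the two exons from the question in plain Perl. The helper name `to_bioperl_coords` is made up for this example and is not part of Bio::DB::GFF; calling $db->absolute(1) remains the actual fix.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return coordinates with
# start <= end, leaving the strand untouched.
sub to_bioperl_coords {
    my ($start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if $start > $end;
    return ($start, $end, $strand);
}

# The two minus-strand exons reported as 1031..975 and 676..501.
my @exons = ( [1031, 975, -1], [676, 501, -1] );
for my $exon (@exons) {
    my ($s, $e, $strand) = to_bioperl_coords(@$exon);
    print "$s..$e (strand $strand)\n";
}
```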
Lincoln On Thursday 25 August 2005 01:00 pm, Jason Stajich wrote: > Lincoln - > > One bug I'm still seeing in Bio::DB::GFF::Feature objects is start/ > end are still returning start > end when strand < 0. I know this is > different expectation for Bioperl / Gbrowse but this causes a little > problems, especially when you get an aggregated feature out from > Bio::DB:GFF and then write it to a genbank file. The locations looks > like this: > complement(join(1031..975,676..501)) > > My workaround is just to create new Location objects and features > from the Bio::DB::GFF obtained objects (some of these aren't > allowing write-back to overwrite the values). > > Note on a slightly separate topic: > I have patched my Bio::Location::Split to_FTstring to simplify the > string, current behavior would be to output the location like this: > join(complement(1031..975),complement(676..501),)) > > I'm seeing about how applying the patch, I'm not sure whether or not > it perfectly works. > > > -jason > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse@cshl.edu From bmoore at genetics.utah.edu Mon Aug 29 18:13:56 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Mon Aug 29 18:00:54 2005 Subject: [Bioperl-l] If you use RemoteBlast Message-ID: Jason- I stepped through the code in Bio::Tools::Run::RemoteBlast::submit_blast, and bioperl is using the default new formatter, and for the dozen or so nucleotide sequences that I ran no problems parsing. 
Barry -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich Sent: Monday, August 29, 2005 10:18 AM To: Bioperl List Subject: [Bioperl-l] If you use RemoteBlast So those of you who use the Tools::RemoteBlast module, please read the following email. We need some people to test out how much the parser breaks with the "new" formatter. Tests seem to pass right now, but I don't know if that is because the 'old' format is being requested still. Could someone please take a little time to see what's going on and report back. Thanks, -jason Begin forwarded message: > From: "Mcginnis, Scott (NIH/NLM/NCBI)" > Date: August 29, 2005 12:06:52 PM EDT > To: "'jason@bioperl.org'" > Subject: New BLAST Formatter. > > > Hello. > > The new BLAST formatter has been a default for a months now. But > we'd like > to shut off the old one. > > Will this pose a problem? > > Thanks, > > Sincerely, > Scott D. McGinnis, M.S. > National Center for Biotechnology Information > > > > Blast-announce: New BLAST formatter at the NCBI > > A new version of the BLAST formatter has been the default on the > NCBI BLAST > web pages for the past XX months. On September 6, 2005 we will > remove the > checkbox allowing users to select the old formatter and support for > the old > formatter will be discontinued. > > This formatter has been rewritten from scratch using the NCBI C++ > toolkit > and includes many new features (see list below) as well as the > ability to > fetch parts of genomic sequences when needed, making it much faster > than the > old formatter for many queries. > > Please send questions or comments to blast-help@ncbi.nlm.nih.gov > > > New features: > -------------- > > 1.) The new formatter will present the masked residues or bases as > lower-case letters. Additionally the masked letters can be shown > in color. 
> To use this feature change the "Masking Character" to "Lower case" > on the > formatting page and select a "Masking Color". > Example: > > http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi? > CMD=Get&RID=1098448824-15725-370 > 06897750.BLASTQ4&NEW_FORMATTER=on&MASK_CHAR=2&MASK_COLOR=2&DESCRIPTION > S=0 > > > 2.) The "pairwise with identities" option allows easy > identification of a > few mismatches among highly similar sequences. In this (pair-wise) > view > mismatches, as well as "Sbjct" (on the line containing the > mismatch) are > shown in red. > To use this feature change the "Alignment view" to "Pairwise with > identities" on the formatting page. > Example: > > http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? > CMD=Get&NCBI_GI=yes&SHOW_OVERVIE > W=on&ALIGNMENT_VIEW=PairwiseWithIdentities&NEW_FORMATTER=on&RID=111089 > 2196-1 > 6209-7903412953.BLASTQ4#28302128 > > > 3.) For database sequences longer than 200,000 bases each alignment > has a > header entitled "Features in this part of the subject sequence" > listing CDS features on the database sequence within the alignment > range or > at the 5' or 3' end if not features are within the range itself. > This gives a quick description of what you are looking at as many long > sequences have a standard defline such as "chromosome 16". > Example: > > http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? > CMD=Get&NCBI_GI=yes&SHOW_OVERVIE > W=on&NEW_FORMATTER=on&RID=1098455471-18771-167762343145.BLASTQ4#514656 > 96 > > > Rewrites/bug fixes: > ------------------- > > 1.) The graphic overview has been rewritten; it now uses an HTML > implementation. > > 2.) Query-anchored views now work with blastx/tblastn/tblastx, they > didn't > before. > > 3.) phi-BLAST patterns are now also shown in the query-anchored view. 
> -- Jason Stajich Duke University http://www.duke.edu/~jes12 _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l

From limericksean at gmail.com Tue Aug 30 04:55:27 2005
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Tue Aug 30 04:44:59 2005
Subject: [Bioperl-l] Bio::Search results
Message-ID: <4627846405083001556a5d030e@mail.gmail.com>

Hi,
The following code snippet is something I use to extract information from hmmer result files:

use Bio::SearchIO;

my $in = new Bio::SearchIO( -format => 'hmmer', -file => $ARGV[0] );
while(my $result = $in->next_result) {
    print $result->query_name(), "\n",$result->query_description(),"\n";
    while (my $hit = $result->next_hit) {
        while(my $hsp = $hit->next_domain) {
            next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
            print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
        }
    }
}

The input file is generated by hmmpfam and is given at the command line. I use it to scan for specific domain names, e.g. ig, fn3, lrr etc.
This code works for the first loop and then ends, so I get the name and description (no hsp values, as there are none for this result):

ENSMUSP00000065602
pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059 transcript:ENSMUST00000066517

My question is why does the loop end after one instance. Incidentally, the outputted name and description above are the last ones in the hmmer file (maybe the file is read from the back??? - don't know if this means anything).
Any thoughts would be appreciated.

Thanks,
Sean.

From bmoore at genetics.utah.edu Tue Aug 30 10:40:52 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Aug 30 10:28:02 2005
Subject: [Bioperl-l] Bio::Search results
Message-ID:

Sean,

Don't see anything obviously wrong. If you want to send your input file, I'll try to recreate the problem.
Barry

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Sean O'Keeffe
Sent: Tuesday, August 30, 2005 2:55 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Bio::Search results

Hi,
The following code snippet is something I use to extract information from hmmer result files:

use Bio::SearchIO;

my $in = new Bio::SearchIO( -format => 'hmmer', -file => $ARGV[0] );
while(my $result = $in->next_result) {
    print $result->query_name(), "\n",$result->query_description(),"\n";
    while (my $hit = $result->next_hit) {
        while(my $hsp = $hit->next_domain) {
            next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
            print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
        }
    }
}

The input file is generated by hmmpfam and is given at the command line. I use it to scan for specific domain names, e.g. ig, fn3, lrr etc.
This code works for the first loop and then ends, so I get the name and description (no hsp values, as there are none for this result):

ENSMUSP00000065602
pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059 transcript:ENSMUST00000066517

My question is why does the loop end after one instance. Incidentally, the outputted name and description above are the last ones in the hmmer file (maybe the file is read from the back??? - don't know if this means anything).
Any thoughts would be appreciated.

Thanks,
Sean.

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Tue Aug 30 10:40:48 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue Aug 30 10:30:03 2005
Subject: [Bioperl-l] SO for RNA-Binding Protein and RNA motifs
Message-ID: <6.2.1.2.2.20050830093049.01ecb4a8@express.cites.uiuc.edu>

Just had a few simple questions about sequence ontology.
What ontology terms are being used for RNA-binding proteins (like IRE or TRAP) or conserved regulatory RNA motifs such as riboswitches? I was thinking about using TF_binding_site for the former, but is this term mainly for DNA-binding proteins? I found a few terms for conserved elements in SO and SOFA (like attenuators), but other conserved motifs (IRE, so on) seem to be missing. I am scanning bacterial genomes for conserved RNA motifs for a few different RNA binding proteins and riboswitches and I would like to convert this data over to GFF3 to map the positions of the hits to the genomes in question. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From limericksean at gmail.com Tue Aug 30 11:28:11 2005 From: limericksean at gmail.com (Sean O'Keeffe) Date: Tue Aug 30 11:19:57 2005 Subject: [Bioperl-l] Bio::Search results In-Reply-To: References: Message-ID: <46278464050830082816e3df97@mail.gmail.com> Hi Barry, thanks for the reply. 
Below is a snippet of the file (I generated it with hmmpfam using the alignment flag set to -A 0, to remove alignments - this shouldn't affect the parsing of the file):

hmmpfam - search one or more sequences against HMM database
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /usr/local/lib/pfam-tm
Sequence file:            Mus_musculus.NCBIM34.jul.pep.fa-short
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query sequence: ENSMUSP00000089702
Accession:      [none]
Description:    [none]

Scores for sequence family classification (score includes all domains):
Model    Description                              Score    E-value  N
-------- -----------                              -----    ------- ---
        [no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
        [no hits above thresholds]
//

Query sequence: ENSMUSP00000089701
Accession:      [none]
Description:    [none]

Scores for sequence family classification (score includes all domains):
Model    Description                              Score    E-value  N
-------- -----------                              -----    ------- ---
        [no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
        [no hits above thresholds]
//

Query sequence: ENSMUSP00000020094
Accession:      [none]
Description:    [none]

Scores for sequence family classification (score includes all domains):
Model    Description                              Score    E-value  N
-------- -----------                              -----    ------- ---
LRR_1    Leucine Rich Repeat                       55.0    3.7e-15   6
LRRNT    Leucine rich repeat N-terminal domain     30.1    1.1e-07   1

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
LRRNT      1/1     117   142 ..     1    34 []    30.1  1.1e-07
LRR_1      1/6     168   191 ..     1    25 []    14.7   0.0049
LRR_1      2/6     192   210 ..     1    25 []     9.1      0.2
LRR_1      3/6     212   237 ..     1    25 []     8.4     0.26
LRR_1      4/6     238   257 ..     1    25 []    10.3      0.1
LRR_1      5/6     259   282 ..     1    25 []    10.1     0.12
LRR_1      6/6     290   314 ..     1    25 []     2.4        2
//

Query sequence: ENSMUSP00000074175
Accession:      [none]
Description:    [none]

Scores for sequence family classification (score includes all domains):
Model    Description                              Score    E-value  N
-------- -----------                              -----    ------- ---
CUB      CUB domain                               232.5    1.3e-68   2
Trypsin  Trypsin                                  206.0    1.2e-60   1
Sushi    Sushi domain (SCR repeat)                 88.7    2.6e-25   2
EGF_CA   Calcium binding EGF domain                29.4    1.8e-07   1

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
CUB        1/2      16   137 ..     1   116 []    70.0  1.1e-19
EGF_CA     1/1     141   188 ..     1    55 []    29.4  1.8e-07
CUB        2/2     192   301 ..     1   116 []   162.5  1.5e-47
Sushi      1/2     308   370 ..     1    62 []    47.5  6.5e-13
Sushi      2/2     375   446 ..     1    62 []    41.2  5.1e-11
Trypsin    1/1     463   698 ..     1   259 []   206.0  1.2e-60
//

Cheers,
Sean.

On 8/30/05, Barry Moore wrote:
> Sean,
>
> Don't see anything obviously wrong. If you want to send your input
> file, I'll try to recreate the problem.
>
> Barry
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Sean
> O'Keeffe
> Sent: Tuesday, August 30, 2005 2:55 AM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] Bio::Search results
>
> Hi,
> The following code snippet is something I use to extract information
> from hmmer result files:
>
> use Bio::SearchIO;
>
> my $in = new Bio::SearchIO( -format => 'hmmer', -file => $ARGV[0] );
> while(my $result = $in->next_result) {
>     print $result->query_name(), "\n",$result->query_description(),"\n";
>     while (my $hit = $result->next_hit) {
>         while(my $hsp = $hit->next_domain) {
>             next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
>             print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
>         }
>     }
> }
>
> The input file is generated by hmmpfam and is given at the command
> line. I use it to scan for specific domain names, e.g. ig, fn3, lrr etc.
> This code works for the first loop and then ends, so I get the name and
> description (no hsp values, as there are none for this result):
>
> ENSMUSP00000065602
> pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059
> transcript:ENSMUST00000066517
>
> My question is why does the loop end after one instance. Incidentally,
> the outputted name and description above are the last ones in the
> hmmer file (maybe the file is read from the back??? - don't know if this
> means anything).
> Any thoughts would be appreciated. Thanks,
> Sean.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From golharam at umdnj.edu Wed Aug 31 00:32:13 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed Aug 31 01:18:47 2005
Subject: [Bioperl-l] Make test fails
Message-ID: <00d601c5ade4$f5e37340$d33d140a@GOLHARMOBILE1>

I just updated my copy of bioperl-live from cvs and when I do a 'make test', it fails miserably. Here's the relevant output:

t/DB.........................FAILED test 24
        Failed 1/84 tests, 98.81% okay
t/DBCUTG.....................ok
        22/24 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/DBFasta....................ok
t/DNAMutation................ok
t/Domcut.....................ok
        22/25 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/ECnumber...................ok
t/ELM........................
-------------------- WARNING ---------------------
MSG: Bio::Tools::Analysis::Protein::ELM Request Error:
400 (Bad Request) URL must be absolute
Client-Date: Wed, 31 Aug 2005 04:23:10 GMT
---------------------------------------------------
ok
t/embl.......................
-------------------- WARNING ---------------------
MSG: Bio::PrimarySeq=HASH(0x9f74ff8) is not a SeqI compliant sequence object!
--------------------------------------------------- -------------------- WARNING --------------------- MSG: test is not a SeqI compliant sequence object! --------------------------------------------------- ok t/EMBL_DB....................ok t/ESEfinder..................error is 0 ok 10/12 skipped: tests which require remote servers - set env variable BIO PERLDEBUG to test t/FeatureIO.................. -------------------- WARNING --------------------- MSG: '##feature-ontology' directive handling not yet implemented --------------------------------------------------- -------------------- WARNING --------------------- MSG: '##attribute-ontology' directive handling not yet implemented --------------------------------------------------- -------------------- WARNING --------------------- MSG: '##source-ontology' directive handling not yet implemented --------------------------------------------------- ok t/Index...................... -------------------- WARNING --------------------- MSG: overwriting a current value stored for AJ288898 --------------------------------------------------- -------------------- WARNING --------------------- MSG: overwriting a current value stored for AI129902 --------------------------------------------------- -------------------- WARNING --------------------- MSG: overwriting a current value stored for BAB68554 --------------------------------------------------- ok t/protgraph..................doing subgraphs |||||||in subgraph - size30 in subgraph - size33 in subgraph - size3 in subgraph - size3 in subgraph - size5 Can't call method "object_id" on unblessed reference at t/protgraph.t line 81, < GEN1> line 82. dubious Test returned status 25 (wstat 6400, 0x1900) after all the subtests completed successfully t/Spidey.....................Global symbol "$exon_num" requires explicit package name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line 309. 
Global symbol "$gen_start" requires explicit package name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line 309.
Global symbol "$gen_stop" requires explicit package name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line 309.
Global symbol "$cdna_start" requires explicit package name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line 309.
Global symbol "$cdna_stop" requires explicit package name at /tmp/bioperl-live/blib/lib/Bio/Tools/Spidey/Results.pm line 309.

Any idea why I'm getting these errors? Should I blow away my bioperl-live directory and check out a whole new version?

My spidey modules shouldn't be failing...my last update to it was working fine...

Ryan

From lupey+ at pitt.edu Tue Aug 30 20:34:39 2005
From: lupey+ at pitt.edu (Paul G Cantalupo)
Date: Wed Aug 31 08:20:46 2005
Subject: [Bioperl-l] get_sequence - acc does not exist
Message-ID:

Hello,

I discovered that Bio::Perl get_sequence does not handle Genbank GI numbers properly due to the following code in get_sequence:

    if( $identifier =~ /^\w+\d+$/ ) {
        $seq = $db->get_Seq_by_acc($identifier);
    } else {
        $seq = $db->get_Seq_by_id($identifier);
    }

Genbank GI numbers (e.g. 51527264) match the regular expression /^\w+\d+$/, so, unsurprisingly, the method get_Seq_by_acc fails (with a warning like: MSG: acc (gb|51527264) does not exist). Instead, the method get_Seq_by_id works when called with GI numbers:

    use Bio::DB::GenBank;
    my $genbank_db = Bio::DB::GenBank->new();
    my $seq = $genbank_db->get_Seq_by_id(51527264);
    print $seq->desc;

Shouldn't the regular expression in get_sequence be changed to look for identifiers that are all digits and then call get_Seq_by_id? Or am I not understanding something?
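A minimal sketch of the check proposed above, not the actual Bio::Perl code: the helper name lookup_method_for is hypothetical, and ROA1_HUMAN is only an illustrative identifier that does not end in a digit. It shows that the quoted pattern also matches a purely numeric GI (because \w includes digits), and how testing for all digits first would route such identifiers to get_Seq_by_id.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Perl): name the lookup method
# that would be tried for a given identifier string.
sub lookup_method_for {
    my ($identifier) = @_;

    # All digits: treat as a GI number and look it up by id.
    return 'get_Seq_by_id' if $identifier =~ /^\d+$/;

    # Same pattern as in the quoted get_sequence code.
    return 'get_Seq_by_acc' if $identifier =~ /^\w+\d+$/;

    # Anything else falls through to the id lookup, as in the original.
    return 'get_Seq_by_id';
}

# The original pattern on its own accepts a bare GI number.
print "original pattern matches the GI number\n"
    if '51527264' =~ /^\w+\d+$/;

for my $id (qw(51527264 AJ288898 ROA1_HUMAN)) {
    printf "%-12s -> %s\n", $id, lookup_method_for($id);
}
```

Only the pattern matching is exercised here; no database call is made, so the sketch runs without BioPerl installed.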
Thank you, Paul Paul Cantalupo Research Specialist/Systems Programmer 559 Crawford Hall Department of Biological Sciences University of Pittsburgh Pittsburgh, PA 15260 Work: 412-624-4687 Fax: 412-624-4759 Ask me about Toastmasters: www.toastmasters.org Midday Club Treasurer From rvosa at sfu.ca Wed Aug 31 08:10:36 2005 From: rvosa at sfu.ca (Rutger Vos) Date: Wed Aug 31 08:40:29 2005 Subject: [Bioperl-l] interoperability with bioperl Message-ID: <43159E3C.9080907@sfu.ca> Dear BioPerlers, I am the author of a phylogenetics oriented package on CPAN called Bio::Phylo (link in my sig). The tree object is superficially similar to something that implements Bio::Tree::TreeI, and so I'm looking for a way to implement interoperability - at least for that object - with BioPerl. Bio::Phylo is more aimed at phylogeneticists, who might not be as interested in installing the BioPerl core (and sort out the dependencies and so on), so I am not looking to integrate in BioPerl. I am now leaning towards implementing interoperability in the following way: i) I create a separate CPAN package, with BioPerl & Bio::Phylo dependencies (so that my own "core" doesn't require bioperl). ii) this package inherits (using "use base" & "use fields" from Bio::Phylo::Trees::Tree, see http://search.cpan.org/~rvosa/Bio-Phylo-0.04/lib/Bio/Phylo/Trees/Tree.pm). iii) this package @ISA Bio::Tree::TreeI (see http://search.cpan.org/~birney/bioperl-1.4/Bio/Tree/TreeI.pm). iv) hence, through multiple inheritance, a "hybrid" tree object is created, which implements both APIs (there's a fair amount of overlap, I might get away with just symbol table manipulation (globs) to implement Bio::Tree::TreeI). I solicit your thoughts on whether you think this is the right way to go about things. My main worry is that there'll be problems if people have taken to sticking their fingers inside Bio::Tree::TreeI-like objects to fondle their attributes directly. Then again, that "voids their warranty", perhaps. 
Best wishes, Rutger -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From birney at ebi.ac.uk Wed Aug 31 08:50:59 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Wed Aug 31 08:48:35 2005 Subject: [Bioperl-l] get_sequence - acc does not exist In-Reply-To: References: Message-ID: <4315A7B3.5080504@ebi.ac.uk> Paul G Cantalupo wrote: > Hello, > > I discovered that Bio::Perl get_sequence does not handle Genbank GI > numbers properly due to the following code in get_sequence: > > if( $identifier =~ /^\w+\d+$/ ) { > $seq = $db->get_Seq_by_acc($identifier); > } else { > $seq = $db->get_Seq_by_id($identifier); > } > > Genbank GI numbers (i.e. 51527264) match the regular expression > /^\w+\d+$/ therefore unsuprisingly the method get_Seq_by_acc fails (with > a warning like: MSG: acc (gb|51527264) does not exist). Instead, the > method get_Seq_by_id works when called with GI numbers: > > > use Bio::DB::GenBank; > my $genbank_db = Bio::DB::GenBank->new(); > $seq = $genbank_db->get_Seq_by_id(51527264); > print $seq->desc; > > Shouldn't the regular expression in get_sequence be changed to look for > identifiers that are all digits and then call get_Seq_by_id? Or am I not > understanding something? > traditionally "GI" numbers are _not_ accession numbers: GI numbers are internal numbers given out by NCBI for sequences in-house. 
However, this is all about heuristics guessing the right thing, and probably the right thing to do is try the get_Seq_by_acc, and then if this is undef, try get_Seq_by_id > Thank you, > > Paul > > Paul Cantalupo > Research Specialist/Systems Programmer > 559 Crawford Hall > Department of Biological Sciences > University of Pittsburgh > Pittsburgh, PA 15260 > Work: 412-624-4687 > Fax: 412-624-4759 > > Ask me about Toastmasters: www.toastmasters.org > Midday Club Treasurer > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Wed Aug 31 08:24:45 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Aug 31 09:17:17 2005 Subject: [Bioperl-l] Make test fails In-Reply-To: <00d601c5ade4$f5e37340$d33d140a@GOLHARMOBILE1> References: <00d601c5ade4$f5e37340$d33d140a@GOLHARMOBILE1> Message-ID: <4A14F04A-DCA9-40CD-BEDE-183A008291B4@duke.edu> I don't know that this is miserable, how many passed? ;) I think you might want to get a fresh copy of spidey.pm and re-try that test, it works fine on my machine, maybe you have local changes that got merged during the CVS update - if there are ">>>" lines in your spidey.pm code it could be causing problems. I see the protgraph failure too depending on which version of Graph::Directed I have installed. The t/embl.t warning is intended, although we may want to silence it unless BIOPERLDEBUG environment variable is set. The rest of the failures are wrt remote websites which we don't have control over, so things seem to drift. I see the t/ELM.t failure too, need to see if Richard or someone can take a looksie. The DB.t tests are all failing, I don't know what is the problem with the website, but I think we'll definitely disable them without BIOPERLDEBUG set. 
-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From birney at ebi.ac.uk Wed Aug 31 09:13:46 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Wed Aug 31 09:18:15 2005
Subject: [Bioperl-l] interoperability with bioperl
In-Reply-To: <43159E3C.9080907@sfu.ca>
References: <43159E3C.9080907@sfu.ca>
Message-ID: <4315AD0A.3060700@ebi.ac.uk>

Rutger Vos wrote:
> Dear BioPerlers,
>
> I am the author of a phylogenetics oriented package on CPAN called
> Bio::Phylo (link in my sig). The tree object is superficially similar to
> something that implements Bio::Tree::TreeI, and so I'm looking for a way
> to implement interoperability - at least for that object - with BioPerl.
> Bio::Phylo is more aimed at phylogeneticists, who might not be as
> interested in installing the BioPerl core (and sort out the dependencies
> and so on), so I am not looking to integrate in BioPerl.
>
> I am now leaning towards implementing interoperability in the following
> way:
>
> i) I create a separate CPAN package, with BioPerl & Bio::Phylo
> dependencies (so that my own "core" doesn't require bioperl).
>
> ii) this package inherits (using "use base" & "use fields" from
> Bio::Phylo::Trees::Tree, see
> http://search.cpan.org/~rvosa/Bio-Phylo-0.04/lib/Bio/Phylo/Trees/Tree.pm).
>
> iii) this package @ISA Bio::Tree::TreeI (see
> http://search.cpan.org/~birney/bioperl-1.4/Bio/Tree/TreeI.pm).
>
> iv) hence, through multiple inheritance, a "hybrid" tree object is
> created, which implements both APIs (there's a fair amount of overlap, I
> might get away with just symbol table manipulation (globs) to implement
> Bio::Tree::TreeI).
>
> I solicit your thoughts on whether you think this is the right way to go
> about things. My main worry is that there'll be problems if people have
> taken to sticking their fingers inside Bio::Tree::TreeI-like objects to
> fondle their attributes directly. Then again, that "voids their
> warranty", perhaps.
>

This is precisely the way to go, and I am planning on a similar approach to "bridge" between Ensembl and Bioperl - i.e., make wrapper classes that hold onto the Ensembl object and delegate the necessary "I"-defined (interface) functions for Bioperl "clients".

As you said, if a client starts looking inside the object directly, then it's on its own head.

> Best wishes,
>
> Rutger
>

From slenk at emich.edu Wed Aug 31 10:12:24 2005
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Wed Aug 31 11:02:34 2005
Subject: [Bioperl-l] Protein alignment CD excision module
Message-ID: <1a085e81a039a6.1a039a61a085e8@emich.edu>

I am converting a module that takes a ClustalW alignment, data mines the conserved domains from NCBI, then selectively replaces the CDs with IUPAC 'X' and writes a ClustalW file back out. We have several uses for this module's functions.

I am converting this to be a Bioperl module to take advantage of AlignIO capabilities to read/write multiple alignment file types.

There is a .pm package excise_cd.pm, which I have placed in Align (along with clustalw.pm etc). It is @ISA Bio::Root::Root. I have not yet written an I file for it, but recognise the necessity of doing so for optimum compatibility with Bioperl.
Only one method from excise_cd is used outside the module - excise(), which takes a SimpleAlign object made with AlignIO in the calling program and a hash function with options. The excise method extracts the sequence data from the SimpleAlign object, data mines the CD information and uses the options to guide the overwriting of residues with 'X'. excise() (will) then create an AlignIO output object of the requested format with the excised alignment. This is then returned to the caller, which can write out the excised alignment in the desired format.

I think of this from an external perspective as a CD excising (Xing out) and data converting filter for alignment files.

Is this a reasonable approach? Would this be an appropriate module and script for me to donate to Bioperl when properly done?

Another question - I data mine from NCBI using only gi identifiers for the proteins. I have written my own code to do this. Is there a Bioperl way to get CD data for a protein, and can this way allow me to obtain CD regions for PFAM or other identifiers as well?

Thanks,
Steve Lenk
slenk@emich.edu

From heikki at ebi.ac.uk Wed Aug 31 12:36:14 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Aug 31 12:33:30 2005
Subject: [Bioperl-l] Protein alignment CD excision module
In-Reply-To: <1a085e81a039a6.1a039a61a085e8@emich.edu>
References: <1a085e81a039a6.1a039a61a085e8@emich.edu>
Message-ID: <200508311736.15001.heikki@ebi.ac.uk>

Steve,

I can see the usefulness of what you are doing, but bioperl is a library and needs to think modularly so that other users can easily modify it. What you are describing is best implemented as a script that uses several modules. That example script could be stored in BioPerl separately.

On Wednesday 31 August 2005 15:12, Stephen Gordon Lenk wrote:
> I am converting a module that takes a ClustalW alignment, data mines
> the conserved domains from NCBI, then selectively replaces the CDs
> with IUPAC 'X' and writes a ClustalW file back out.
> We have several uses for this module's functions.

Reading and writing an alignment is already handled by Bio::AlignIO. If you hardcode the format in a module, you lose flexibility. So this belongs in a script.

"data mines the conserved domains from NCBI"

This needs to be done separately by writing, e.g., a Bio::DB or a Bio::Tools::Analysis module for accessing the data. Then you need a storage object to store the conserved residues. You could use Bio::Seq::Meta derived objects to do that, or store them as sequence features (Bio::SeqFeature::Generic) - or roll your own. The main question is whether you need to store residue-based information or a few large regions.

"then selectively replaces the CDs with IUPAC 'X'"

This could be implemented as a method that takes the alignment and the storage object(s) from your analysis and returns the new alignment. Bio::Align::Utilities could store that.

> I am converting this to be a Bioperl module to take advantage of
> AlignIO capabilities to read/write multiple alignment file types.

Good idea.

> There is a .pm package excise_cd.pm, which I have placed in Align
> (along with clustalw.pm etc). It is @ISA Bio::Root::Root. I have not

clustalw.pm is in Bio::AlignIO. Only modules that are subclasses of Bio::AlignIO should go there.

> yet written an I file for it, but recognise the necessity of doing so
> for optimum compatability with Bioperl.

An I file is needed only if you expect that there will be several implementations of the interface.

> Only one method from excise_cd is used outside the module - excise(),
> which takes a SimpleAlign object made with AlignIO in the calling
> program and a hash function with options. The excise method extracts

For modularity, that hash storing all the options needs to be turned into reusable objects.

> the sequence data from the SimpleAlign object, data mines the CD
> information and uses the options to guide the overwriting of residues
> with 'X'.
> excise() (will) then create an AlignIO output object of the
> requested format with the excised alignment. This is then returned to
> the caller, which can write out the excised alignment in the desired
> format.
> I think of this from an external perspective as a CD excising (Xing
> out) and data converting filter for alignment files.

From your earlier description, CD finding was the problem. Bio::SimpleAlign::slice does the slicing.

On the other hand, from the description, I am not sure it is necessary to work with the alignment as a whole: it might be best to treat each sequence separately. Of course, that depends on the reliability of the alignment and what you have actually aligned!

> Is this a reasonable approach?
> Would this be an appropriate module and
> script for me to donate to Bioperl when properly done?

Yes, please.

-Heikki

> Another question - I data mine from NCBI using only gi identifiers for
> the proteins. I have writen my own code to do this. Is there a Bioperl
> way to do get CD data for a protein and can this way allow me to
> obtain CD regions for PFAM or other identifiers as well?
>
> Thanks,
> Steve Lenk
> slenk@emich.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambridge, CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From jason.stajich at duke.edu Wed Aug 31 13:34:44 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Aug 31 13:23:49 2005
Subject: [Bioperl-l] methods, etc.
 for Bio::SearchIO on exonerate output
In-Reply-To: <200508311704.j7VH4DUV016842@ayrton.acpub.duke.edu>
References: <200508311704.j7VH4DUV016842@ayrton.acpub.duke.edu>
Message-ID:

http://fungal.genome.duke.edu/~jes12/software/scripts/process_exonerate_gff3.pl

You may still want to massage it some, but I use the script in this basic form, maybe with a few tweaks.

Note that it requires you to run exonerate with specific --ryo options so that it includes the length of the query and hit sequences in the report output. This should be covered in the perldoc in the script. Without the ryo options enabled, you'll need to modify the script more to have access to the original sequence db, use Bio::DB::Fasta, and put in some $dbh->length($seqid) calls instead.

I don't think the part which writes HSP/match lines is actually correct - it is trying to roll gapped HSPs from the similarity features. I end up ignoring all but the 'exon' and 'gene' lines for my gbrowse instance and/or grepping out the lines I really think I need. You may want to s/exon/CDS/ for the protein2genome output as well.

-jason

On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote:
> Jason,
>
> This message is in regards to an old thread in which you offered
> to share a 'script for munging over' exonerate output for loading
> into DB::GFF (c.f.
> http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)
>
> Would you be willing to still share that script, if you've got it
> around?
>
> Thanks, and regards,
>
> Malcolm Cook - mec@stowers-institute.org - 816-926-4449
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, MO USA
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From MEC at Stowers-Institute.org Wed Aug 31 13:04:10 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Wed Aug 31 13:26:33 2005
Subject: [Bioperl-l] methods, etc.
 for Bio::SearchIO on exonerate output
Message-ID: <200508311726.j7VHQIAH032054@portal.open-bio.org>

Jason,

This message is in regards to an old thread in which you offered to share a 'script for munging over' exonerate output for loading into DB::GFF (c.f. http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)

Would you be willing to still share that script, if you've got it around?

Thanks, and regards,

Malcolm Cook - mec@stowers-institute.org - 816-926-4449
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, MO USA

From slenk at emich.edu Wed Aug 31 14:16:21 2005
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Wed Aug 31 14:05:37 2005
Subject: [Bioperl-l] Alignment excision script
Message-ID: <1aa8c0a1aa81e2.1aa81e21aa8c0a@emich.edu>

Heikki,

Thank you! Following is thinking out loud. I will accept your advice and reconvert to 100% script, easy. No new object type will be created. Actually, I already have a separate script with AlignIO objects, just did not explain well.

- The script will create AlignIO objects with format defined by user (or have Bioperl guess format if user does not specify ...). All IO will be done using them. Flexible, 'any' format in, 'any' format out with CD excised via X. We will use this in our analysis pipeline.
- The alignment must be treated as a whole, as the default 'X'ing out (partial excision) considers whether a whole column is part of a CD. I first X out designated CD residues, then look to see if the whole column is X'd out before making the final excision on a copy of the original sequences. I can return either a full (all designated CD) or partial (only columns that all have X) excision. I have this code solid, and plan to use it internally in the script. Reuse what works well already.
- I have extracted needed information from the input AlignIO object already and process it using the above method. The internal excised alignment data is right.
  Just a matter of loading it into the output AlignIO object.
- I can use AlignIO methods to add excised sequences etc. to the output object formatted as requested by the user, sounds easy. Will look at Utilities for any shortcuts.
- I will further examine Bio::Tools::Analysis for Bioperl methods to get the needed CD data, which is really just start/end pairs for a given protein sequence. Nothing fancy needed as far as representation for the already working code. All I use is "$start $end" to represent excision regions for a given CD for a given protein sequence. I make an array of these for a given protein and use that when I do the initial Xing out. I'd like to have internal reuse of existing reliable code.
- I have a t/ directory for the earlier script. I will expand and reuse this. POD documentation is in the code. I will modify it to reflect current status.

Again, thank you.

Steve Lenk
slenk@emich.edu

From slenk at emich.edu Wed Aug 31 19:34:54 2005
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Wed Aug 31 20:24:12 2005
Subject: [Bioperl-l] Bioperl adds utility to msaexcise script
Message-ID: <1b311c21b2a961.1b2a9611b311c2@emich.edu>

I adopted Heikki Lehvaslaiho's ideas. The script now reads/writes multiple formats based on the user's request on the command line. Thanks Bioperl developers! The snippet below shows the use of AlignIO. I'll work on better/more flexible data mining for CD regions next. I'd like to be able to use multiple types of protein IDs, as they appear in the user's alignment, and get the CDs for them.

eval {
    ##############
    # input stream
    ##############
    use Bio::AlignIO;
    my $in = Bio::AlignIO->new( -fh => \*STDIN, -format => $informat )
                         ->next_aln();

    ##########
    # excision
    ##########
    my $out = _excise( $max_e_value, $use_all_cds, $full_excise,
                       \@excise_cd, $in );

    ###############
    # output stream
    ###############
    Bio::AlignIO->new( -format => $outformat, -fh => \*STDOUT )
                ->write_aln($out);
};

Thanks,
Steve Lenk
slenk@emich.edu
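
The full versus partial excision outlined earlier in this thread (X out the designated CD residues, then keep only the columns where every sequence is X) can be sketched roughly as below. This is not Steve's _excise code: the sub name excise_columns and its arguments are hypothetical, sequences are plain aligned strings rather than SimpleAlign objects, and the "$start $end" regions are assumed to be 1-based alignment columns. A residue that is already 'X' in the input would be counted as marked, which real code would need to handle.

```perl
use strict;
use warnings;

# Hypothetical sketch. $seqs: array ref of equal-length aligned strings.
# $regions: array ref with one array ref per sequence, each holding
# "$start $end" strings (assumed 1-based alignment columns).
# $full: true for full excision, false for partial excision.
sub excise_columns {
    my ($seqs, $regions, $full) = @_;

    # Pass 1: X out every designated CD residue on a working copy.
    my @marked = @$seqs;
    for my $i (0 .. $#marked) {
        for my $region (@{ $regions->[$i] }) {
            my ($start, $end) = split ' ', $region;
            my $len = $end - $start + 1;
            substr($marked[$i], $start - 1, $len) = 'X' x $len;
        }
    }
    return \@marked if $full;

    # Pass 2: on a copy of the originals, X out only those columns
    # that were marked in every sequence.
    my @out = @$seqs;
    for my $col (0 .. length($seqs->[0]) - 1) {
        next if grep { substr($_, $col, 1) ne 'X' } @marked;
        substr($_, $col, 1) = 'X' for @out;
    }
    return \@out;
}

# Toy alignment: CD regions overlap in columns 4-6 only.
my @seqs    = ('MKTAYIAK', 'MKSAYLAK', 'MRTAYIGK');
my @regions = (['3 6'], ['4 7'], ['2 6']);

print "partial: $_\n" for @{ excise_columns(\@seqs, \@regions, 0) };
print "full:    $_\n" for @{ excise_columns(\@seqs, \@regions, 1) };
```

Keeping the marking pass separate from the final excision mirrors the two-step description in the thread, so the same marked copy serves both the full and the partial result.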