From biopython-dev-bounces at portal.open-bio.org  Mon Nov  1 07:55:10 2004
From: biopython-dev-bounces at portal.open-bio.org (biopython-dev-bounces@portal.open-bio.org)
Date: Mon Nov  1 07:55:16 2004
Subject: [BioPython] Your message to Biopython-dev awaits moderator approval
Message-ID: <mailman.47436.1099313710.21032.biopython-dev@biopython.org>

Your mail to 'Biopython-dev' with the subject

    Re: Hello

Is being held until the list moderator can review it for approval.

The reason it is being held:

    Message has a suspicious header

Either the message will get posted to the list, or you will receive
notification of the moderator's decision.  If you would like to cancel
this posting, please visit the following URL:

    http://biopython.org/mailman/confirm/biopython-dev/7b085f2054325221188ee9576a876bf6f5d8676c

From mike at maibaum.org  Tue Nov  2 12:56:38 2004
From: mike at maibaum.org (Michael Maibaum)
Date: Tue Nov  2 12:58:07 2004
Subject: [BioPython] GenBank parsing errors
Message-ID: <8B56F325-2CF8-11D9-AB74-000A95BA5F0A@maibaum.org>


Hi,

I'm trying to use Biopython to parse GenBank files. It works happily on
some GenBank files, but not on many others. So far the pattern appears
to be:

Prokaryotic complete genome => OK
Eukaryotic complete genome => failure

The failures typically occur very early in the file, and the traceback
doesn't carry much useful information. It falls over in the Martel
parser with the error:

Martel.Parser.ParserPositionException: error parsing at or beyond
character 191.

As this genome is a bit large to attach, I've just included the +/- 10
lines around character 191.

The full file, should you want it, is at:
<ftp://ftp.ensembl.org/pub/current_tetraodon/data/flatfiles/genbank/Tetraodon_nigroviridis.0.dat.gz>

Does anyone have any idea why this is failing? Is it just the joy of
tracking NCBI record formats, so that I need to start looking at the
internals for a fix (or use something else)?
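
For anyone wanting to look at the region a position-based parser error points to, a minimal sketch in plain Python follows. It assumes the number in the error message is a 0-based character offset from the start of the file, which is not confirmed by the message itself; the function name `context_around` is hypothetical and is not part of Biopython or Martel.

```python
def context_around(text, offset, radius=10):
    """Return (line_number, line) pairs within `radius` lines of `offset`.

    `text` is the whole flat file as one string. `offset` is assumed to be
    a 0-based character position, such as the 191 in the error above.
    """
    lines = text.splitlines()
    # 0-based index of the line containing the character at `offset`
    hit = text.count("\n", 0, offset)
    start = max(0, hit - radius)
    end = min(len(lines), hit + radius + 1)
    # Report 1-based line numbers, as a text editor would
    return [(i + 1, lines[i]) for i in range(start, end)]
```

Called as `context_around(open("record.gb").read(), 191)`, with `record.gb` standing in for the uncompressed file, this lists the lines nearest the reported position so they can be compared against a record that parses cleanly.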

thanks

Michael

-- 
Dr Michael Maibaum
Department of Biochemistry and Molecular Biology, UCL
email: maibaum@biochemistry.ucl.ac.uk

From mike at maibaum.org  Tue Nov  2 13:05:30 2004
From: mike at maibaum.org (Michael Maibaum)
Date: Tue Nov  2 13:07:02 2004
Subject: [BioPython] GenBank parsing errors
In-Reply-To: <8B56F325-2CF8-11D9-AB74-000A95BA5F0A@maibaum.org>
References: <8B56F325-2CF8-11D9-AB74-000A95BA5F0A@maibaum.org>
Message-ID: <C85A6DB2-2CF9-11D9-AB74-000A95BA5F0A@maibaum.org>


Hmmph, I forgot some probably pertinent details:

Biopython 1.30
Python 2.3
Mac OS X 10.3.5

From djkojeti at unity.ncsu.edu  Tue Nov  2 16:14:47 2004
From: djkojeti at unity.ncsu.edu (Douglas Kojetin)
Date: Tue Nov  2 16:13:27 2004
Subject: [BioPython] PDB -> FASTA
Message-ID: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>

Hi All-

I'm a beginner at Biopython (and I'm 'switching' from Perl to Python
...).  First off, many thanks for the structural Biopython FAQ ... very
helpful!  My question:  can anyone help me with some ideas on how to
whip up a quick PDB->FASTA (sequence) script?

From the structural Biopython FAQ, I've been able to extract residue
information in the form of:

  <Residue MET het=  resseq=1 icode= >

I take it I would just need to grab the MET (split the residue object 
and grab the r[1] index?) and convert into M, then append to a sequence 
string ....

but I didn't know if biopython had something that did an autoconversion 
of MET->M, or vice versa (M->MET).
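
A minimal sketch of the MET->M and M->MET conversion in plain Python follows. The table covers only the 20 standard amino acids, and the names `THREE_TO_ONE`, `ONE_TO_THREE`, `residues_to_sequence` and `sequence_to_residues` are hypothetical, not Biopython identifiers. In Bio.PDB the three-letter name should be available from `residue.get_resname()`, so there is no need to split the printed representation of the residue.

```python
# Standard amino acids only; modified residues are deliberately left out.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
# The table is one-to-one, so it can simply be inverted.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def residues_to_sequence(names):
    """Join three-letter residue names into a one-letter sequence string."""
    return "".join(THREE_TO_ONE[name.upper()] for name in names)


def sequence_to_residues(sequence):
    """Expand a one-letter sequence string into three-letter residue names."""
    return [ONE_TO_THREE[letter.upper()] for letter in sequence]
```

Both functions raise `KeyError` on a name or letter outside the table, which makes unexpected residues visible instead of silently dropping them.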

Thanks for the input,
Doug

From idoerg at burnham.org  Tue Nov  2 16:51:05 2004
From: idoerg at burnham.org (Iddo)
Date: Tue Nov  2 16:59:40 2004
Subject: [BioPython] PDB -> FASTA
In-Reply-To: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>
References: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>
Message-ID: <41880149.5060004@burnham.org>

Welcome aboard, and I am glad we managed to save one  Jedi from the dark 
side of  bioinformatics...;)

As to your question:  see the attached class. I should get that in 
Biopython, but I keep not doing that....

./I



-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037 USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo

-------------- next part --------------
A non-text attachment was scrubbed...
Name: aaCode.py
Type: text/x-python
Size: 2069 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20041102/83a76eb2/aaCode.py
From idoerg at burnham.org  Tue Nov  2 18:34:20 2004
From: idoerg at burnham.org (Iddo)
Date: Tue Nov  2 18:33:05 2004
Subject: PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam filter
	hates me)
In-Reply-To: <D3EBCB29-2D23-11D9-9AD9-000D93AE5B7A@compbio.berkeley.edu>
References: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>
	<41880149.5060004@burnham.org>
	<D3EBCB29-2D23-11D9-9AD9-000D93AE5B7A@compbio.berkeley.edu>
Message-ID: <4188197C.8000004@burnham.org>

The trouble is that the conversion table here might not be suitable for
some purposes: structure -> sequence conversion applications usually want
(1) a unique mapping and (2) a 20-letter alphabet plus 'X' for everything else.
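
The two requirements can be illustrated with a short plain-Python sketch. It is only an illustration under stated assumptions: `STANDARD`, `EXTENDED`, `strict_code` and `is_unique` are hypothetical names, and `EXTENDED` holds just a handful of entries copied from the longer table quoted in this message.

```python
NAMES = ("ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
         "LEU LYS MET PHE PRO SER THR TRP TYR VAL").split()
# The letters are listed in the same order as NAMES.
STANDARD = dict(zip(NAMES, "ARNDCQEGHILKMFPSTWYV"))

# A few extra entries from the longer table, for comparison only.
EXTENDED = dict(STANDARD, MSE="M", SEP="S", ASX="B", GLX="Z", UNK="X")


def strict_code(resname):
    """20-letter alphabet; any other residue name becomes 'X'."""
    return STANDARD.get(resname.upper(), "X")


def is_unique(table):
    """True if no two residue names share a one-letter code."""
    codes = list(table.values())
    return len(codes) == len(set(codes))
```

With the strict mapping, selenomethionine (`MSE`) comes out as `X`, whereas the extended table folds it into `M`; that folding is what makes the extended table many-to-one.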


Gavin Crooks wrote:

> There is also a longer three letter code conversion table in 
> Bio/SCOP/Raf.py
> The PDB contains a whole bunch of weird 3 letter codes for different
> chemically modified amino acids.
>
> Another possibility is to get the fasta sequences directly from the
> ASTRAL database, since they have already grubbed around and done the
> conversion.
>
> Gavin
>
>
> # This table is taken from the RAF release notes, and includes the
> # undocumented mapping "UNK" -> "X"
> to_one_letter_code= {
>     'ALA':'A', 'VAL':'V', 'PHE':'F', 'PRO':'P', 'MET':'M',
>     'ILE':'I', 'LEU':'L', 'ASP':'D', 'GLU':'E', 'LYS':'K',
>     'ARG':'R', 'SER':'S', 'THR':'T', 'TYR':'Y', 'HIS':'H',
>     'CYS':'C', 'ASN':'N', 'GLN':'Q', 'TRP':'W', 'GLY':'G',
>     '2AS':'D', '3AH':'H', '5HP':'E', 'ACL':'R', 'AIB':'A',
>     'ALM':'A', 'ALO':'T', 'ALY':'K', 'ARM':'R', 'ASA':'D',
>     'ASB':'D', 'ASK':'D', 'ASL':'D', 'ASQ':'D', 'AYA':'A',
>     'BCS':'C', 'BHD':'D', 'BMT':'T', 'BNN':'A', 'BUC':'C',
>     'BUG':'L', 'C5C':'C', 'C6C':'C', 'CCS':'C', 'CEA':'C',
>     'CHG':'A', 'CLE':'L', 'CME':'C', 'CSD':'A', 'CSO':'C',
>     'CSP':'C', 'CSS':'C', 'CSW':'C', 'CXM':'M', 'CY1':'C',
>     'CY3':'C', 'CYG':'C', 'CYM':'C', 'CYQ':'C', 'DAH':'F',
>     'DAL':'A', 'DAR':'R', 'DAS':'D', 'DCY':'C', 'DGL':'E',
>     'DGN':'Q', 'DHA':'A', 'DHI':'H', 'DIL':'I', 'DIV':'V',
>     'DLE':'L', 'DLY':'K', 'DNP':'A', 'DPN':'F', 'DPR':'P',
>     'DSN':'S', 'DSP':'D', 'DTH':'T', 'DTR':'W', 'DTY':'Y',
>     'DVA':'V', 'EFC':'C', 'FLA':'A', 'FME':'M', 'GGL':'E',
>     'GLZ':'G', 'GMA':'E', 'GSC':'G', 'HAC':'A', 'HAR':'R',
>     'HIC':'H', 'HIP':'H', 'HMR':'R', 'HPQ':'F', 'HTR':'W',
>     'HYP':'P', 'IIL':'I', 'IYR':'Y', 'KCX':'K', 'LLP':'K',
>     'LLY':'K', 'LTR':'W', 'LYM':'K', 'LYZ':'K', 'MAA':'A',
>     'MEN':'N', 'MHS':'H', 'MIS':'S', 'MLE':'L', 'MPQ':'G',
>     'MSA':'G', 'MSE':'M', 'MVA':'V', 'NEM':'H', 'NEP':'H',
>     'NLE':'L', 'NLN':'L', 'NLP':'L', 'NMC':'G', 'OAS':'S',
>     'OCS':'C', 'OMT':'M', 'PAQ':'Y', 'PCA':'E', 'PEC':'C',
>     'PHI':'F', 'PHL':'F', 'PR3':'C', 'PRR':'A', 'PTR':'Y',
>     'SAC':'S', 'SAR':'G', 'SCH':'C', 'SCS':'C', 'SCY':'C',
>     'SEL':'S', 'SEP':'S', 'SET':'S', 'SHC':'C', 'SHR':'K',
>     'SOC':'C', 'STY':'Y', 'SVA':'S', 'TIH':'A', 'TPL':'W',
>     'TPO':'T', 'TPQ':'A', 'TRG':'K', 'TRO':'W', 'TYB':'Y',
>     'TYQ':'Y', 'TYS':'Y', 'TYY':'Y', 'AGM':'R', 'GL3':'G',
>     'SMC':'C', 'ASX':'B', 'CGU':'E', 'CSX':'C', 'GLX':'Z',
>     'UNK':'X'
>     }
>
> Gavin E. Crooks
> Postdoctoral Fellow                  tel:  (510) 642-9614
> 461 Koshland Hall                    aim:notastring
> University of California             http://threeplusone.com/
> Berkeley, CA 94720-3102, USA         gec@compbio.berkeley.edu
>
>
>


-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037 USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From djkojeti at unity.ncsu.edu  Tue Nov  2 18:46:57 2004
From: djkojeti at unity.ncsu.edu (Douglas Kojetin)
Date: Tue Nov  2 18:45:46 2004
Subject: PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam
	filter hates me)
In-Reply-To: <4188197C.8000004@burnham.org>
References: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>
	<41880149.5060004@burnham.org>
	<D3EBCB29-2D23-11D9-9AD9-000D93AE5B7A@compbio.berkeley.edu>
	<4188197C.8000004@burnham.org>
Message-ID: <7BEA0401-2D29-11D9-AD31-000A9597278C@unity.ncsu.edu>

Thanks for all the quick and detailed responses!  Quite a motivation to 
stick with (Bio)Python!

I was unaware of the ASTRAL database -- that might be useful in the
future (I was looking to do this particular conversion on an
experimentally determined PDB structure ... not yet submitted).  Can
Biopython interact directly with the (remote) ASTRAL database?  Or is it
something I would need to have locally?

Thanks again,
Doug



From idoerg at burnham.org  Tue Nov  2 19:36:50 2004
From: idoerg at burnham.org (Iddo)
Date: Tue Nov  2 19:35:23 2004
Subject: PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam	filter
	hates me)
In-Reply-To: <7BEA0401-2D29-11D9-AD31-000A9597278C@unity.ncsu.edu>
References: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>	<41880149.5060004@burnham.org>	<D3EBCB29-2D23-11D9-9AD9-000D93AE5B7A@compbio.berkeley.edu>	<4188197C.8000004@burnham.org>
	<7BEA0401-2D29-11D9-AD31-000A9597278C@unity.ncsu.edu>
Message-ID: <41882822.4010905@burnham.org>

I believe the ASTRAL support works on a local copy of ASTRAL, not over the web.

./I



-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037 USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From idoerg at burnham.org  Tue Nov  2 17:25:56 2004
From: idoerg at burnham.org (Iddo)
Date: Wed Nov  3 10:53:23 2004
Subject: [BioPython] PDB -> FASTA
In-Reply-To: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>
References: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>
Message-ID: <41880974.6060900@burnham.org>


-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037 USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo

-------------- next part --------------
A non-text attachment was scrubbed...
Name: aaCode.py
Type: text/x-python
Size: 2108 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20041102/56b1d8ec/aaCode-0001.py
From djkojeti at unity.ncsu.edu  Wed Nov  3 14:22:07 2004
From: djkojeti at unity.ncsu.edu (Douglas Kojetin)
Date: Wed Nov  3 14:20:32 2004
Subject: [BioPython] first,last sequence # in PDB file
Message-ID: <A6D14E52-2DCD-11D9-B865-000A9597278C@unity.ncsu.edu>

Hi All-

I've tried (for a while) to figure this out ... can anyone tell me how
to extract the first and last residue sequence numbers in a PDB file?
If the PDB file started at residue 1, I could figure it out easily from
the sequence length (code from a recent post).  But what about, for
example, when the PDB file starts with residue 5 and ends at residue
120?  I /tried/ to dig through the documentation for a reference to
sequence numbers ... it's probably just a problem with my eyes!
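
One way to read the numbers straight from the file is sketched below in plain Python. It assumes standard fixed-column ATOM records (chain ID in column 22, residue sequence number in columns 23-26) and looks only at the first model; `residue_number_range` is a hypothetical name. Within Bio.PDB itself, the residue id returned by `residue.get_id()` should be a tuple of (hetero flag, sequence number, insertion code), so the sequence number is its second element.

```python
def residue_number_range(pdb_text):
    """Return {chain_id: (first_resseq, last_resseq)} in file order."""
    ranges = {}
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        chain = line[21]           # column 22: chain identifier
        resseq = int(line[22:26])  # columns 23-26: residue sequence number
        if chain not in ranges:
            ranges[chain] = [resseq, resseq]
        else:
            ranges[chain][1] = resseq
    return {chain: tuple(pair) for chain, pair in ranges.items()}
```

For a single-chain file that starts at residue 5 and ends at residue 120, the result would be a one-entry dictionary whose value is `(5, 120)`. HETATM records and insertion codes are ignored in this sketch.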

Thanks,
Doug

From gca500 at york.ac.uk  Thu Nov  4 07:04:06 2004
From: gca500 at york.ac.uk (Atkinson, GC)
Date: Thu Nov  4 07:03:13 2004
Subject: [BioPython] Parsing BLAST results
Message-ID: <418A1AB6.5040205@york.ac.uk>

Hi Everyone,

You've probably had this same query hundreds of times before and are 
sick of answering it, but I'm stuck and need your help!

I'm new to Biopython, and I'm trying to run BLAST from a script like the 
example in the Cookbook:

from Bio import Fasta
from Bio.Blast import NCBIWWW

# Read the first record from the FASTA file
fasta = open('m_cold.fasta', 'r')
f_iterator = Fasta.Iterator(fasta)
record = f_iterator.next()

# Submit the BLAST search and save whatever comes back
b_results = NCBIWWW.blast('blastn', 'nr', record)
save_file = open('cold.out', 'w')
results = b_results.read()
save_file.write(results)
save_file.close()

All I get in the results file is the HTML page that comes up when 
waiting for results ("....<p><hr>This page will be automatically updated 
in <b>14</b> seconds until search is done<BR>")

I realise Blast output keeps changing, so the code needs to be updated 
but I just don't know how!
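
The symptom described above (the saved file holds the "please wait" page) suggests the script has to keep asking until the real results arrive. A minimal sketch of that polling idea in plain Python follows; it makes no Biopython or NCBI calls. `fetch` is a hypothetical callable that re-requests the results page, and the marker string is taken from the HTML quoted in the message.

```python
import time

WAITING_MARKER = "This page will be automatically updated"


def is_waiting_page(html):
    """True if the page is the interim 'search not done yet' page."""
    return WAITING_MARKER in html


def poll_until_ready(fetch, delay=15.0, max_tries=20, sleep=time.sleep):
    """Call fetch() until it returns something other than the waiting page."""
    for _ in range(max_tries):
        page = fetch()
        if not is_waiting_page(page):
            return page
        sleep(delay)  # be polite to the server between requests
    raise RuntimeError("results not ready after %d tries" % max_tries)
```

The `sleep` argument exists only so the loop can be exercised without real waiting; in normal use the default `time.sleep` applies.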

Gemma
From gca500 at york.ac.uk  Thu Nov  4 13:02:39 2004
From: gca500 at york.ac.uk (Atkinson, GC)
Date: Thu Nov  4 13:01:10 2004
Subject: [BioPython] qblast  problems
Message-ID: <418A6EBF.70802@york.ac.uk>

Hi again,

I've downloaded the updated version of NCBIWWW to try to stop the
unexpected end of stream error when parsing blast results by using
qblast. However,  even though I definitely have the new qblast function
in the NCBIWWW module, I get the error "AttributeError: 'module' object
has no attribute 'qblast'"
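
One possible cause of this kind of error, offered as an assumption and not a diagnosis, is that Python is importing an older copy of the module from somewhere else on its path. A small plain-Python check of which file a module was loaded from, and whether it has a given attribute, might look like this (`describe_module` is a hypothetical helper name):

```python
import importlib


def describe_module(module_name, attribute):
    """Return (path the module was loaded from, whether it has `attribute`)."""
    module = importlib.import_module(module_name)
    path = getattr(module, "__file__", None)
    return path, hasattr(module, attribute)
```

Running `describe_module("Bio.Blast.NCBIWWW", "qblast")` would show whether the copy being imported is the updated file or an older installed one.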

Please help!

Gemma
gca500@york.ac.uk

From e.picardi at unical.it  Fri Nov  5 07:24:12 2004
From: e.picardi at unical.it (Ernesto)
Date: Fri Nov  5 07:58:10 2004
Subject: [BioPython] Node problem
Message-ID: <042101c4c332$67a479f0$572561a0@Travelmate>

Hi all,
I have a list of nodes such as:

Nodo1=Nodo("A",0.1,None,None,1)
Nodo2=Nodo("B",0.2,None,None,1)
Nodo3=Nodo("C",0.3,None,None,1)
Nodo4=Nodo("D",0.4,None,None,1)
Nodo6=Nodo("nodo6",0.2,Nodo1,Nodo2,0)
Nodo7=Nodo("nodo7",0.3,Nodo3,Nodo4,0)
Nodo8=Nodo("nodo8",0.1,Nodo6,Nodo7,0)
Nodo5=Nodo("E",0.5,None,None,1)
Nodo9=Nodo("Root",None,Nodo8,Nodo5,None)

where Nodo9 is the root. I would like to substitute each node with its specific value, starting from the root, in order to obtain a tree like the following:

Tree=Nodo(9,None,Nodo(8,0.01,Nodo(6,0.01,Nodo(1,0.01,None,None,1),Nodo(2,0.01,None,None,1),0),Nodo(7,0.01,Nodo(3,0.01,None,None,1),Nodo(4,0.01,None,None,1),0),0),Nodo(5,0.01,None,None,1),None)

The main difficulty is that the value of each node is an instance of the following class:

class Nodo:
    '''Define the node structure'''
    def __init__(self, contenuto, br=None, sinistra=None, destra=None, tip=None, seq=None):
        self.c = contenuto       # node name
        self.b = br              # branch length
        self.s = sinistra        # left child
        self.d = destra          # right child
        self.t = tip             # 1 for a tip, 0 for other nodes
        self.seq = seq


Could you suggest how I could obtain Tree? I could perhaps use recursion, but I don't know how.
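
A recursive sketch follows, under one reading of the question: because each Nodo already holds references to its children, Nodo9 already is the whole tree, and recursion is only needed to walk it. The function `nested` is a hypothetical helper that renders a node and everything below it as a nested Nodo(...) expression; the class is repeated so the example runs on its own.

```python
class Nodo:
    """Node structure, as defined in the message."""
    def __init__(self, contenuto, br=None, sinistra=None, destra=None,
                 tip=None, seq=None):
        self.c = contenuto   # node name
        self.b = br          # branch length
        self.s = sinistra    # left child
        self.d = destra      # right child
        self.t = tip         # 1 for a tip, 0 for other nodes
        self.seq = seq


def nested(node):
    """Render a node and its subtree as a nested Nodo(...) expression."""
    if node is None:
        return "None"  # base case: a missing child
    return "Nodo(%r,%r,%s,%s,%r)" % (
        node.c, node.b, nested(node.s), nested(node.d), node.t)
```

Calling `nested(Nodo9)` on the nodes listed above would give one expression with every child written out in place, in the same shape as the Tree example.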

Thanks in advance for your help

Ernesto 
From JBonis at imim.es  Fri Nov  5 09:17:39 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Fri Nov  5 09:16:24 2004
Subject: [BioPython] Problems with NCBIDict and EUtils
Message-ID: <66373AD054447F47851FCC5EB49B36110610B0@basquet.imim.es>

Hi!

I have spent several hours going around in circles on a problem and can't find the solution.

I want to retrieve a gene with its SNPs, as can be done with this URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=10835174&retmode=html&rettype=gb&extrafeat=1

(note the last parameter: 'extrafeat=1')

By default, NCBIDict does not add extrafeat=1, so the SNPs are not shown.

What I want is to add the extrafeat=1 parameter to my NCBIDict['id'] calls.

Any ideas? I am going crazy!
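
One workaround, sketched here in plain Python, is to bypass NCBIDict and build the EFetch URL directly; whether NCBIDict itself can pass extra parameters is not settled by this sketch. The base URL and parameter names are copied from the URL above, `efetch_url` is a hypothetical helper, and it is an assumption that the service accepts `extrafeat` in this form.

```python
from urllib.parse import urlencode

EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(uid, db="nucleotide", rettype="gb", retmode="html",
               extrafeat=1):
    """Build an EFetch URL that carries the extrafeat parameter."""
    params = [
        ("db", db),
        ("id", str(uid)),
        ("retmode", retmode),
        ("rettype", rettype),
        ("extrafeat", str(extrafeat)),
    ]
    return EFETCH + "?" + urlencode(params)
```

The resulting string can then be opened with any HTTP client and the response handed to a GenBank parser.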

Julio


From mike at maibaum.org  Mon Nov  8 05:28:29 2004
From: mike at maibaum.org (Michael Maibaum)
Date: Mon Nov  8 05:29:23 2004
Subject: [BioPython] GenBank Parser Errors (Repost)
Message-ID: <EE6B24F0-3170-11D9-AB74-000A95BA5F0A@maibaum.org>


Hi,

(I'm sorry if you get this twice, but I sent it to the list last week  
and didn't get a reply so I'm hoping someone with a suggestion will see  
it this time, thanks. )


I'm trying to use biopython to parse genbank files and it is working  
happily on some genbank files,  but not many others. So far the pattern  
appears to be

Prokaryotic complete genome => OK
Eukaryotic complete genome =>failure.

The failures are typically very early in the file and don't have  
wonderfully useful information in the traceback. It falls over in the  
Martel Parser giving the error


    Martel.Parser.ParserPositionException: error parsing at or beyond
    character 191.

As this genome is a bit large to attach, I've just included the +/- 10
lines around character 191.

The full file, should you want it, is at:
<ftp://ftp.ensembl.org/pub/current_tetraodon/data/flatfiles/genbank/Tetraodon_nigroviridis.0.dat.gz>

Does anyone have any idea why this is failing? Is it just the joy of
tracking NCBI record formats, so that I need to start looking at the
internals for a fix (or use something else)?

Is it worth trying biopython cvs?


Mac OS X 10.3.5

Python 2.3.4, up to date biopython


thanks

Michael

-- 
Dr Michael Maibaum
Department of Biochemistry and Molecular Biology, UCL
email: maibaum@biochemistry.ucl.ac.uk

From JBonis at imim.es  Tue Nov  9 06:19:52 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Tue Nov  9 06:18:19 2004
Subject: [BioPython] EUtils and SNPs
Message-ID: <66373AD054447F47851FCC5EB49B36110610B3@basquet.imim.es>

Hi, 

I posted this question some time ago but got no answer, so I am sending it again (with a partial solution I found!).

I am trying to get all the variations (SNPs) for a given gene.

In old versions of Biopython I used GenBank.NCBIDictionary(parser = GenBank.FeatureParser) and got the variations.

But now it doesn't work. The reason is that EUtils (PubMed) now needs an extra parameter, "extrafeat=1", to return the SNPs (variations).

The URL would be:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=10835174&retmode=html&rettype=gb&extrafeat=1

I have "fixed" the problem by editing the code in /EUtils/ThinClient.py:

class ThinClient ....

    def efetch_using_dbids(self,
                           dbids,
                           retmode = None,
                           rettype = None,

                           # sequence only
                           seq_start = None,
                           seq_stop = None,
                           strand = None,
                           complexity = None,
                           ):

        id_string = _dbids_to_id_string(dbids)
        return self._get(program = "efetch.fcgi",
                         query = {"id": id_string,
                                  "db": dbids.db,
                                  "extrafeat": '1', #I HAVE ADDED THIS
                                  # "retmax": len(dbids.ids), # needed?
                                  "retmode": retmode,
                                  "rettype": rettype,
                                  "seq_start": seq_start,
                                  "seq_stop": seq_stop,
                                  "strand": strand,
                                  "complexity": complexity,
                                  })

So with the extra ("extrafeat": '1') entry in the self._get() query dictionary, efetch_using_dbids() now fetches all the variations (SNPs included) from PubMed.

This fix is not the best one (I would like to integrate it into the whole library, so that the extrafeat parameter could be set from NCBIDict calls), but it works for me, as I always need the variations in my scripts.

Maybe the Biopython development team would like to include this in some way in future releases.
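
A sketch of how the parameter could be made optional instead of hard-coded. It uses only the Python 3 standard library rather than the Bio.EUtils code, so the function name build_efetch_url and its "extra" argument are assumptions made for illustration; only the URL and the parameter names come from the message above.

```python
from urllib.parse import urlencode

EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, retmode=None, rettype=None, extra=None):
    """Build an efetch URL; `extra` carries optional parameters such as extrafeat."""
    query = {"db": db, "id": ",".join(ids), "retmode": retmode, "rettype": rettype}
    if extra:
        query.update(extra)
    # Leave out parameters that were not set, so they do not appear in the URL.
    pairs = [(key, value) for key, value in query.items() if value is not None]
    return EFETCH_URL + "?" + urlencode(pairs)


url = build_efetch_url("nucleotide", ["10835174"], retmode="html",
                       rettype="gb", extra={"extrafeat": "1"})
print(url)
```

Passing the extra parameters as a dictionary means a caller who does not need the variations simply omits them, and nothing else in the request changes.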

Regards, 

Julio Bonis Sanz MD
http://www.juliobonis.com/portal/

From JBonis at imim.es  Tue Nov  9 11:58:55 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Tue Nov  9 11:57:12 2004
Subject: [BioPython] EUtils strange behaviour
Message-ID: <66373AD054447F47851FCC5EB49B36110610B8@basquet.imim.es>

Hi!

I am playing with EUtils again and have noticed some strange behaviour.

Let me focus on the human HTR2A receptor gene:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=10835174

What I want to do is retrieve all the SNPs in that gene into a gb_record (GenBank Python library), and it is turning out to be more complex than I expected.

I passed these parameters to EUtils efetch:

db=nucleotide
id=10835174 (is the gi for my mRNA: HTR2A)
retmode=html
rettype=gb (GenBank mode)
extrafeat=1 (to get the SNPs; if you don't send extrafeat=1, EUtils will not show any SNP or variation)


This is the URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=10835174&retmode=html&rettype=gb&extrafeat=1

There are 7 variations. 4 of them are outside the limits of the CDS (so they are not translated). I am only interested in the SNPs in the exons.

In the end I get these 3 SNPs:

     variation       219
                     /gene="HTR2A"
                     /replace="a"
                     /replace="c"
                     /db_xref="dbSNP:1805055"
     variation       734
                     /gene="HTR2A"
                     /replace="a"
                     /replace="g"
                     /db_xref="dbSNP:6304"
     variation       1407
                     /gene="HTR2A"
                     /replace="c"
                     /replace="t"
                     /db_xref="dbSNP:1058576"


But here is the problem.

If you go to the LocusLink entry for the HTR2A gene you will find this:

http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?locusId=3356&mrna=NM_000621&ctg=NT_024524&prot=NP_000612&orien=reverse&view+rs+=view+rs+&chooseRs=coding

This is the list of SNPs for the exons in that HTR2A mRNA:

rs6314
rs6308
rs1058576
rs6304
rs6305
rs6313
rs1805055

Compared with the list we get from EUtils:

LocusLink            EUtils

rs6314
rs6308
rs1058576  ========= 1058576
rs6304     ========= 6304
rs6305
rs6313
rs1805055  ========= 1805055

You can see that there are 4 missing SNPs. Why?

I have an explanation for two of them: rs6305 and rs6313 are synonymous SNPs (no change in the protein sequence). But what about rs6308 and rs6314? Where have they gone?
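
The comparison above can be reproduced with Python sets (Python 3 syntax). The identifiers are copied from the two lists in this message; the variable names are only illustrative.

```python
# rs identifiers listed by LocusLink for the HTR2A mRNA.
locuslink = {"rs6314", "rs6308", "rs1058576", "rs6304",
             "rs6305", "rs6313", "rs1805055"}

# efetch reports bare dbSNP numbers in /db_xref, so add the "rs" prefix
# before comparing.
eutils = {"rs" + number for number in ("1805055", "6304", "1058576")}

# Identifiers present in LocusLink but absent from the efetch record.
missing = sorted(locuslink - eutils)
print(missing)
```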

I need to solve this mystery in order to build a script that retrieves on the fly (from EUtils) all the SNPs of a given mRNA. I don't want to fetch all the contigs or FTP the whole RefSeq human genome database. :S

Of course I need to retrieve synonymous SNPs too (in fact one of the most clinically relevant SNPs in HTR2A is the synonymous SNP rs6313, and nobody seems to know why).

Any comments, suggestions or ideas? (Apart from RTFM... I promise I have read and read!!!!)

From JBonis at imim.es  Wed Nov 10 06:24:44 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Wed Nov 10 06:23:17 2004
Subject: [BioPython] RV: variations in a mRNA (was EUtils)
Message-ID: <66373AD054447F47851FCC5EB49B36110610BB@basquet.imim.es>

I sent a message to the NCBI Help Desk and this is their response. It is not much help, but I will now try to understand how Biopython works with elink to solve my problem.


-----Original Message-----
From: Gabrielian, Andrei (NIH/NLM/NCBI) [mailto:gabrieli@ncbi.nlm.nih.gov]
Sent: Wednesday, 10 November 2004 1:19
To: Bonis Sanz, Julio
Subject: RE: variations in a mRNA


Dear NCBI user,

if you click on Links -> SNP for this sequence, you'll be linked to 310
SNPs:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=Display&dopt=nucleotide_snp&from_uid=10835174

the same result you'll get if you do this via e-utilities call:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=snp&id=10835174

Hope this helps,
NCBI Help desk


------------- Begin Forwarded Message -------------

Subject: variations in a mRNA
Date: Tue, 9 Nov 2004 17:08:26 +0100
From: "Bonis Sanz, Julio" < >
To: <eutilities@ncbi.nlm.nih.gov>

Dear Sirs,

I am building an application that tries to retrieve all the SNPs contained in a given gene.

I am using this URI:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=10835174&rettype=gb&extrafeat=1

that is the gene for the human HTR2A receptor.

I only get 7 variations (SNPs).

But if you go to

http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?locusId=3356&mrna=NM_000621&ctg=NT_024524&prot=NP_000612&orien=reverse&view+rs+=view+rs+&chooseRs=all

you can see that there are several more SNPs (intronic and synonymous ones, for example).

What I want is to retrieve ALL the SNPs in a given gene through EUtils, but I am not sure which parameters to send.

Any help will be appreciated.

Julio Bonis Sanz MD
Research Group on Biomedical Informatics GRIB: http://www.imim.es/grib/


------------- End Forwarded Message -------------


From JBonis at imim.es  Wed Nov 10 08:56:18 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Wed Nov 10 08:54:30 2004
Subject: [BioPython] EUtils strange behaviour
Message-ID: <66373AD054447F47851FCC5EB49B36110610BD@basquet.imim.es>

Hi all, hi Andrew, 

I have been working on the SNP problem.

BTW: I found the "extrafeat" parameter by emailing NCBI staff. It is not documented (really, the EUtils documentation is very poor).

Something curious about extrafeat: with extrafeat=0 you don't get any variation features, and with extrafeat=1 you do. But (and this is the curious thing) if you send extrafeat=3 or 5 or 7 or 9 you get the same results as for extrafeat=1. :D Only God knows how that parameter works internally in the Entrez logic :).
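
One reading that fits these observations, offered purely as a guess and not as documented Entrez behaviour, is that extrafeat is treated as a bit mask in which the lowest bit switches the variation features on. Every value reported above as returning variations (1, 3, 5, 7, 9) has that bit set, and 0 does not. The function name below is hypothetical.

```python
def variation_bit_set(extrafeat):
    """Return True if the lowest bit of extrafeat is set (hypothetical check)."""
    return bool(extrafeat & 1)


# The values tried in the message above.
for value in (0, 1, 3, 5, 7, 9):
    print(value, variation_bit_set(value))
```

Even values such as 2 or 4 were not reported, so trying them would be the obvious test of this guess.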

Returning to the original problem:

One indirect way to solve it is to use elink, like this:

You send to elink.fcgi:

dbfrom=nucleotide
db=snp
id= __GI of the mRNA of your interest__

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=snp&id=10835174

so you get:


<!-- PART OF XML RESULTS -->
<?xml version="1.0"?>
<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD eLinkResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_020511.dtd">
<eLinkResult>
<LinkSet>
	<DbFrom>nucleotide</DbFrom>
	<IdList>
		<Id>10835174</Id>
	</IdList>
	<LinkSetDb>
		<DbTo>snp</DbTo>

		<LinkName>nucleotide_snp</LinkName>
		<Link>
			<Id>633737</Id>
		</Link>
		<Link>
			<Id>9534508</Id>
		</Link>

		<Link>
			<Id>4142900</Id>
		</Link>
<!-- END OF PART OF XML RESULTS -->

Then you somehow have to extract the rs IDs into a list, rsList = []. I have not worked on it yet, but it should not be hard.

Is there any class or method already implemented in Biopython to do this task?
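
A minimal sketch of the extraction step using the Python 3 standard library (xml.etree.ElementTree) rather than anything in Biopython. The sample is a cut-down, well-formed version of the eLink output quoted above, and the function name linked_ids is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Cut-down, well-formed version of the eLink output quoted above.
SAMPLE = """<eLinkResult>
<LinkSet>
  <DbFrom>nucleotide</DbFrom>
  <IdList><Id>10835174</Id></IdList>
  <LinkSetDb>
    <DbTo>snp</DbTo>
    <LinkName>nucleotide_snp</LinkName>
    <Link><Id>633737</Id></Link>
    <Link><Id>9534508</Id></Link>
    <Link><Id>4142900</Id></Link>
  </LinkSetDb>
</LinkSet>
</eLinkResult>"""


def linked_ids(xml_text):
    """Return the Ids found under LinkSetDb/Link, in document order."""
    root = ET.fromstring(xml_text)
    # The path skips IdList/Id, which holds the query GI rather than a link.
    return [elem.text for elem in root.findall("./LinkSet/LinkSetDb/Link/Id")]


rsList = linked_ids(SAMPLE)
print(rsList)
```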

Then you can loop over the list:
	for rs in rsList:
		....

and for each rs send the following to efetch.fcgi:

db = snp
id = __the rs__
rettype = flt
retmode = html

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=6313&rettype=flt&retmode=html

so you get this:


<!-- START OF SNP INFORMATION -->

1: rs6313 [Homo sapiens] 
rs6313 | Hs | 9606 | snp | genotype=NO | submitterlink=YES | updated 08/05/2004 16:40:00
ss7941 | WIAF-CSNP | WIAF-10853 | orient=+
ss4928286 | YUSUKE | IMS-JST093413 | orient=-
ss11087375 | BCM_SSAHASNP | chr13.NT_024524.12_16044432 | orient=-
ss13329237 | SC_SNP | NT_024524.12_16044432 | orient=-
ss19284906 | CSHL-HAPMAP | CSHL-HuDD-200402.chr13.NT_024524.13_28449941 | orient=-
ss21114665 | SSAHASNP | WGSA-200403-chr13.chr13.NT_024524.13_28449941 | orient=-
ss22887275 | IMCJ-GDT | IMCJ-HTR2A_4-CT | orient=+
SNP | alleles='C/T' | het=0.499997 | se(het)=0.00128609
VAL | validated=YES | min_prob=0 | max_prob=? | notwithdrawn
MAP | ncbi_num_chr=1 | ncbi_num_ctg=1 | ncbi_num_sec_loc=1 | ncbi_weight=1
CTG | chr=13 | chr-pos=45267941 | Hs13_24680_34:13 | ctg-start=28449941 | ctg-end=28449941 | loctype=2 | orient=-
LOC | HTR2A | locus_id=3356 | fxn-class=coding-synon | allele=T | frame=3 | residue=S | aa_position=34
LOC | HTR2A | locus_id=3356 | fxn-class=reference | allele=C | frame=3 | residue=S | aa_position=34
GBL | HTR2A | locus_id=3356 | fxn-class=coding-synon
SEQ | AF498982:1 | source-db=gb-mrna | seq-pos=102 | orient=+
SEQ | AL160397:1 | source-db=hgs-finish | seq-pos=45162 | orient=-
SEQ | G28536:1 | source-db=gb-sts | seq-pos=247 | orient=+
SEQ | M86841:1 | source-db=gb-mrna | seq-pos=78 | orient=+
SEQ | NM_000621:1 | source-db=ref-mrna | seq-pos=247 | orient=+
SEQ | S42165:1 | source-db=hgs-finish | seq-pos=102 | orient=+
SEQ | S71229:1 | source-db=gb-mrna | seq-pos=152 | orient=+
SEQ | X57830:1 | source-db=gb-mrna | seq-pos=247 | orient=+

<!-- END OF SNP INFORMATION -->

Again, some kind of script should extract the information you are interested in, and that's all!
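
A sketch of what such a script could look like for one line of the flat-file output above (Python 3). It assumes only what the sample shows: fields separated by "|", some of them in key=value form. The function name parse_flt_line is hypothetical.

```python
def parse_flt_line(line):
    """Split one 'TAG | field | key=value | ...' line into (tag, positional, keyed)."""
    fields = [field.strip() for field in line.split("|")]
    tag, rest = fields[0], fields[1:]
    # Fields without "=" are kept in order; the others become dictionary entries.
    positional = [field for field in rest if "=" not in field]
    keyed = dict(field.split("=", 1) for field in rest if "=" in field)
    return tag, positional, keyed


line = ("LOC | HTR2A | locus_id=3356 | fxn-class=coding-synon | allele=T "
        "| frame=3 | residue=S | aa_position=34")
tag, positional, keyed = parse_flt_line(line)
print(tag, positional, keyed["fxn-class"], keyed["aa_position"])
```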

I will work on it and send my script soon...

cheers, 

Julio Bonis Sanz MD
http://www.juliobonis.com/portal/


-----Original Message-----
From: Andrew Dalke [mailto:dalke@dalkescientific.com]
Sent: Wednesday, 10 November 2004 9:06
To: Bonis Sanz, Julio
CC: biopython@biopython.org
Subject: Re: [BioPython] EUtils strange behaviour


Hi again Julio,

> Any comment, suggestion, idea (apart from RTFM... I promise I have 
> read and read!!!!)

No doubt you were as frustrated as I on the documentation.  It's
opaque and incomplete.  I spent a lot of time doing what you were
doing, testing things and seeing if I could make sense of it all.

I didn't figure out the SNPs interface.  I just don't know enough
about that domain and there's pretty much no documentation for that.
You should try the NCBI EUtils help desk at eutilities@ncbi.nlm.nih.gov.


					Andrew
					dalke@dalkescientific.com


From JBonis at imim.es  Wed Nov 10 09:03:00 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Wed Nov 10 09:01:27 2004
Subject: [BioPython] EUtils and SNPs
Message-ID: <66373AD054447F47851FCC5EB49B36110610BE@basquet.imim.es>

By the way:

I want to send this to EUtils:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=6313&rettype=flt&retmode=html

but snp is not defined as a valid db in DBIds ... any fast solution?

where are the DBIds databases defined?

regards, 

Julio Bonis Sanz

-----Original Message-----
From: Andrew Dalke [mailto:dalke@dalkescientific.com]
Sent: Wednesday, 10 November 2004 8:46
To: Bonis Sanz, Julio
CC: biopython@biopython.org
Subject: Re: [BioPython] EUtils and SNPs


Hi Julio,

> But now it doesnt work. The reason is that in EUtils (PubMed) now it 
> is needed to add a parameter... "extrafeat=1" to get the SNPs 
> (variations).....

Interesting.  That isn't documented that I know about.  How did you
find out about it?

> I have "fix" the problem touching the code in the script 
> /EUtils/ThinClient.py

Okay, I'll add that to the EUtils v.2 package I announced here a
couple months ago.  Haven't gotten any feedback on it.  Anyone tried it?

> Maybe te biopython developers team want to include this in someway for 
> future releases.

Mmm, so the question is how to pass extra parameters from the NCBIDict
to the EUtils package.

I've been working on other projects since that preview release.  I'll
bump up the priority on it.

					Andrew
					dalke@dalkescientific.com


From JBonis at imim.es  Wed Nov 10 09:20:27 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Wed Nov 10 09:18:38 2004
Subject: Nevermind....RE: [BioPython] EUtils and SNPs
Message-ID: <66373AD054447F47851FCC5EB49B36110610BF@basquet.imim.es>

Sorry....

I found the answer to my last question myself:  :)

>>> eutils = ThinClient.ThinClient()
>>> dbids = DBIds("snp",["6313"])
>>> infile = eutils.efetch_using_dbids(dbids,rettype='flt',retmode='html')



From gca500 at york.ac.uk  Wed Nov 10 12:01:00 2004
From: gca500 at york.ac.uk (Atkinson, GC)
Date: Wed Nov 10 11:59:45 2004
Subject: [BioPython] Blast descriptions and alignments
Message-ID: <4192494C.9000200@york.ac.uk>

Hi,

I'm doing a qblast and I'm having problems retrieving 500 alignments and 
descriptions.

I'm using:

b_results=NCBIWWW.qblast('blastp','nr',record, descriptions = 500, 
alignments = 500)

It just retrieves about 100 results, though doing a blast search without 
my Python script gets the full 500.

Any ideas?

Gemma Atkinson
From aaron at atroxen.com  Fri Nov 12 00:21:53 2004
From: aaron at atroxen.com (Aaron Zschau)
Date: Fri Nov 12 00:20:10 2004
Subject: [BioPython] genbank protein lookup issues
Message-ID: <C3768C45-346A-11D9-9E26-003065706E3A@atroxen.com>

Something seems to have broken since I last ran this program a few 
weeks ago. My line:

gi_list = GenBank.search_for(form["genename"].value, database="protein")

is called with values for form["genename"].value such as "YAL038w", but it 
seems to give the error no matter what search string I use. I haven't made 
any changes to my code since it last worked, so I'm wondering if GenBank 
changed something.


I get the following errors in my logs from running:

[Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] Traceback (most 
recent call last):, referer: 
http://serval.atroxen.com:8080/interface.html
[Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22]   File 
"/var/www/cgi-bin/cluster.py", line 78, in ?, referer: 
http://serval.atroxen.com:8080/interface.html
[Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22]     gi_list = 
GenBank.search_for(form["genename"].value, database="protein"), 
referer: http://serval.atroxen.com:8080/interface.html
[Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22]   File 
"/tmp/biopython-1.30/build/lib.linux-i586-2.2/Bio/GenBank/__init__.py", 
line 1398, in search_for, referer: 
http://serval.atroxen.com:8080/interface.html
[Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22]     retstart = 
start_id, retmax = max_ids), referer: 
http://serval.atroxen.com:8080/interface.html
[Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22]   File 
"/usr/lib/python2.2/site-packages/Bio/EUtils/DBIdsClient.py", line 294, 
in search, referer: http://serval.atroxen.com:8080/interface.html
[Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22]     searchinfo = 
parse.parse_search(infile, [None]), referer: 
http://serval.atroxen.com:8080/interface.html
[Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22]   File 
"/tmp/biopython-1.30/build/lib.linux-i586-2.2/Bio/EUtils/parse.py", 
line 219, in parse_search, referer: 
http://serval.atroxen.com:8080/interface.html
[Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22]     raise 
TypeError("Unknown OP code: %r" % (s,)), referer: 
http://serval.atroxen.com:8080/interface.html
[Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] TypeError: 
Unknown OP code: u'GROUP', referer: 
http://serval.atroxen.com:8080/interface.html


any help would be appreciated,

Aaron Zschau

From aaron at atroxen.com  Sat Nov 13 14:03:51 2004
From: aaron at atroxen.com (Aaron Zschau)
Date: Sat Nov 13 14:02:05 2004
Subject: [BioPython] genbank protein lookup issues
In-Reply-To: <C3768C45-346A-11D9-9E26-003065706E3A@atroxen.com>
References: <C3768C45-346A-11D9-9E26-003065706E3A@atroxen.com>
Message-ID: <C1A9B220-35A6-11D9-8B33-003065706E3A@atroxen.com>

I've managed to confirm that the problem line is as follows:

gi_list = GenBank.search_for("YAL038w",database='protein')

if I just take that one line out and run it by hand from the python  
prompt (with the necessary imports first), I get the following error:

 >>> gi_list = GenBank.search_for("YAL038w",database='protein')
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File  
"/tmp/biopython-1.30/build/lib.linux-i586-2.2/Bio/GenBank/__init__.py",  
line 1398, in search_for
     retstart = start_id, retmax = max_ids)
   File "/usr/lib/python2.2/site-packages/Bio/EUtils/DBIdsClient.py",  
line 294, in search
     searchinfo = parse.parse_search(infile, [None])
   File  
"/tmp/biopython-1.30/build/lib.linux-i586-2.2/Bio/EUtils/parse.py",  
line 219, in parse_search
     raise TypeError("Unknown OP code: %r" % (s,))
TypeError: Unknown OP code: u'GROUP'


Does anybody know why this would be happening or if it is a known  
problem with the genbank servers/parsing?

thanks,

Aaron Zschau



From raghunath at jhmi.edu  Sun Nov 14 15:47:05 2004
From: raghunath at jhmi.edu (Raghunath Reddy)
Date: Sun Nov 14 12:40:17 2004
Subject: [BioPython] BLAST Problem
Message-ID: <opshg2orttz7bfkm@aplab12.monument1.jhmi.edu>

I have been trying the Biopython Blast module, but I'm getting the
following error:

INFO: Failure to set-up BLAST search with database 'refseq'

Any suggestions?

Thanks
Raghunath
From biopython at maubp.freeserve.co.uk  Fri Nov 19 06:39:01 2004
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri Nov 19 06:38:02 2004
Subject: [BioPython] GenBank parsing errors
Message-ID: <419DDB55.2040907@maubp.freeserve.co.uk>

I have been trying to use the GenBank parser and have had some trouble.

I notice from the archives that Michael Maibaum has also had difficulties:

http://portal.open-bio.org/pipermail/biopython/2004-November/002457.html

Michael wrote:

> I'm trying to use biopython to parse genbank files and it is working 
> happily on some genbank files,  but not many others. So far the 
> pattern appears to be
> 
> Prokaryotic complete genome => OK
> Eukaryotic complete genome =>failure

I have not tried any eukaryotes, but I have tried several prokaryotes
without any success.

While I do recall having seen Martel parser errors (probably like Michael
had), I generally have a different problem.

For example, this small sample of code fails using E. coli K12, file
NC_000913.gbk (about 10MB) available from here:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K12/

from Bio import GenBank
gb_handle = open('NC_000913.gbk', 'r')
feature_parser = GenBank.FeatureParser()
gb_iterator = GenBank.Iterator(gb_handle, feature_parser)
print 'So far so good'
cur_record = gb_iterator.next()
print 'Done'

I see CPU usage at almost 100%, and memory usage for Python goes
steadily up.  At about 200 or 300MB the CPU usage drops, and my system
becomes very sluggish.  I normally kill the process at this point.

Windows XP
BioPython 1.30
Python 2.3

Has anyone got the GenBank parser to work on a bacterial genome?

Thank you

Peter
From jonathan.taylor at utoronto.ca  Fri Nov 19 19:39:59 2004
From: jonathan.taylor at utoronto.ca (Jonathan Taylor)
Date: Fri Nov 19 19:38:41 2004
Subject: [BioPython] Biopython and Quixote preventing use of
	Bio.GenBank.NCBIDictionary
Message-ID: <1100911199.18277.21.camel@dallas.bbc.botany.utoronto.ca>

Hi,

I am using quixote, a web application server, to develop a web based
application.

I have a genbank id that I want to get the taxonomical description of. 
I can do this easily via my util.py script from the command line.  When
I run it from inside my web application I get the error below.

Any help is greatly appreciated.

Thanks Jon.

Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/quixote/publish.py", line 522, in process_request
    output = self.try_publish(request, env.get('PATH_INFO', ''))
  File "/usr/lib/python2.3/site-packages/quixote/publish.py", line 457, in try_publish
    output = object(request)
  File "/home/jtaylor/projects/fungid/fungid/web/input.ptl", line 93, in _q_index
    return form.action(request, 'Submit', values)
  File "/home/jtaylor/projects/fungid/fungid/web/input.ptl", line 67, in action
    seq = blast.add_sequence(record)
  File "/home/jtaylor/projects/fungid/fungid/lib/blast.py", line 100, in add_sequence
    name = util.lookup_name_from_gi(gi)
  File "/home/jtaylor/projects/fungid/fungid/lib/util.py", line 8, in lookup_name_from_gi
    ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = feature_parser)
  File "/home/jtaylor/lib/python/Bio/GenBank/__init__.py", line 1320, in __init__
    self.db = db["nucleotide-genbank-eutils"]
  File "/home/jtaylor/lib/python/Bio/config/Registry.py", line 93, in __getitem__
    return self._name_table[name]  # raises KeyError for unknown entries
KeyError: 'nucleotide-genbank-eutils'


Here is util.py:

from Bio import GenBank

def lookup_name_from_gi(gi):

  feature_parser = GenBank.RecordParser()
  ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = feature_parser)

  record = ncbi_dict[str(gi)]

  # may want to use the taxonomy list to help here
  return record.source

if __name__ == '__main__':
  print lookup_name_from_gi(47499221)


From biopython at maubp.freeserve.co.uk  Wed Nov 24 14:35:20 2004
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed Nov 24 14:30:29 2004
Subject: [BioPython] GenBank parsing errors
In-Reply-To: <6.0.3.0.0.20041124091100.01c3e070@mail.xbioinformatics.org>
References: <419DDB55.2040907@maubp.freeserve.co.uk>
	<6.0.3.0.0.20041124091100.01c3e070@mail.xbioinformatics.org>
Message-ID: <41A4E278.7040103@maubp.freeserve.co.uk>

Peter wrote:
>> For example, this small sample of code fails using E. coli K12,
>> file NC_000913.gbk (about 10MB) available from here:

(code removed)

>> I see CPU usage at almost 100%, and memory usage for Python 
>> goes steadily up.  At about 200 or 300MB the CPU usage drops, 
>> and my system becomes very sluggish.  I normally kill the 
>> process at this point.

Admin wrote:
> I have tried to use the Genbank or bacterial genomes in the past 
> but I had to abandon it because it thrashes around in memory as 
> you have described, as the sequence is just too large for the 
> API.  I was splicing out cds features from the records.
> 
> I had to write a custom parser to get the job done

Good to know it's not just me or my computer :)

I have also resorted to writing my own custom script to do the job.

For each gene I wanted the translated sequence and the CDS 
information (i.e. position in genome), which are both fairly easy to 
get from the GenBank file.

I wrote some code to convert the GenBank file into a custom .faa 
FASTA file with the CDS location (and a few other properties like 
the product) encoded into the FASTA title records.

This lets me use all the BioPython FASTA support, and get the 
additional information by parsing the sequence title/description.
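A minimal sketch of that kind of title encoding, not the actual script: the field order, the '|' separator and the names make_title, parse_title and format_fasta are all assumptions chosen for illustration.

```python
# Hypothetical title layout: id|start..end|strand|product
# (illustrative only; the layout used by the original script is not shown).

def make_title(gene_id, start, end, strand, product):
    """Pack the CDS location and product into one FASTA title string."""
    if "|" in gene_id:
        raise ValueError("gene_id must not contain '|'")
    return "%s|%d..%d|%s|%s" % (gene_id, start, end, strand, product)


def parse_title(title):
    """Recover the fields packed by make_title()."""
    # Split at most 3 times so a '|' inside the product text is preserved.
    gene_id, span, strand, product = title.split("|", 3)
    start, end = span.split("..")
    return {
        "id": gene_id,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "product": product,
    }


def format_fasta(title, sequence, width=60):
    """Return one FASTA record as text, wrapping the sequence at `width`."""
    lines = [">" + title]
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"
```

With a layout like this, the title survives a round trip through any FASTA reader that keeps the description line intact, and parse_title() gives back the location without touching the GenBank file again.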

Peter
From JBonis at imim.es  Thu Nov 25 10:39:40 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Thu Nov 25 10:37:31 2004
Subject: [BioPython] Bug in GenBank.FeatureParser detected
Message-ID: <66373AD054447F47851FCC5EB49B36110610D4@basquet.imim.es>

Hi, 

I was trying to parse a genbank record obtained by using eutils:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=51511729&rettype=gbwithparts&retmode=text&seq_start=46267941&seq_stop=46467941

That is, human chromosome 13, from 46267941 to 46467941. My idea is to give a position and a chromosome and then get the surrounding genes/mRNAs/CDS/SNPs in that area.
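As a rough sketch of the "position plus flanking region" idea, the request URL can be built from a GI number and a centre position. The function name region_url and the flank parameter are hypothetical; the query parameters are simply the ones that appear in the URL above, and nothing is fetched here.

```python
from urllib.parse import urlencode

EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def region_url(gi, center, flank=100000):
    """Build an efetch URL for the window [center - flank, center + flank].

    `gi` is the GI number of the chromosome record; mapping a chromosome
    name to its GI is left to the caller (not covered in this sketch).
    """
    start = max(1, center - flank)  # sequence coordinates start at 1
    stop = center + flank
    params = [
        ("db", "nucleotide"),
        ("id", str(gi)),
        ("rettype", "gbwithparts"),
        ("retmode", "text"),
        ("seq_start", str(start)),
        ("seq_stop", str(stop)),
    ]
    # A list of pairs keeps the parameters in the order given.
    return EFETCH_URL + "?" + urlencode(params)
```

Calling region_url(51511729, 46367941) reproduces the URL quoted above, since 46367941 is the midpoint of the 46267941..46467941 window.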

I have found that GenBank.FeatureParser() fails with an error on this header:

LOCUS       NC_000013             200001 bp    DNA     linear   CON 25-OCT-2004
DEFINITION  Homo sapiens chromosome 13, complete sequence.
ACCESSION   NC_000013 REGION: 46267941..46467941
VERSION     NC_000013.9  GI:51511729
KEYWORDS    HTG.

Specifically, on this line:

ACCESSION   NC_000013 REGION: 46267941..46467941

...

If you remove 'REGION: 46267941..46467941' then the parser works fine.
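Until the parser itself handles this, the manual workaround can be sketched as a small preprocessing step applied to the record text before it is parsed. This is only an illustration of the workaround described above, not a fix to Biopython; the function name strip_accession_region is hypothetical, and the pattern assumes the REGION qualifier always has the form shown in this record.

```python
import re

# Matches e.g. "ACCESSION   NC_000013 REGION: 46267941..46467941"
# and captures everything before the REGION qualifier.
_ACCESSION_REGION = re.compile(r"^(ACCESSION\s+\S+)\s+REGION:\s*\S+\s*$")


def strip_accession_region(text):
    """Return `text` with 'REGION: a..b' removed from ACCESSION lines.

    All other lines are passed through unchanged. The result always ends
    with a single newline.
    """
    cleaned = []
    for line in text.splitlines():
        match = _ACCESSION_REGION.match(line)
        cleaned.append(match.group(1) if match else line)
    return "\n".join(cleaned) + "\n"
```

The cleaned text can then be wrapped in a file-like object and handed to the parser in place of the original download. Note that this discards the region coordinates, so they need to be kept separately if they matter later.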

I guess it should be easy to fix in the Biopython code, but I got lost trying to find where.

Can someone help? I suggest adding the fix as soon as possible.

Regards, 

Julio Bonis Sanz MD http://www.juliobonis.com/portal/

Research Group on Biomedical Informatics http://www.imim.es/grib/
Barcelona - Spain


From JBonis at imim.es  Mon Nov 29 06:55:05 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Mon Nov 29 06:52:43 2004
Subject: [BioPython] How is a join stored in the gb_record?
Message-ID: <66373AD054447F47851FCC5EB49B36110610D9@basquet.imim.es>

hi!

I have parsed a "genome" GenBank record with Biopython.

For example, it contains:

     mRNA            join(1870..1959,2070..2106,8593..8646,9327..9399,
                     20023..20132,25649..25786,30723..30800,38444..40397)

What I need is to retrieve the subsegments (1870 to 1959, 2070 to 2106, ...) from the gb_record.

But I have only worked out how to retrieve the start and end points (1870 and 40397), with:

gb_record.features[x].location._start.position
gb_record.features[x].location._end.position


Any idea?
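One fallback, sketched here under the assumption that the raw location text from the feature table is available, is to pull the subsegments out of the join(...) string directly rather than from the parsed record. The function name join_segments is hypothetical, and the pattern only handles plain "a..b" spans, not fuzzy positions such as "<1870" or remote references.

```python
import re

# A plain span such as "1870..1959".
_SPAN = re.compile(r"(\d+)\.\.(\d+)")


def join_segments(location_text):
    """Return the (start, end) pairs found in a GenBank location string.

    Whitespace and line breaks inside the location are ignored, so a
    join(...) that wraps over several feature-table lines can be passed
    in as-is.
    """
    compact = "".join(location_text.split())
    return [(int(start), int(end)) for start, end in _SPAN.findall(compact)]
```

For the mRNA feature above this gives eight pairs, the first being (1870, 1959) and the last (38444, 40397).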

regards, 

Julio Bonis Sanz

_______________________________________________
BioPython mailing list  -  BioPython@biopython.org
http://biopython.org/mailman/listinfo/biopython

From JBonis at imim.es  Tue Nov 30 10:13:35 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Tue Nov 30 10:11:17 2004
Subject: [BioPython] TypeError: Unknown OP code: u'GROUP' and other issues
Message-ID: <66373AD054447F47851FCC5EB49B36110610E0@basquet.imim.es>

Hi all, 

Working with biopython I have found some bugs....

For example, when using the GenBank.search_for() function:


>> GenBank.search_for("HTR2A[GENE]")

it returns the error:

>> TypeError: Unknown OP code: u'GROUP'

This is because the XML from NCBI contains a "new" operator, named GROUP, that the Biopython code does not handle.

	<TranslationStack>
		<TermSet>
			<Term>HTR2A[GENE]</Term>
			<Field>GENE</Field>

			<Count>27</Count>
			<Explode>Y</Explode>
		</TermSet>
		<OP>GROUP</OP>
	</TranslationStack>

I don't know what those operators mean, but after changing the Biopython code (Bio/EUtils/parse.py) like this:

                elif s == "NOT":
                    stack[-2:] = [Datatypes.Not(stack[-2], stack[-1])]
                #added by jbonis
                elif s == "GROUP":
                    pass  # no-op: the GROUP operator is ignored
                #end of added by jbonis
                else:
                    raise TypeError("Unknown OP code: %r" % (s,))

it works....
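To show what the patch does in isolation, here is a minimal sketch of a stack-based evaluation of a TranslationStack, with plain tuples standing in for the Datatypes classes. It is not the Bio/EUtils/parse.py code: the function name and the (kind, value) input format are hypothetical, AND and OR are assumed to behave like the NOT branch shown above, and treating GROUP as a no-op simply mirrors the patch rather than any documented NCBI meaning.

```python
def evaluate_translation_stack(items):
    """Evaluate a postfix list of ("TERM", text) and ("OP", name) items.

    Binary operators replace the top two stack entries with one combined
    entry; GROUP leaves the stack untouched.
    """
    stack = []
    for kind, value in items:
        if kind == "TERM":
            stack.append(value)
        elif value in ("AND", "OR", "NOT"):
            if len(stack) < 2:
                raise ValueError("operator %r needs two operands" % (value,))
            stack[-2:] = [(value, stack[-2], stack[-1])]
        elif value == "GROUP":
            pass  # ignored, as in the patch above
        else:
            raise TypeError("Unknown OP code: %r" % (value,))
    return stack
```

For the XML above, the input would be [("TERM", "HTR2A[GENE]"), ("OP", "GROUP")], and the single term is left on the stack unchanged.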

But there is another problem: GenBank.search_for() then returns:

"Error: Sequence Viewer does not have any Presentations for code='gi_text'"

The solution I have found is to change the search_for() function in Bio/GenBank/__init__.py like this:

>>    for db_id in db_ids:
>>        #ids.append(db_id.dbids.ids[0]) #removed by jbonis
>>        ids.append(int(db_id.records_dbids.ids[0])) #jbonis
>>    return ids

I hope this helps others with the same problem, and helps the Biopython developers improve the next release.

regards, 

Julio Bonis Sanz MD
http://www.juliobonis.com/portal/


From JBonis at imim.es  Tue Nov 30 10:27:36 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Tue Nov 30 10:25:09 2004
Subject: mistake: RE: [BioPython] TypeError: Unknown OP code: u'GROUP' and
	other issues
Message-ID: <66373AD054447F47851FCC5EB49B36110610E2@basquet.imim.es>

There was an error in my code:

Please replace this:

>>    for db_id in db_ids:
>>        #ids.append(db_id.dbids.ids[0]) #removed by jbonis
>>        ids.append(int(db_id.records_dbids.ids[0])) #jbonis
>>    return ids

with this:

>>    for db_id in db_ids:
>>        #ids.append(db_id.dbids.ids[0]) #removed by jbonis
>>        ids.append(str(int(db_id.records_dbids.ids[0]))) #jbonis
>>    return ids

(Wrap the value in str() inside ids.append(); it should work fine now.)

