<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2657.27">
<TITLE>RE: [EMBOSS] Does seqret have limitations ?</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Thank you very much, everybody. The problem was, as some of you suspected, in the GenBank file.</FONT>
</P>
<P><FONT SIZE=2>Best wishes</FONT>
</P>
<P><FONT SIZE=2>Caroline.</FONT>
</P>
<P><FONT SIZE=2> </FONT>
<BR><FONT SIZE=2>From: "Stefanie Lager" &lt;stefanielager@fastmail.ca&gt;</FONT>
<BR><FONT SIZE=2>Sent by: owner-emboss@hgmp.mrc.ac.uk</FONT>
<BR><FONT SIZE=2>Date: 17.12.03 07:05</FONT>
<BR><FONT SIZE=2>To: emboss@embnet.org</FONT>
<BR><FONT SIZE=2>Subject: RE: [EMBOSS] Does seqret have limitations ?</FONT>
</P>
<BR>
<BR>
<BR>
<P><FONT SIZE=2>It sounds as if the problem is with a single sequence in the file. Try</FONT>
<BR><FONT SIZE=2>removing the sequence it hangs on, or try splitting the original file into</FONT>
<BR><FONT SIZE=2>parts to see if there is a single sequence it hangs on. Other</FONT>
<BR><FONT SIZE=2>programs can have problems with end-of-line characters, but this</FONT>
<BR><FONT SIZE=2>doesn't sound like that.</FONT>
</P>
<P><FONT SIZE=2>Stefanie</FONT>
</P>
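<P><FONT SIZE=2>The splitting approach suggested above could be sketched roughly as follows. This is only an illustration: split_genbank is a hypothetical helper, not part of EMBOSS, and it assumes that every GenBank record ends with a line starting with "//".</FONT>
</P>
<PRE>
```shell
#!/bin/sh
# Hypothetical helper (not part of EMBOSS): split a GenBank flat file into
# chunks of at most N records, so each chunk can be converted separately.
# Assumption: every record ends with a line starting with "//".
split_genbank() {
    # $1 = input file, $2 = records per chunk, $3 = output prefix
    awk -v size="$2" -v prefix="$3" '
        BEGIN { chunk = 1; out = prefix "." chunk }
        { print > out }                      # copy the line to the current chunk
        /^\/\// {                            # record terminator reached
            if (++count % size == 0) {       # chunk is full: start the next one
                close(out)
                chunk++
                out = prefix "." chunk
            }
        }
    ' "$1"
}
```
</PRE>
<P><FONT SIZE=2>For example, "split_genbank file.gbk 1000 part" would write part.1, part.2 and so on. Running seqret on each chunk with the options quoted below (-osformat fasta -outseq) and comparing grep -c "^LOCUS" on the chunk with grep -c ">" on its output should point to the chunk that holds the problem entry.</FONT>
</P>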
<P><FONT SIZE=2>> Dear Simon,</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> I don't receive any error from seqret; it simply stops, just as if</FONT>
<BR><FONT SIZE=2>> it had finished correctly.</FONT>
<BR><FONT SIZE=2>> The file is not bigger than 2 GB:</FONT>
<BR><FONT SIZE=2>> $ du -sk file.gbk</FONT>
<BR><FONT SIZE=2>> 74680 file.gbk</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Even with the cat command you sent me, I only get 17143 sequences</FONT>
<BR><FONT SIZE=2>> in GCG format (the result is the same if I convert to FASTA</FONT>
<BR><FONT SIZE=2>> format).</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> $ grep -c "Check" file.gcg</FONT>
<BR><FONT SIZE=2>> 17143</FONT>
<BR><FONT SIZE=2>> $ grep -c "LOCUS" file.gbk</FONT>
<BR><FONT SIZE=2>> 26045</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> $ seqret file.gbk -osformat fasta -outseq test</FONT>
<BR><FONT SIZE=2>> Reads and writes (returns) sequences</FONT>
<BR><FONT SIZE=2>> $ grep -c ">" test</FONT>
<BR><FONT SIZE=2>> 17143</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> If anybody has an idea...</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Thanks a lot,</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Caroline.</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> -----Original Message-----</FONT>
<BR><FONT SIZE=2>> From: simon andrews (BI) [<A HREF="mailto:simon.andrews@bbsrc.ac.uk">mailto:simon.andrews@bbsrc.ac.uk</A>]</FONT>
<BR><FONT SIZE=2>> Sent: Tuesday, 16 December 2003 17:47</FONT>
<BR><FONT SIZE=2>> To: 'emboss@embnet.org'</FONT>
<BR><FONT SIZE=2>> Subject: RE: [EMBOSS] Does seqret have limitations ?</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> -----Original Message-----</FONT>
<BR><FONT SIZE=2>> From: Barretto,Caroline,LAUSANNE,NRC/BAS</FONT>
<BR><FONT SIZE=2>> [<A HREF="mailto:Caroline.Barretto@rdls.nestle.com">mailto:Caroline.Barretto@rdls.nestle.com</A>]</FONT>
<BR><FONT SIZE=2>> Sent: 16 December 2003 16:12</FONT>
<BR><FONT SIZE=2>> To: emboss@embnet.org</FONT>
<BR><FONT SIZE=2>> Subject: [EMBOSS] Does seqret have limitations ?</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>>> Dear all,</FONT>
<BR><FONT SIZE=2>>></FONT>
<BR><FONT SIZE=2>>> Did anybody notice that the seqret program seems to</FONT>
<BR><FONT SIZE=2>>> be limited by the number of sequences to convert ? I</FONT>
<BR><FONT SIZE=2>>> am trying to convert 1 file containing 23000 genbank</FONT>
<BR><FONT SIZE=2>>> sequences into GCG format.</FONT>
<BR><FONT SIZE=2>>></FONT>
<BR><FONT SIZE=2>>> Do you have a suggestion for that ?</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Seqret is not limited by the number of sequences. I routinely pass the</FONT>
<BR><FONT SIZE=2>> whole of EMBL through seqret and it works fine. What error do you</FONT>
<BR><FONT SIZE=2>> get when seqret stops? Could it just be that there is a malformed</FONT>
<BR><FONT SIZE=2>> entry part way through your file?</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Is the file you are trying to convert >2Gb in size? If so this</FONT>
<BR><FONT SIZE=2>> could be the reason for the failure rather than seqret being</FONT>
<BR><FONT SIZE=2>> limited by the number of sequences. In this case though I thought</FONT>
<BR><FONT SIZE=2>> that the failure would happen when the file was first opened and</FONT>
<BR><FONT SIZE=2>> not after a certain number of sequences had passed through.</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> If the problem is a large file you might be able to get round this</FONT>
<BR><FONT SIZE=2>> by using a pipe to get information into seqret. Try</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> cat your_genbank_file.gb | seqret -filter -osf gcg > your_gcg_file.gcg</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> This should work as long as your OS version of cat and your shell</FONT>
<BR><FONT SIZE=2>> can handle large files.</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Hope this helps</FONT>
</P>
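<P><FONT SIZE=2>Given the counts quoted above (17143 sequences written, 26045 LOCUS matches), one way to look for a malformed entry is to inspect the record that follows the last converted one. The sketch below uses a hypothetical helper, nth_locus, and rests on an assumption the thread does not confirm: that seqret writes records in input order and stops at the first entry it cannot read. It anchors the match with "^LOCUS" so that only header lines are counted.</FONT>
</P>
<PRE>
```shell
#!/bin/sh
# Hypothetical helper: print the LOCUS line of the Nth record in a GenBank
# flat file. Only lines that begin with "LOCUS" are counted as record headers.
nth_locus() {
    # $1 = GenBank file, $2 = 1-based record number
    awk -v n="$2" '/^LOCUS/ && ++count == n { print; exit }' "$1"
}
```
</PRE>
<P><FONT SIZE=2>With the numbers from this thread, "nth_locus file.gbk 17144" would name the first entry missing from the output, under the assumption stated above.</FONT>
</P>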
<BR>
<BR>
</BODY>
</HTML>