<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2657.27">
<TITLE>RE: [EMBOSS] Does seqret have limitations ?</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Dear Simon,</FONT>
</P>
<P><FONT SIZE=2>I don't get any error from seqret; it simply stops as if it had finished correctly. </FONT>
<BR><FONT SIZE=2>The file is much smaller than 2 GB:</FONT>
<BR><FONT SIZE=2>$ du -sk file.gbk</FONT>
<BR><FONT SIZE=2>74680 file.gbk</FONT>
</P>
<P><FONT SIZE=2>Even with the cat command you sent me, I get only 17143 sequences in GCG format (the same happens if I convert to FASTA format):</FONT></P>
<P><FONT SIZE=2>$ grep -c "Check" file.gcg</FONT>
<BR><FONT SIZE=2>17143</FONT>
<BR><FONT SIZE=2>$ grep -c "LOCUS" file.gbk</FONT>
<BR><FONT SIZE=2>26045</FONT>
</P>
<P><FONT SIZE=2>$ seqret file.gbk -osformat fasta -outseq test</FONT>
<BR><FONT SIZE=2>Reads and writes (returns) sequences</FONT>
<BR><FONT SIZE=2>$ grep -c ">" test</FONT>
<BR><FONT SIZE=2>17143</FONT>
</P>
<P><FONT SIZE=2>If anybody has an idea...</FONT>
</P>
<P><FONT SIZE=2>Thanks a lot,</FONT>
</P>
<P><FONT SIZE=2>Caroline.</FONT>
</P>
<BR>
<BR>
<P><FONT SIZE=2>-----Original Message-----</FONT>
<BR><FONT SIZE=2>From: simon andrews (BI) [<A HREF="mailto:simon.andrews@bbsrc.ac.uk">mailto:simon.andrews@bbsrc.ac.uk</A>]</FONT>
<BR><FONT SIZE=2>Sent: Tuesday, 16 December 2003 17:47</FONT>
<BR><FONT SIZE=2>To: 'emboss@embnet.org'</FONT>
<BR><FONT SIZE=2>Subject: RE: [EMBOSS] Does seqret have limitations ?</FONT>
</P>
<BR>
<BR>
<P><FONT SIZE=2>-----Original Message-----</FONT>
<BR><FONT SIZE=2>From: Barretto,Caroline,LAUSANNE,NRC/BAS [<A HREF="mailto:Caroline.Barretto@rdls.nestle.com">mailto:Caroline.Barretto@rdls.nestle.com</A>] </FONT>
<BR><FONT SIZE=2>Sent: 16 December 2003 16:12</FONT>
<BR><FONT SIZE=2>To: emboss@embnet.org</FONT>
<BR><FONT SIZE=2>Subject: [EMBOSS] Does seqret have limitations ?</FONT>
</P>
<BR>
<P><FONT SIZE=2>> Dear all,</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Has anybody noticed that seqret seems to </FONT>
<BR><FONT SIZE=2>> be limited in the number of sequences it can convert? I </FONT>
<BR><FONT SIZE=2>> am trying to convert one file containing 23000 GenBank </FONT>
<BR><FONT SIZE=2>> sequences into GCG format.</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Do you have any suggestions?</FONT>
</P>
<P><FONT SIZE=2>Seqret is not limited by the number of sequences. I routinely pass the whole of EMBL through seqret and it works fine. What error do you get when seqret stops? Could there be a malformed entry partway through your file?</FONT></P>
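<P><FONT SIZE=2>One way to test the malformed-entry theory is a quick structural check of the GenBank file. The shell sketch below assumes the standard flat-file layout (each entry begins with a line starting "LOCUS" and ends with a line starting "//"); check_gbk and show_entry are hypothetical helper names, not EMBOSS programs. check_gbk reports entries that are not closed by "//" before the next LOCUS line or that have no ORIGIN line, and show_entry prints the Nth entry so it can be inspected by eye.</FONT></P>
<PRE>
```shell
# Minimal sketch, assuming the standard GenBank flat-file layout: each entry
# starts with a line beginning "LOCUS" and ends with a line starting "//".
# check_gbk and show_entry are hypothetical helpers, not EMBOSS programs.

# check_gbk FILE: flag entries that are not closed by "//" before the next
# LOCUS line, or that have no ORIGIN (sequence) section, then print a count.
check_gbk() {
  awk '
    /^LOCUS/ {
      # A new entry begins while the previous one is still open.
      if (open) printf "entry %d (line %d): missing // terminator\n", n, start
      n++; start = NR; open = 1; origin = 0
      next
    }
    /^ORIGIN/ { origin = 1 }
    /^\/\// {
      if (!open) printf "line %d: // with no preceding LOCUS\n", NR
      else if (!origin) printf "entry %d (line %d): no ORIGIN line\n", n, start
      open = 0
    }
    END {
      # The last entry in the file was never closed.
      if (open) printf "entry %d (line %d): missing // terminator\n", n, start
      printf "%d LOCUS lines checked\n", n
    }
  ' "$1"
}

# show_entry FILE N: print the Nth entry, from its LOCUS line up to the
# line before the next LOCUS line.
show_entry() {
  awk -v want="$2" '/^LOCUS/ { n++ } n == want' "$1"
}
```
</PRE>
<P><FONT SIZE=2>For example, if seqret reads entries in file order and stopped after 17143 sequences (as in the reply above), show_entry file.gbk 17144 would print the first entry that did not reach the output, which is a good place to look for a formatting problem. A clean report from check_gbk does not prove that every entry is valid, only that the basic LOCUS/ORIGIN/"//" structure is intact.</FONT></P>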
<P><FONT SIZE=2>Is the file you are trying to convert larger than 2 GB? If so, that could be the cause of the failure rather than any limit on the number of sequences. In that case, though, I would expect the failure to happen when the file is first opened, not after a certain number of sequences has been processed.</FONT></P>
<P><FONT SIZE=2>If the problem is the file size, you may be able to get around it by piping the data into seqret. Try:</FONT>
</P>
<P><FONT SIZE=2>cat your_genbank_file.gb | seqret -filter -osf gcg > your_gcg_file.gcg</FONT>
</P>
<P><FONT SIZE=2>This should work as long as your operating system's cat and your shell can handle large files.</FONT>
</P>
<P><FONT SIZE=2>Hope this helps</FONT>
</P>
<P><FONT SIZE=2>Simon.</FONT>
</P>
</BODY>
</HTML>