[Biopython] Retrieving fasta seqs
Alvaro F Pena Perea
alvin at pasteur.edu.uy
Mon Feb 1 16:16:39 UTC 2010
Hi all! This time my question is about retrieving FASTA records. I have a huge
multi-FASTA file and a second file containing a list of IDs, one per line, for example:
FBgn0010441
FBgn0011598
FBgn0011761
The purpose of the script is to retrieve the FASTA sequences for these IDs
from the multi-FASTA file and save them to a new file.
Example output file:
>FBgn0010441
ACTAGACCC
>FBgn0011598
GGTAATAAA
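The output format above is plain FASTA: a `>` header line carrying the ID, followed by the sequence. A minimal sketch of writing (ID, sequence) pairs in that layout with only the standard library is shown below. The helper name `write_fasta` and the 60-character line wrapping are illustrative assumptions, not something the format or Biopython requires.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (identifier, sequence) pairs to `handle` in FASTA format.

    Hypothetical helper: each record becomes a '>identifier' header line
    followed by the sequence wrapped at `width` characters per line
    (60 is a common convention, not a requirement).
    """
    for identifier, sequence in records:
        handle.write(">%s\n" % identifier)
        # Emit the sequence in fixed-width slices; the last slice may be shorter.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


if __name__ == "__main__":
    out = io.StringIO()  # stand-in for open("output.fasta", "w")
    write_fasta([("FBgn0010441", "ACTAGACCC"), ("FBgn0011598", "GGTAATAAA")], out)
    print(out.getvalue(), end="")
```

Writing to any file-like object keeps the function easy to test; pass a real file handle to produce the output file.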
I tried to write it, but I do not know how to retrieve the sequences from the
multi-FASTA file. This is what I have so far:
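import sys
from Bio import SeqIO

try:
    sec = open(sys.argv[1], 'r')    # multi-FASTA file
    lista = open(sys.argv[2], 'r')  # file with one ID per line
except (IndexError, IOError) as err:
    # Missing argument or unreadable file: report it and stop
    print("Error: %s" % err)
    sys.exit(1)

# Wanted IDs in file order, skipping blank lines
listita = [line.strip() for line in lista if line.strip()]
lista.close()

# IDs of every record in the multi-FASTA file; a set gives fast lookups
ids = set(record.id for record in SeqIO.parse(sec, "fasta"))
sec.close()

for wanted in listita:
    if wanted in ids:
        print("Found: %s" % wanted)
        # Retrieve seqs
    else:
        print("Not found: %s" % wanted)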
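One way to fill in the missing "Retrieve seqs" step is to keep whole records rather than only their IDs: stream through the multi-FASTA file once and keep each record whose ID is in a set of wanted IDs. Only one record is held in memory at a time, which matters for a huge file. The sketch below uses plain Python so it runs without Biopython; `parse_fasta` and `select_records` are hypothetical helper names. With Biopython the same filter can be written as `SeqIO.write((rec for rec in SeqIO.parse(sec, "fasta") if rec.id in wanted), out_handle, "fasta")`, and if records must come out in the order of the ID list, `SeqIO.index(filename, "fasta")` gives dictionary-style access by record ID.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA text handle.

    The identifier is taken as the first whitespace-separated word after
    '>' (intended to match how Biopython's "fasta" parser fills record.id).
    Sequence lines are stripped and concatenated; blank lines and any
    text before the first header are ignored.
    """
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if identifier is not None:
                yield identifier, "".join(chunks)
            header = line[1:].split()
            identifier = header[0] if header else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def select_records(records, wanted):
    """Yield only the records whose identifier is in the set `wanted`."""
    for identifier, sequence in records:
        if identifier in wanted:
            yield identifier, sequence


if __name__ == "__main__":
    # Stand-in for open(sys.argv[1]); any iterable of lines works.
    fasta = io.StringIO(
        ">FBgn0010441 some description\nACTAGA\nCCC\n"
        ">FBgn0099999\nTTTT\n"
        ">FBgn0011598\nGGTAATAAA\n"
    )
    wanted = {"FBgn0010441", "FBgn0011598"}
    for identifier, sequence in select_records(parse_fasta(fasta), wanted):
        print(identifier, sequence)
```

Because both helpers are generators, the selected records can be passed straight to a writer without building an intermediate list.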