[Biopython] Retrieving fasta seqs

Alvaro F Pena Perea alvin at pasteur.edu.uy
Mon Feb 1 16:16:39 UTC 2010


Hi all! This time my question is about retrieving FASTA records. I have a huge
multifasta file and a second file that contains a list of IDs, one per line.
For example:
FBgn0010441
FBgn0011598
FBgn0011761
The purpose of this script is to retrieve the FASTA sequences for these IDs
from the multifasta file and save them to a file.
Example output file:

>FBgn0010441
ACTAGACCC
>FBgn0011598
GGTAATAAA

I tried to write it, but I do not know how to retrieve the sequences from the
multifasta file. This is what I have so far:

import sys
from Bio import SeqIO

# Expects two arguments: the multifasta file and the ID list file
try:
    sec = open(sys.argv[1], 'r')
    lista = open(sys.argv[2], 'r')
except (IndexError, IOError):
    print("Error")
    sys.exit(1)

# IDs of every record in the multifasta file
sec = [linea.id for linea in SeqIO.parse(sec, "fasta")]

# IDs to look for, one per line
listita = []
for lines in lista:
    line = lines.rstrip()
    listita.append(line)

for line in listita:
    if line in sec:
        print("I find it")
        # Retrieve seqs
    else:
        print("Is not here")
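
One possible way to fill in the "Retrieve seqs" step is to turn the loop around: read the wanted IDs into a set first, then walk through the multifasta file once and keep each record whose ID is in that set. The following is only a sketch under some assumptions: the function names (read_ids, select_records, extract) are made up for illustration, the IDs in the list file are assumed to match record.id exactly (the first word after ">"), and SeqIO.parse and SeqIO.write are assumed to accept file names, which needs a reasonably recent Biopython.

```python
def read_ids(handle):
    """Return the set of non-empty, stripped lines from an open ID list."""
    return {line.strip() for line in handle if line.strip()}


def select_records(records, wanted_ids):
    """Yield only the records whose .id is in wanted_ids."""
    for record in records:
        if record.id in wanted_ids:
            yield record


def extract(fasta_path, ids_path, out_path):
    """Write the wanted records to out_path and return how many were written."""
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import SeqIO

    with open(ids_path) as id_handle:
        wanted_ids = read_ids(id_handle)
    # SeqIO.parse is lazy, so the huge file is never held in memory at once.
    records = SeqIO.parse(fasta_path, "fasta")
    return SeqIO.write(select_records(records, wanted_ids), out_path, "fasta")
```

It could be called as extract(sys.argv[1], sys.argv[2], "selected.fasta"), where the output file name is just a placeholder. Note that the records come out in the order they appear in the multifasta file, not in the order of the ID list, and that IDs missing from the multifasta file are skipped silently.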


