[Bioperl-l] validating a sequence

Andreas Matern andreas.matern@lbri.lionbioscience.com
Mon, 25 Feb 2002 14:50:12 -0500


Forgive me if this has been answered somewhere else, but...

I need to validate FASTA sequences. The web interface (written by another
developer, so I can't touch the code) lets users cut and paste, and many
of them paste in sequences with numbers in them, e.g.:

>mysequence
1ACACGATCGACTGACATCGTCAGTACGTCGATACGATCGACTGACTAGCTC
51AACTCGTCGTCGTCGTCGCTGCTCGTCGCTGCTCGTCTGCTCGTCGTC

and so on.

A cron job turns the FASTA file into a Bio::Index::Fasta index,
and then I (normally) run:

my @ids = $inx->get_all_primary_ids();
foreach my $id (@ids) {
	# fetch() returns the sequence object for this id
	my $seq = $inx->fetch($id);
	# ....do stuff with seq....
	# ....connect to database...
	# ....etc....
}

This of course dies when the $seq is screwy:

MSG: Attempting to set the sequence to [1ACA....] which does not look
healthy

I see $seq->validate_seq, but I'm not sure how to use it in my
context.

Any suggestions, especially for stripping out non-IUPAC characters from
a FASTA string, would be greatly appreciated...
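
For illustration, here is a minimal sketch of the kind of stripping meant
here, in plain Perl with no BioPerl calls. The helper name clean_fasta is
hypothetical, and it assumes nucleotide data, so only the IUPAC nucleotide
codes (ACGTU plus the ambiguity codes RYSWKMBDHVN, in either case) are
kept. Header lines starting with ">" pass through untouched, and lines
left empty after cleaning are dropped.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): clean a FASTA string.
# Assumption: nucleotide sequences, so anything on a sequence line that
# is not an IUPAC nucleotide code is deleted (digits, spaces, gaps, ...).
sub clean_fasta {
    my ($fasta) = @_;
    my @out;
    for my $line (split /\r?\n/, $fasta) {
        # Header lines are kept exactly as they are
        if ($line =~ /^>/) {
            push @out, $line;
            next;
        }
        # tr with the c (complement) and d (delete) flags removes every
        # character that is NOT in the listed set
        (my $seq = $line) =~ tr/ACGTURYSWKMBDHVNacgturyswkmbdhvn//cd;
        push @out, $seq if length $seq;
    }
    return join("\n", @out) . "\n";
}
```

Something like this would have to run on the pasted text before the cron
job builds the index. Note that it cannot tell stray letters from real
sequence: any pasted junk that happens to consist of IUPAC letters is kept.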

-Andreas
