[Bioperl-l] fasta header replace
Frank Schwach
fs5 at sanger.ac.uk
Mon Aug 30 15:11:06 UTC 2010
Hi Olivier,
Do you know how to read a file and build a hash from its contents? That
is what you will need to do here.
For example, if your file is
A1 Strain_A
A2 Strain_A
A3 Strain_B
then you can do something like:
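# open the well/strain file for reading ('<'); '>' would truncate it
open(my $in, '<', $infile_path) or die "Cannot open $infile_path: $!";
my %well2strain;
while (my $line = <$in>) {
    # capture the well (e.g. A1) and the strain name (e.g. Strain_A)
    my ($well, $strain) = ($line =~ /^([A-Z]\d+)\s+(\w+)/);
    next unless defined $well;    # skip blank or malformed lines
    $well2strain{$well} = $strain;
}
close $in;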
You can then use the values of the hash to set the sequence ID as you
parse the FASTA file. The BioPerl SeqIO howto gives details about how to
read and write the FASTA file
(http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples).
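Before the hash can be used, the well has to be pulled out of each
sequence ID. Here is a minimal sketch of that step in plain Perl. The
helper name new_id_for and the assumption that the well is a letter A-H
plus a number 1-12 set off by underscores are mine, based only on the one
example header in your mail, so adjust the pattern if your plates differ.

```perl
use strict;
use warnings;

# Lookup table as it would be built from the well/strain file above.
my %well2strain = (A1 => 'Strain_A', A2 => 'Strain_A', A3 => 'Strain_B');

# new_id_for() is a hypothetical helper: it returns "<well>_<strain>" for a
# Macrogen-style ID, or the ID unchanged when no known well is found.
sub new_id_for {
    my ($id, $map) = @_;
    # Assumption: the well looks like "_A1_" .. "_H12_" inside the ID.
    if ($id =~ /_([A-H](?:1[0-2]|[1-9]))_/) {
        my $well = $1;
        return $well . '_' . $map->{$well} if exists $map->{$well};
    }
    return $id;
}

# prints "A1_Strain_A"
print new_id_for('100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1', \%well2strain), "\n";
```

IDs without a recognisable well, or with a well that is missing from the
file, are passed through untouched so that nothing is silently lost.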
You can change the id of a sequence object with
$some_seq_object->id( 'my new ID');
See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details.
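Putting the pieces together, the whole job could look roughly like the
sketch below. It is untested against real Macrogen output: rename_fasta is
a hypothetical helper (not part of BioPerl), and the pattern used to find
the well is an assumption based on the single example header you posted.
The Bio::SeqIO calls (new, next_seq, write_seq) and $seq->id are the ones
described in the HOWTO and the Bio::Seq documentation.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# rename_fasta() is a hypothetical helper: it copies a FASTA file and
# replaces each ID with "<well>_<strain>" when the well is in the hash.
sub rename_fasta {
    my ($in_path, $out_path, $well2strain) = @_;
    my $in  = Bio::SeqIO->new(-file => $in_path,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_path", -format => 'fasta');
    while (my $seq = $in->next_seq) {
        # Assumption: the well sits between underscores in the ID, e.g. "_A1_".
        if ($seq->id =~ /_([A-H](?:1[0-2]|[1-9]))_/) {
            my $well = $1;
            $seq->id("${well}_$well2strain->{$well}")
                if exists $well2strain->{$well};
        }
        $out->write_seq($seq);    # unmatched sequences are written unchanged
    }
    $out->close;
    return;
}

# Usage, with %well2strain filled from the well/strain file:
#   rename_fasta('macrogen.fasta', 'renamed.fasta', \%well2strain);
```

Only the ID is changed here; the rest of the header line (the description)
and the sequence itself are written out as they were read.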
Hope that helps to get you started.
Frank
odclerck wrote:
> Hi,
> Was wondering if someone had an easy script available that converts the
> headers of a fasta sequences to a value stored in a separate text file.
>
> Macrogen produces files with sequences that look more or less like this:
>
>> 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum.
>>
>
> I can filter out the position on the plate e.g. "A1" easily but would like
> to replace this with the name of the strain stored in a different text file,
> e.g. "A1_D1222".
>
> Realize this sounds pretty basic to most of you, but I'm pretty new at
> scripting.
> Olivier
>
>