[Bioperl-l] assembling chromosomes from contigs and .agp file

Smithies, Russell Russell.Smithies at agresearch.co.nz
Thu Jan 8 03:59:08 UTC 2009


Was easier than I thought although I couldn't work out a way to "build" a Bio::Seq directly from bits.
Here's how I did it:
-------------------------------

use Bio::DB::Fasta;
use Bio::Seq;
use Bio::SeqIO;

open(AGP,"Mt2.0_pgp.agp") or die $!;
my @chr = ();

my $db = Bio::DB::Fasta->new("contigs.fa");

while(<AGP>){
	chomp;
	split /\s/;
	# extend temp string if it's too short
	do{$chr[$_[0]] .= ' ' x 1_000_000;}while length $chr[$_[0]] < $_[2] ;
	if($_[4] !~ m/N/){
		($start,$stop) = $_[8] eq '+'?($_[6], $_[7]):($_[7], $_[6]);
		$s = substr $chr[$_[0]], $_[1], $_[9], $db->seq($_[5],$start,$stop);
	}else{
		$s = substr $chr[$_[0]], $_[1], $_[5], "N" x $_[5] ;
	}
}

#remove any trailing whitespace
@chr = map{s/\s+//g;$_}@chr;

#print the sequence. chromosomes are chr0 -> chr8
foreach(0..$#chr){
    my $seqobj = Bio::Seq->new( -display_id => "chr$_", -seq => $chr[$_]);
    my $seq_out = Bio::SeqIO->new('-file' => ">chr$_.fa",'-format' => 'fasta');
    $seq_out->write_seq($seqobj);
}
-------------------------------

Please excuse my hacky use of substrings but this .agp file had overlapping runs of 'N' and this was the easiest way to deal with it
e.g. 
0	1	50000	1	N	50000	clone	yes
0	50001	167645	2	F	AC144644.3	1	117645	+	117645
0	167646	217645	3	N	50000	clone	yes		
0	217646	317645	4	N	100000	contig	no		
0	317646	367645	5	N	50000	clone	yes		
0	367646	411754	6	F	AC146805.17	1	44109	+	44109

--Russell



> -----Original Message-----
> From: Smithies, Russell
> Sent: Thursday, 8 January 2009 1:33 p.m.
> To: 'bioperl-l at lists.open-bio.org'
> Subject: assembling chromosomes from contigs and .agp file
> 
> Does anyone have a script for building chromosomes from an .agp file
> and a directory full of contigs?
> If not, I'll write something but I didn't want to re-invent the wheel
> if there's something "in the wild".
> 
> Would something like a Bio::Assembly::IO::agp.pm be a good idea? Could
> an .agp file be regarded as a Bio::Assembly?
> 
> --Russell
> 

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================




More information about the Bioperl-l mailing list