[Bioperl-l] from SimpleAlign to SAM/BAM

Mark A. Jensen maj at fortinbras.us
Wed May 19 17:30:25 UTC 2010


Yes, that's right, John; B:T:R:Samtools is used within B:A:IO:sam to do the 
write-out with the samtools command-line programs. Interested parties might look at 
Bio::Assembly::IO::sam to see how Lincoln's Bio::DB::Sam (which uses the libbam 
library directly via XS; also not BioPerl proper, but we love it anyway) might be 
employed.

----- Original Message ----- 
From: "John Marshall" <john.marshall at sanger.ac.uk>
To: <bioperl-l at bioperl.org>
Cc: "Albert Vilella" <avilella at gmail.com>
Sent: Wednesday, May 19, 2010 12:22 PM
Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM


> On 19 May 2010, at 14:34, Mark A. Jensen wrote:
>> Albert-- have a look at Bio::Tools::Run::Samtools, which incorporates the use
>> of Bio::Assembly::IO::sam (I think).
>
> I've only briefly skimmed the B:T:R:Samtools documentation, but it would
> appear that this mostly encapsulates running the various samtools
> subcommands. These provide various manipulations on SAM and BAM files, but
> don't give you anything in terms of converting from non-SAM/BAM to SAM/BAM.
>
>> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com>
>>> Considering I've got a way to map the cDNAs to chromosome coordinates,
>>> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23,000
>>> human coordinates?
>
> Perhaps I misunderstand, but if you already have a bunch of snippets of
> sequence and their mapped coordinates, then the easy way to generate a SAM
> file containing them is just to print it out by hand.
>
> A SAM file is just a tab-separated text file. For each sequence in your
> Bio::SimpleAlign objects, print out a line containing appropriate values for
> each of the 11 main SAM fields. (If the snippets are effectively unpaired,
> then MRNM,MPOS,ISIZE can just be *,0,0, and the only FLAG values you'll be
> choosing between are 0, 4, 16, and 20.)
>
> You should also start the file with an @SQ header for each of the chromosomes
> you've mapped against.
>
> (I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf -- it's a
> little vague, but should be more than enough to explain how to, e.g., print out
> a basic SAM file with only the main fields.)
>
> Once you've printed out a simple SAM file, you can use B:T:R:Samtools or
> samtools directly or other tools to convert it to the binary BAM format
> and/or otherwise work with it.
>
> Cheers,
>
>     John
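
A rough sketch of the "print it out by hand" approach John describes, written in Python purely for illustration (the same logic ports directly to a Perl loop over Bio::SimpleAlign sequences). Everything named here is an assumption and not part of BioPerl or samtools: the helper names sam_flag, sq_header, sam_record and write_sam, the use of MAPQ 255 for "mapping quality not available", QUAL "*", and the ungapped "<length>M" CIGAR used when none is supplied. POS is taken to be 1-based, and for reverse-strand hits the caller is assumed to pass SEQ as it reads on the forward reference strand.

```python
import io


def sam_flag(mapped, reverse):
    """Return the FLAG for an unpaired read: one of 0, 4, 16 or 20."""
    flag = 0
    if not mapped:
        flag |= 0x4   # read is unmapped
    if reverse:
        flag |= 0x10  # read is on the reverse strand
    return flag


def sq_header(name, length):
    """Build one @SQ header line for a reference sequence (chromosome)."""
    return f"@SQ\tSN:{name}\tLN:{length}"


def sam_record(qname, seq, rname="*", pos=0, reverse=False,
               mapq=255, cigar=None, qual="*"):
    """Build one SAM line with the 11 main fields for an unpaired read.

    Assumption: a read with rname "*" is treated as unmapped.
    """
    mapped = rname != "*"
    if cigar is None:
        # Hypothetical default: an ungapped match over the whole read.
        cigar = f"{len(seq)}M" if mapped else "*"
    fields = [
        qname,                      # QNAME
        sam_flag(mapped, reverse),  # FLAG
        rname,                      # RNAME
        pos if mapped else 0,       # POS (1-based)
        mapq if mapped else 0,      # MAPQ
        cigar,                      # CIGAR
        "*", 0, 0,                  # MRNM, MPOS, ISIZE for unpaired reads
        seq,                        # SEQ
        qual,                       # QUAL
    ]
    return "\t".join(str(f) for f in fields)


def write_sam(handle, references, records):
    """Write @SQ headers first, then one line per record."""
    for name, length in references:
        handle.write(sq_header(name, length) + "\n")
    for rec in records:
        handle.write(sam_record(**rec) + "\n")


if __name__ == "__main__":
    out = io.StringIO()
    # Made-up reference name, length and read, for illustration only.
    write_sam(out,
              [("chr1", 1000)],
              [{"qname": "cdna1", "seq": "ACGT", "rname": "chr1", "pos": 100}])
    print(out.getvalue(), end="")
```

The resulting text file can then be handed to samtools (or B:T:R:Samtools) for conversion to BAM, as described above.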