[Bioperl-l] problem to fit genomic coordinates

Wed Mar 25 11:23:24 EDT 2009

Read in the first file and create a Bio::SimpleAlign object.

Then use the slice method to find the features that fall between the
start/end values in your second file.

=head2 slice

 Title     : slice
 Usage     : $aln2 = $aln->slice(20,30)
 Function  : Creates a slice from the alignment inclusive of start and
             end columns, and the first column in the alignment is
             denoted 1.
             Sequences with no residues in the slice are excluded from
             the new alignment and a warning is printed. Slice beyond
             the length of the sequence does not do padding.
 Returns   : A Bio::SimpleAlign object
 Args      : Positive integer for start column, positive integer for end
             column, optional boolean which if true will keep gap-only
             columns in the newly created slice. Example:

             $aln2 = $aln->slice(20,30,1)

=cut 
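
As an alternative to slice, the question quoted below (matching about
800000 ranges against about 150000 annotated ranges without a quadratic
comparison) can also be approached with a sort-and-sweep in plain Perl.
This is only a minimal sketch under stated assumptions: the sub name
annotate_by_sweep and the array-ref row layout are hypothetical, all
coordinates are treated as closed intervals on a single sequence, strand
is ignored, and the ids in file2.txt are assumed to be unique.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list the features overlapping each query range.
# $features: array ref of [name, start, end]        (rows of file1.txt)
# $queries:  array ref of [id, start, end, strand]  (rows of file2.txt)
# Returns a hash ref: query id => array ref of overlapping feature names.
sub annotate_by_sweep {
    my ($features, $queries) = @_;

    # Sort both lists by start coordinate so one forward pass suffices.
    my @feat = sort { $a->[1] <=> $b->[1] } @$features;
    my @qry  = sort { $a->[1] <=> $b->[1] } @$queries;

    my %hits;
    my @active;     # features that may still overlap this or a later query
    my $next = 0;   # index of the first feature not yet moved into @active

    for my $q (@qry) {
        my ($id, $qstart, $qend) = @$q;

        # Pull in every feature that starts at or before this query's end.
        while ($next < @feat && $feat[$next][1] <= $qend) {
            push @active, $feat[$next++];
        }

        # Drop features that end before this query starts; queries are
        # sorted by start, so no later query can overlap them either.
        @active = grep { $_->[2] >= $qstart } @active;

        # Of the remaining features, keep those starting within reach.
        $hits{$id} = [ map { $_->[0] } grep { $_->[1] <= $qend } @active ];
    }
    return \%hits;
}

# A few rows taken from the two files in the quoted message.
my $hits = annotate_by_sweep(
    [
        [ 'regulatory_region', 3601, 3608 ],
        [ 'five_prime_UTR',    3631, 3759 ],
        [ 'CDS',               3760, 3913 ],
        [ 'exon',              3631, 3913 ],
    ],
    [
        [ 'acc_2419732', 3268, 3285, '+' ],
        [ 'acc_362358',  3640, 3656, '-' ],
        [ 'acc_3279485', 3793, 3813, '+' ],
    ],
);

for my $id (sort keys %$hits) {
    my $names = join(',', sort @{ $hits->{$id} }) || '-';
    print "$id\t$names\n";
}
```

Each line of either file can be turned into a row with split ' ', $line.
After the two sorts, the pass moves forward through both lists once; the
cost per query then depends on how many features are still open at that
position, which should stay small when features are not deeply nested.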

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Laurent MANCHON
> Sent: Wednesday, March 25, 2009 7:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] problem to fit genomic coordinates
> 
> This is my problem:
> how can I match ranges of genomic coordinates stored in two 
> distinct files?
> 
> The first file (file1.txt) is my annotation file, in this format:
> 
> regulatory_region 3455 3463
> regulatory_region 3535 3544
> regulatory_region 3601 3608
> transcriptional_cis_regulatory_region 3622 3630
> five_prime_UTR 3631 3759
> CDS 3760 3913
> exon 3631 3913
> CDS 3996 4276
> exon 3996 4276
> CDS 4486 4605
> exon 4486 4605
> CDS 4706 5095
> exon 4706 5095
> CDS 5174 5326
> exon 5174 5326
> ....
> ....
> 
> The second file (file2.txt) is my experimental file, in this format:
> 
> acc_2765773 3222 3239 -
> acc_2842543 3222 3239 -
> acc_2842544 3222 3239 -
> acc_442945 3222 3239 -
> acc_442946 3222 3239 -
> acc_4873 3222 3239 -
> acc_53956 3222 3239 -
> acc_562588 3222 3239 -
> acc_807114 3222 3239 -
> acc_84146 3222 3239 -
> acc_2419732 3268 3285 +
> acc_3041065 3565 3583 +
> acc_362358 3640 3656 -
> acc_3279485 3793 3813 +
> acc_3091017 3794 3811 -
> acc_2807380 3832 3848 +
> acc_3105138 3832 3848 +
> acc_3105139 3832 3848 +
> acc_3105140 3832 3848 +
> acc_3116450 3832 3848 +
> acc_86708 3832 3848 +
> acc_1987802 3922 3938 -
> acc_1679660 4113 4129 +
> acc_891489 4113 4129 +
> acc_2829973 4299 4318 +
> ....
> ....
> 
> 
> number of lines in file1.txt ~ 150000
> number of lines in file2.txt ~ 800000
> 
> So how can I annotate file2 using the genomic coordinates stored in 
> file1? Do I need to compare each range in file2 with each 
> range in file1: 800000 x 150000 combinations (a quadratic 
> analysis)?
> I'm looking for a fast method to do that, something closer to 
> linear time.
> 
> Thank you very much if you have ideas to help me.
> 
> Laurent --