[Bioperl-l] problem to fit genomic coordinates

Wed Mar 25 10:56:39 EDT 2009

This is my problem:
how can I match ranges of genomic coordinates stored in two 
distinct files?

The first file (file1.txt) is my annotation file, in this format:

regulatory_region 3455 3463
regulatory_region 3535 3544
regulatory_region 3601 3608
transcriptional_cis_regulatory_region 3622 3630
five_prime_UTR 3631 3759
CDS 3760 3913
exon 3631 3913
CDS 3996 4276
exon 3996 4276
CDS 4486 4605
exon 4486 4605
CDS 4706 5095
exon 4706 5095
CDS 5174 5326
exon 5174 5326
....
....

The second file (file2.txt) is my experimental file, in this format:

acc_2765773 3222 3239 -
acc_2842543 3222 3239 -
acc_2842544 3222 3239 -
acc_442945 3222 3239 -
acc_442946 3222 3239 -
acc_4873 3222 3239 -
acc_53956 3222 3239 -
acc_562588 3222 3239 -
acc_807114 3222 3239 -
acc_84146 3222 3239 -
acc_2419732 3268 3285 +
acc_3041065 3565 3583 +
acc_362358 3640 3656 -
acc_3279485 3793 3813 +
acc_3091017 3794 3811 -
acc_2807380 3832 3848 +
acc_3105138 3832 3848 +
acc_3105139 3832 3848 +
acc_3105140 3832 3848 +
acc_3116450 3832 3848 +
acc_86708 3832 3848 +
acc_1987802 3922 3938 -
acc_1679660 4113 4129 +
acc_891489 4113 4129 +
acc_2829973 4299 4318 +
....
....

Number of lines in file1.txt: ~150000
Number of lines in file2.txt: ~800000

So, how can I annotate file2 using the genomic coordinates stored in 
file1? Do I need to compare each range in file2 with each range in 
file1, i.e. 800000 x 150000 combinations (a quadratic analysis)?
I'm looking for a fast method to do this, something that scales 
roughly linearly with the size of the two files.
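
To make the question concrete, here is a rough sketch of one approach that might avoid the quadratic comparison: sort both files by start coordinate and sweep through them once, keeping only the annotations that can still overlap the current range. It is written in Python purely for illustration and is not a BioPerl method. The function names (parse_annotations, parse_reads, annotate) are made up for this sketch, and it assumes that all coordinates are on a single sequence, that ranges are inclusive on both ends, and that "fit" means "overlap by at least one base".

```python
import heapq

def parse_annotations(lines):
    """Parse 'feature start end' lines into (start, end, feature) tuples."""
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not fields[1].isdigit():
            continue  # skip blank lines and '....' placeholders
        out.append((int(fields[1]), int(fields[2]), fields[0]))
    return out

def parse_reads(lines):
    """Parse 'acc start end strand' lines into (start, end, acc, strand) tuples."""
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[1].isdigit():
            continue
        out.append((int(fields[1]), int(fields[2]), fields[0], fields[3]))
    return out

def annotate(annotations, reads):
    """Return (acc, start, end, strand, features) for every read.

    Assumes inclusive coordinates on one sequence; 'features' is the
    sorted list of annotation names overlapping the read (may be empty).
    """
    annotations = sorted(annotations)  # by start coordinate
    reads = sorted(reads)              # by start coordinate
    active = []                        # min-heap of (end, start, feature)
    i = 0
    result = []
    for r_start, r_end, acc, strand in reads:
        # Bring in every annotation that starts at or before this read's end.
        while i < len(annotations) and annotations[i][0] <= r_end:
            a_start, a_end, feature = annotations[i]
            heapq.heappush(active, (a_end, a_start, feature))
            i += 1
        # Drop annotations that end before this read starts; reads are
        # sorted by start, so they cannot overlap any later read either.
        while active and active[0][0] < r_start:
            heapq.heappop(active)
        # Read ends are not sorted, so re-check the start condition here.
        hits = sorted(f for a_end, a_start, f in active if a_start <= r_end)
        result.append((acc, r_start, r_end, strand, hits))
    return result
```

With the lines of each file read into lists, this would be called as annotate(parse_annotations(lines1), parse_reads(lines2)). The cost is dominated by the two sorts plus, for each read, a scan of the annotations still active at that position, so it should stay close to linear as long as only a few annotations overlap any given position. If the files covered several chromosomes, each one would need to be processed separately.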

Thank you very much if you have any ideas that could help me.

Laurent --