[Biojava-l] particular region of genomic sequence
Matthew Pocock
matthew_pocock@yahoo.co.uk
Mon, 22 Apr 2002 11:30:13 +0100
hz5@njit.edu wrote:
> Hi,
> Can anybody give a hint on how to use BioJava to extract a specific region
> (say, -800 to +200 relative to the transcription start site) of a gene's
> genomic sequence from NCBI?
>
> I wrote a Java program to do this myself, but I am not sure if my parsing
> scheme and retrieving scheme are efficient and accurate.
>
> Thanks!
>
Morning,
If you have a GenBank file that covers this region (both the TSS and
-800 to +200 relative to it), then you can use SeqIOTools.readGenbank to
read the file, call the filter() method on the sequence with an instance
of FeatureFilter (by location, by type or whatever you need) to pull out
the TSS feature, and then use new SubSequence(seq, tssLoc.getMin() - 800,
tssLoc.getMax() + 200) to cut out that bit of sequence. You may need to
check the strandedness of the TSS and flip the subsequence accordingly:
for a gene on the reverse strand, upstream lies at higher coordinates,
so the window becomes tssLoc.getMin() - 200 to tssLoc.getMax() + 800 and
the result should be reverse-complemented.
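
The coordinate arithmetic is the part that is easy to get wrong, so here
is a minimal sketch of it in plain Java. It deliberately does not call
BioJava: the class TssWindow and its methods window, reverseComplement
and extract are hypothetical names made up for illustration. It assumes
1-based inclusive coordinates (as in GenBank feature locations), a
sequence held as a plain String, and that a window running off either
end of the sequence should simply be clamped.

```java
/** Hypothetical helper (not part of BioJava) for strand-aware TSS windows. */
final class TssWindow {
    private TssWindow() {}

    /**
     * Returns the 1-based inclusive {start, end} of the window on the
     * forward strand, clamped to [1, seqLength]. On the reverse strand,
     * "upstream" means higher coordinates, so the two offsets swap sides.
     */
    static int[] window(int tssMin, int tssMax, boolean forwardStrand,
                        int upstream, int downstream, int seqLength) {
        int start = forwardStrand ? tssMin - upstream : tssMin - downstream;
        int end = forwardStrand ? tssMax + downstream : tssMax + upstream;
        return new int[] { Math.max(1, start), Math.min(seqLength, end) };
    }

    /** Reverse-complements a DNA string; output is upper case, unknowns become N. */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            sb.append(complement(dna.charAt(i)));
        }
        return sb.toString();
    }

    private static char complement(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return 'N';
        }
    }

    /** Cuts the window out of seq and orients it 5' to 3' with respect to the gene. */
    static String extract(String seq, int tssMin, int tssMax, boolean forwardStrand,
                          int upstream, int downstream) {
        int[] w = window(tssMin, tssMax, forwardStrand, upstream, downstream, seq.length());
        // Convert 1-based inclusive coordinates to substring's 0-based, end-exclusive form.
        String region = seq.substring(w[0] - 1, w[1]);
        return forwardStrand ? region : reverseComplement(region);
    }
}
```

For the case in the question you would call something like
TssWindow.extract(seq, tssMin, tssMax, forwardStrand, 800, 200), with
tssMin, tssMax and the strand taken from the TSS feature you filtered
out. In BioJava itself the resulting start and end would go into the
SubSequence constructor rather than String.substring.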
Matthew