[Biojava-dev] Accession defaults for GenbankFormat
Bubba Puryear
bubba.puryear at gmail.com
Mon Jul 3 15:40:40 UTC 2006
Hey all,
I'm using BioJava for an internal app for a client that has about 5000
internally developed GenBank records. The majority of these records do not
have ACCESSION fields, since they didn't come from a public data source.
(Many of them were created using Invitrogen's Vector NTI and saved as
files.)
Because these records have no accession number, I run into problems
when I try to use RichSequence and friends with this data. I've made a patch
for GenbankFormat.java that sets the accession to the locus name of the
record during parsing. If and when the ACCESSION field is parsed, this value
is overwritten, so I think it should be OK in general. I also have a test
case and a test data file.
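To show the idea outside of BioJava, here is a minimal, self-contained sketch of the default-then-overwrite logic. The class name AccessionDefaultSketch, the simplified regular expressions, and the sample records are all made up for illustration; this is not the actual GenbankFormat code, which uses a more involved LOCUS pattern and reports values through a listener.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone illustration; not part of BioJava.
public class AccessionDefaultSketch {
    // Simplified patterns (assumption): first token after the keyword is the value.
    private static final Pattern LOCUS = Pattern.compile("^LOCUS\\s+(\\S+).*$");
    private static final Pattern ACCESSION = Pattern.compile("^ACCESSION\\s+(\\S+).*$");

    /** Returns the accession for a record, falling back to the locus name. */
    public static String resolveAccession(String record) {
        String accession = null;
        for (String line : record.split("\\r?\\n")) {
            Matcher locus = LOCUS.matcher(line);
            if (locus.matches()) {
                // Default: use the locus name until a real accession shows up.
                accession = locus.group(1);
                continue;
            }
            Matcher acc = ACCESSION.matcher(line);
            if (acc.matches()) {
                // A parsed ACCESSION field overwrites the default.
                accession = acc.group(1);
            }
        }
        return accession;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        String internal = "LOCUS       pMyVector   5000 bp    DNA   circular\n"
                        + "DEFINITION  in-house construct\n";
        String withAccession = "LOCUS       MYLOCUS   1200 bp    DNA   linear\n"
                             + "ACCESSION   X00001\n";
        System.out.println(resolveAccession(internal));      // falls back to locus name
        System.out.println(resolveAccession(withAccession)); // uses the ACCESSION field
    }
}
```

A record without an ACCESSION line keeps the locus name as its accession, while a record that has one ends up with the parsed value, regardless of what the default was.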
The list's registration page discourages attachments, so how should I
provide these files? Thanks in advance,
Bubba
P.S. The patch is small, so I can inline it here:
Index: src/org/biojavax/bio/seq/io/GenbankFormat.java
===================================================================
RCS file: /home/repository/biojava/biojava-live/src/org/biojavax/bio/seq/io/GenbankFormat.java,v
retrieving revision 1.63
diff -u -r1.63 GenbankFormat.java
--- src/org/biojavax/bio/seq/io/GenbankFormat.java 28 Jun 2006 17:02:47 -0000 1.63
+++ src/org/biojavax/bio/seq/io/GenbankFormat.java 1 Jul 2006 20:34:48 -0000
@@ -274,6 +274,9 @@
                     Matcher m = lp.matcher(loc);
                     if (m.matches()) {
                         rlistener.setName(m.group(1));
+                        // default accession to locus name for sources that do not have accessions proper.
+                        accession = m.group(1);
+                        rlistener.setAccession(accession);
                         rlistener.setDivision(m.group(5));
                         rlistener.addSequenceProperty(Terms.getMolTypeTerm(),m.group(3));
                         rlistener.addSequenceProperty(Terms.getDateUpdatedTerm(),m.group(6));