Richard
03-14-2008, 02:20 PM
Richard, have you really read the genetic markers that have been published for some citrus cultivars? There are genetic samples that delineate some characteristics of a particular sport mutation. The work can be simplified if you know where to look. Many genetic databases are not the same, so there is still a lot of work to be done.
I have been part of the mathematics team on some of this work, and in the case of citrus very recently. The context I'm addressing is when you have only rRNA and no auxiliary biology such as cells or plant material; i.e., no phenotype information.
With the dictionary approach, which is all that currently exists of any practical value, two different statistical measures must be computed: (a) which data on file the sample matches, and with what confidence level, and (b) which data on file it does not match, and with what confidence level.
For an unknown sample submitted, you hope for one of the following to occur:
(1) it matches only one entry and with high confidence, and it does not match the rest (also with high confidence)
(2) it does not match any entries and with high confidence.
Unfortunately, one of these two undesirable results is often obtained instead:
(3) it matches more than one entry with the same probability and high confidence and it does not match the rest (also with high confidence)
(4) it matches one or more entries with high confidence but also does not match those very same entries (also with high confidence).
These last two cases can sometimes be resolved by bringing phenotype information into the analysis. Some biological experts insist that these cases reflect incomplete genetic information, while others insist that (at least for most species) there will never be enough. In the words of Riddick: it's "not my fight". I'm simply trying to be practical with the existing state of knowledge.
The goal of this work is to produce a "black box" into which you insert a tiny amount of biological material and receive an identification.
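As a rough illustration of how the four outcomes above separate, here is a minimal Python sketch. It is not the team's actual method: the `EntryResult` record, the `classify` function, and the 0.95 cutoff for "high confidence" are all hypothetical, and the "same probability" condition in case (3) is not modeled. It assumes each dictionary entry has already been given one confidence for the "matches" test and one for the "does not match" test.

```python
from dataclasses import dataclass

THRESHOLD = 0.95  # hypothetical cutoff for "high confidence"


@dataclass
class EntryResult:
    name: str             # dictionary entry (assumed unique)
    match_conf: float     # confidence that the sample matches this entry
    nonmatch_conf: float  # confidence that the sample does not match it


def classify(results, threshold=THRESHOLD):
    """Return the outcome number (1-4) from the list above, or 0 if some
    entry is neither confidently matched nor confidently rejected."""
    matched = {r.name for r in results if r.match_conf >= threshold}
    rejected = {r.name for r in results if r.nonmatch_conf >= threshold}
    if matched & rejected:
        return 4  # same entry both matches and does not match
    if len(matched | rejected) < len(results):
        return 0  # inconclusive: not one of the four listed outcomes
    if not matched:
        return 2  # confidently matches nothing on file
    return 1 if len(matched) == 1 else 3


# Made-up numbers, for illustration only.
sample = [
    EntryResult("cultivar A", 0.99, 0.01),
    EntryResult("cultivar B", 0.02, 0.98),
]
print(classify(sample))  # 1
```

Outcomes 3 and 4 are the ones where extra phenotype information would have to be brought in; the sketch only flags them and does not try to resolve them.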