ABSTRACT
A novel medical diagnostic algorithm that emulates the reasoning of a clinician is presented. Accurate and efficient, it concludes only those final diagnoses that agree with the diseases that actually afflict a patient. A differential diagnosis list is created, and the probability of each diagnosis is calculated with a novel procedure, which we call the MiniMax Procedure, that uses the positive predictive value of clinical data present to increase the probability and the sensitivity of clinical data absent to reduce it. The probability of a diagnosis is considered equal to the maximum positive predictive value of all clinical data present that support the diagnosis, circumventing the more complex and less accurate methods of the prior art. A novel 3-Step method reduces computational time complexity. The MiniMax Procedure also identifies concurrent diseases. Because Bayes formula cannot properly process interdependent clinical data and concurrent diseases, it is used with modifications. At each diagnostic step, the algorithm recommends the best cost-benefit clinical datum to investigate next. Furthermore, the algorithm can simultaneously recommend several best cost-benefit clinical data, which is essential in emergencies. Interactions of drugs and concurrent diseases with clinical data of the primary disease are detected, so that serious diseases are not ruled out because of this masking effect. Overlooking of important diagnoses is precluded by safety checks for risk-flagged clinical data and diagnoses, and by a search for diagnoses that may be causally related to each primary final diagnosis. The algorithm diagnoses clinical forms of disease and complex clinical presentations, in which diseases, syndromes, complications, and other clinical entities coexist in a single patient. The algorithm is straightforward, logical, and mathematically simple; heuristic restrictions preclude excessive proliferation of clinical data and diagnoses. Because it is expressed in natural language, it is readily understandable and user-friendly.
NOVEL CONCEPTS OF OUR ALGORITHM
Our diagnostic program includes novel ideas not mentioned in the extensive literature references reviewed. Some of the ideas are original in themselves; others are original based on a special manner in which extant elements are combined.
Disregarding disease prevalence (page 23) is doubly advantageous: (1) Prior probabilities of diseases (equivalent to prevalence) are eliminated from Bayes formula, which is transformed into a simplified equation 5 for calculating the PP value from statistically established S values of clinical data (page 26). (2) Low prevalence is no longer a reason to exclude a rare disease from a differential diagnosis list, which gives this disease a chance to become a final diagnosis on the merit of its supporting clinical data.
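The effect of eliminating prior probabilities can be illustrated with a minimal sketch. It assumes that, once all diseases are given the same prior, Bayes formula reduces to the sensitivity of the datum for one disease divided by the sum of its sensitivities over all candidate diseases; whether equation 5 has exactly this form is an assumption, since that equation is not reproduced in this section. The function name and the sensitivity values are hypothetical.

```python
def pp_values_equal_priors(sensitivities):
    """Return the PP value of one clinical datum for each candidate disease.

    `sensitivities` maps a disease name to the sensitivity S of the datum
    for that disease. Every disease is given the same prior (an assumption
    of this sketch), so the priors cancel out of Bayes formula.
    """
    total = sum(sensitivities.values())
    if total == 0:
        # The datum never occurs in any listed disease, so it supports none.
        return {disease: 0.0 for disease in sensitivities}
    return {disease: s / total for disease, s in sensitivities.items()}


# Hypothetical sensitivities of a single clinical datum for three diseases.
pp = pp_values_equal_priors(
    {"disease A": 0.6, "disease B": 0.3, "disease C": 0.1}
)
for disease, value in pp.items():
    print(f"{disease}: PP = {value:.2f}")
```

Because prevalence does not enter the calculation, a rare disease with a high S for the datum receives the same PP value as a common disease with the same S.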
Disregarding subjective qualities of clinical data (page 22), which are variable and unreliable, remarkably simplifies diagnostic processing without losing accuracy.
We believe that the PP value indicates how strongly a clinical datum supports a diagnosis more accurately than does specificity, true positive value, estimated evoking strength [13][24], or any other index attached to a clinical datum; its value does not change unless the sensitivities of clinical data are changed (page 25).
PP values can be calculated and included in the knowledge base before the diagnostic program is delivered to the user, avoiding real-time calculation.
Taking the P of a diagnosis to be equal to the greatest of the PP values of the clinical data present that support it appears superior to arithmetically combining the values of several redundant supportive clinical data, which inappropriately increases the P of the diagnosis (page 39).
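The difference between taking the maximum and combining redundant data can be shown with a short sketch. The function names and PP values are hypothetical, and the combination rule used for contrast (a noisy-OR product) is only one example of an arithmetic combination, chosen for illustration; it is not taken from the prior art cited here. The sketch covers only clinical data present; the reduction of P by clinical data absent is not modeled.

```python
def diagnosis_probability(pp_values_present):
    """P of a diagnosis as the greatest PP value among supporting data present."""
    return max(pp_values_present, default=0.0)


def noisy_or_combination(pp_values_present):
    """Contrast only: treats each datum as independent evidence (assumption)."""
    remaining = 1.0
    for pp in pp_values_present:
        remaining *= 1.0 - pp
    return 1.0 - remaining


# Hypothetical PP values of three redundant data supporting one diagnosis.
redundant = [0.7, 0.6, 0.6]
print(diagnosis_probability(redundant))   # the maximum, 0.7
print(noisy_or_combination(redundant))    # larger than any single PP value
```

With the maximum rule, adding further redundant data with lower PP values leaves P unchanged, whereas the combined value keeps rising with every datum added.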
Our minimax procedure (page 39) for determining the P of a diagnosis overcomes the deficiencies of the typical Bayes formula. In a novel way, the algorithm combines a modified weight-averaged Bayes formula (page 44) with the minimax principle to calculate the P of a diagnosis. The original Bayes formula processes multiple clinical data, present and absent, sequentially or simultaneously. Because these clinical data are interrelated, this application violates the independence and incompatibility principles (page 27), leading to inaccuracies in the calculated P. We apply the weight-averaged Bayes formula only to a single clinical datum present and a single clinical datum absent in each clinical data pair, which does not violate those principles, and we eliminate the prior probability of diseases from it. Once the partial P values conferred on a diagnosis by each possible clinical data pair have been calculated with the modified formula, the minimax procedure integrates them into a total P. In this manner, we obtain the best of both worlds: the original Bayes formula is inaccurate for processing multiple clinical data, but accurate for a single pair of one clinical datum present and one clinical datum absent. The minimax principle alone does not take into account the proportional significance of S that the weight-averaged Bayes formula considers; however, it is appropriate for integrating the partial P values calculated by this formula into a total P of the diagnosis.
At each step of the diagnostic inquiry, we apply a novel method to select and recommend the best cost-benefit clinical datum to investigate next (page 53), which heuristically reduces the number of new clinical data searched, thereby considerably shortening the diagnostic process. Unlike other diagnostic programs, our algorithm considers the greatest PP value, S, and cost in selecting the best cost-benefit clinical datum.
Cost has high priority in our program (page 62) and refers not only to the dollar price, but also to discomfort and risk; the maximum of these qualitative levels represents the overall cost (page 29.)
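The rule that the overall cost is the maximum of the qualitative levels can be sketched as follows. The four level names and the function name are hypothetical; this section does not state how many levels are used or what they are called.

```python
from enum import IntEnum


class CostLevel(IntEnum):
    """Ordered qualitative levels (names are assumptions for illustration)."""
    NONE = 0
    SMALL = 1
    INTERMEDIATE = 2
    GREAT = 3


def overall_cost(price, discomfort, risk):
    """Overall cost of a clinical datum: the highest of its three levels."""
    return max(price, discomfort, risk)


# A hypothetical test that is cheap and painless but carries a great risk.
cost = overall_cost(CostLevel.SMALL, CostLevel.NONE, CostLevel.GREAT)
print(cost.name)
```

Taking the maximum means that a single high component, such as risk, determines the overall cost even when the other components are low.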
Simultaneously recommending several best cost-benefit clinical data (page 63) provides a novel and important advantage over recommending only one at each diagnostic step.
Our minimax procedure can distinguish competitive diagnoses from concurrent diagnoses (page 81).
The conclusion of the diagnostic quest is treated in a manner similar to that used by other authors. However, we have not found literature references to the creation of confirmation and deletion thresholds as we describe on page 87.
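One plausible reading of how such thresholds might conclude the quest is sketched below. The threshold values, the function name, and the three status labels are all hypothetical, and the comparison rule itself is an assumption; the actual rules are those described on page 87 and are not reproduced in this section.

```python
def diagnosis_status(p, confirmation_threshold=0.9, deletion_threshold=0.1):
    """Classify a diagnosis by its current P (thresholds are assumed values)."""
    if p >= confirmation_threshold:
        return "confirmed"   # accepted as a final diagnosis
    if p <= deletion_threshold:
        return "deleted"     # removed from the differential diagnosis list
    return "pending"         # more clinical data must be investigated


# Hypothetical P values for three diagnoses in a differential diagnosis list.
for name, p in {"diagnosis A": 0.95, "diagnosis B": 0.40, "diagnosis C": 0.05}.items():
    print(name, diagnosis_status(p))
```

Under this reading, the diagnostic quest would end when no diagnosis remains pending.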
The entire diagnostic process is partitioned into two distinct steps (page 94): the first step uses probabilities to obtain final diagnoses; the second step uses categorical tools to preclude overlooking related clinical entities and to establish unrelated or related concurrency among diagnosed clinical entities, integrating them into complex clinical presentations. This partitioning avoids the computational complexity, and even the impossibility, of managing the entire diagnostic process with probabilistic calculations.
Risk identifiers, interaction identifiers, and other safety checks (page 89) to preclude overlooking important diagnoses, as well as empirical treatment, diagnosis by exclusion, and deferred diagnosis, are also important heuristic aspects that complete the benefits of our algorithm.
The knowledge base must incorporate all known disease models, including the sensitivity (PP values are calculated automatically) and cost category of each clinical datum, risk and interaction identifiers, empirical values for confirmation and deletion thresholds, and all known complex clinical presentation models. When clinical data present and absent are provided, the algorithm is expected to return accurate final diagnoses and complex clinical presentations in an almost automatic manner.
Our diagnostic program could be expanded to include prognostic and therapeutic guidelines. A more ambitious project might be to determine whether other inexact disciplines such as law, sociology, politics, defense, or corporate strategy could benefit from some steps of our algorithm.
