Medical diagnosis wizard

To make it clear how Bayes theorem works, you will develop an online medical diagnosis wizard using PHP. This wizard could also have been called a calculator, except that it takes four input steps to supply the prerequisite information and then a fifth step to review the result.

The wizard works by asking the user to supply the various pieces of information needed to compute the full posterior distribution. The user can then examine the posterior distribution to determine which disease hypothesis has the highest probability, based on:

  1. The diagnostic test information
  2. The sample data used to estimate the prior and likelihood distributions

Bayes Wizard: Step 1

Step 1 in using Bayes theorem to make a medical diagnosis involves specifying the number of disease alternatives that you will examine along with the number of symptoms or evidence keys. In the generic example you will look at, you will evaluate three disease alternatives based on evidence from two diagnostic tests. Each diagnostic test can only produce a positive or negative result. This means that the total number of symptom combinations, or evidence keys, you can observe is four (++, +-, -+, or --).
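
Because each test has two possible outcomes, n binary tests produce 2^n evidence keys. The following PHP sketch (PHP to match Listing 4) enumerates them in the same order used in Step 2; the evidenceKeys() function is a hypothetical helper for illustration, not part of the article's wizard.

```php
<?php
// Hypothetical helper, not part of the article's wizard: list every
// evidence key for $numTests diagnostic tests that each return '+' or '-'.
function evidenceKeys(int $numTests): array
{
    $keys = [''];
    for ($t = 0; $t < $numTests; $t++) {
        $next = [];
        foreach ($keys as $prefix) {
            // Extend each partial key with a positive, then a negative result.
            $next[] = $prefix . '+';
            $next[] = $prefix . '-';
        }
        $keys = $next;
    }
    return $keys; // 2^$numTests keys in total
}

echo implode(', ', evidenceKeys(2)), PHP_EOL; // ++, +-, -+, --
```

With three tests the same function returns eight keys, which shows how quickly the evidence space grows as tests are added.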

Figure 3. Form to enter disease hypotheses and symptom possibilities

Bayes Wizard: Step 2

Step 2 involves entering the disease and symptom labels. In this case, you are just going to enter d1, d2, and d3 for the disease labels and ++, +-, -+ and -- for the symptom labels. The two symbols used for symptom labels signify whether the results of the two diagnostic tests came out positive or negative.

Figure 4. Form to enter disease and symptom labels

Bayes Wizard: Step 3

Step 3 involves entering the prior probabilities for each disease. You will use the data table below to determine the prior probabilities to enter in Step 3 and the likelihoods to enter in Step 4 (this data table originally appeared in Introduction to Probability). Using this example allows you to confirm that the final result you obtain from the wizard agrees with the results published in that book.

Figure 5. Joint frequency of diseases and symptoms

The prior probability of each disease is the number of patients diagnosed with that disease divided by the total number of diagnosed cases in this sample. The relevant prior probabilities for each disease are entered in the following form:

Figure 6. Form to enter disease priors
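
Computing the priors from counts takes one division per disease. A minimal PHP sketch, assuming the per-disease case counts have already been read from a table like the one in Figure 5; the priorsFromCounts() name and the counts in the usage line are hypothetical and are not the table's actual values:

```php
<?php
// Hypothetical helper: P(H = h) = (cases diagnosed with h) / (all diagnosed cases).
function priorsFromCounts(array $counts): array
{
    $total = array_sum($counts);
    if ($total <= 0) {
        throw new InvalidArgumentException('Case counts must sum to a positive number.');
    }
    $priors = [];
    foreach ($counts as $disease => $cases) {
        $priors[$disease] = $cases / $total;
    }
    return $priors;
}

// Illustrative counts only; these are NOT the values in the data table.
$priors = priorsFromCounts(['d1' => 50, 'd2' => 30, 'd3' => 20]);
printf("%.2f %.2f %.2f\n", $priors['d1'], $priors['d2'], $priors['d3']); // 0.50 0.30 0.20
```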

You do not have to rely upon a data table such as the previous one to derive the prior probability estimates. In some cases, you can derive prior probabilities by using common-sense reasoning: The prior probability of a fair two-sided coin coming up heads is 0.5. The prior probability of selecting a queen of hearts from a randomized deck of cards is 1/52.

You also commonly run into situations where you initially have no good estimates of what the prior probability of each hypothesis might be. In such cases, it is common to posit noninformative priors. If you have four hypothesis alternatives, then the noninformative prior distribution assigns 1/4, or 0.25, to each hypothesis. You might note here that Bayesians often criticize the use of a null hypothesis in significance testing because it amounts to assuming noninformative priors in cases where positing informative priors might be more theoretically or empirically justified.
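
A noninformative prior simply spreads the probability evenly over the hypotheses. As a small sketch, a hypothetical uniformPriors() helper in PHP could look like this:

```php
<?php
// Noninformative (uniform) prior: with n hypotheses, each one gets 1/n.
function uniformPriors(array $hypotheses): array
{
    $n = count($hypotheses);
    if ($n === 0) {
        throw new InvalidArgumentException('At least one hypothesis is required.');
    }
    return array_fill_keys($hypotheses, 1 / $n);
}

print_r(uniformPriors(['h1', 'h2', 'h3', 'h4'])); // every value is 0.25
```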

A final way to derive estimates of the prior probability of each hypothesis, P(Hi), is through a subjective estimate of what those probabilities might be given everything you have learned about the way the world works up to that point: P(H = h | everything you know). You will often find Bayesian inference sharing the same bed with a subjective view of probability, in which the probability of a proposition is equated with one's subjective degree of belief in that proposition.

What is important in this discussion is that Bayesian inference is a flexible technique that allows you to estimate prior probabilities using objective methods, common-sense logical methods, and subjective methods. When using subjective methods, you must still be willing to defend your prior probability estimates. You may use objective data to help set and justify your subjective estimates, which means that Bayesian inference is not necessarily in conflict with more objectively oriented approaches to statistical inference.

Bayes Wizard: Step 4

The data table provides you with information you can use to compute the probability of the symptoms (like test results) given the disease, also known as the likelihood distribution P(E | H).

To see how the likelihood values entered below were computed, you can unpack P(E|H) using the frequency format for computing conditional probabilities:

P(E | H) = {E & H} / {H}

This tells you that you need to divide a joint frequency count {E & H} by a marginal frequency count {H} to obtain the likelihood value for each cell in your likelihood matrix. The top-left cell of your likelihood matrix, P(E='++' | H='d1'), can be computed immediately from the joint and marginal frequency counts appearing in the data table:

P(E='++' | H='d1') = 2110 / 3215 ≈ .6563

All the likelihood values entered in Step 4 were computed in this manner.
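
The same division can be applied to every cell at once. The sketch below is a hypothetical PHP helper, likelihoodsFromJoint(), that assumes the joint frequencies are stored as $joint[disease][evidenceKey]. It returns evidence keys as rows and diseases as columns, mirroring the row and column labels in Listing 4, although whether Bayes.php expects exactly this array layout is an assumption. The counts are made up for illustration.

```php
<?php
// Hypothetical helper: turn a joint frequency table $joint[disease][evidenceKey]
// into likelihoods P(E = e | H = h) = {E & H} / {H}.
function likelihoodsFromJoint(array $joint): array
{
    $likelihoods = [];
    foreach ($joint as $disease => $countsByKey) {
        $marginal = array_sum($countsByKey); // {H}: all cases of this disease
        if ($marginal <= 0) {
            throw new InvalidArgumentException("No cases recorded for $disease.");
        }
        foreach ($countsByKey as $key => $count) {
            // {E & H} / {H}; rows are evidence keys, columns are diseases
            $likelihoods[$key][$disease] = $count / $marginal;
        }
    }
    return $likelihoods;
}

// Illustrative counts only; these are NOT the values in the data table.
$joint = [
    'd1' => ['++' => 6, '+-' => 2, '-+' => 1, '--' => 1],
    'd2' => ['++' => 1, '+-' => 1, '-+' => 4, '--' => 4],
];
$likelihoods = likelihoodsFromJoint($joint);
printf("%.2f\n", $likelihoods['++']['d1']); // 0.60
```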

Figure 7. Form to enter likelihood of symptoms given the disease

It should be noted that many statisticians use likelihood as a system of inference instead of, or in addition to, Bayesian inference. This is because likelihoods also provide a metric one can use to evaluate the relative degree of support for several hypotheses given the data.

In the previous example, you can see that the probability of a particular evidence key varies for each hypothesis under consideration. The probability of the ++ evidence key is the greatest for the d1 hypothesis. You can assess which hypothesis is best supported by the data by:

  1. Examining the likelihood of the evidence key given each hypothesis key
  2. Selecting the hypothesis that maximizes the likelihood of the evidence key

Doing so would be an example of inference according to the principle of maximum likelihood.
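
Maximum likelihood selection reduces to finding the largest value in the row for the observed evidence key. A minimal PHP sketch, with a hypothetical maxLikelihoodHypothesis() helper and illustrative likelihood values rather than the ones entered in Step 4:

```php
<?php
// Maximum likelihood: for one observed evidence key, choose the hypothesis
// with the largest likelihood P(E = key | H).
function maxLikelihoodHypothesis(array $likelihoods, string $evidenceKey): string
{
    if (!isset($likelihoods[$evidenceKey])) {
        throw new InvalidArgumentException("Unknown evidence key: $evidenceKey");
    }
    $row = $likelihoods[$evidenceKey]; // hypothesis => likelihood
    arsort($row);                      // highest likelihood first, keys kept
    return (string) array_key_first($row);
}

// Illustrative likelihood values only (not the numbers entered in Step 4).
$likelihoods = [
    '++' => ['d1' => 0.60, 'd2' => 0.10, 'd3' => 0.20],
    '--' => ['d1' => 0.05, 'd2' => 0.40, 'd3' => 0.10],
];
echo maxLikelihoodHypothesis($likelihoods, '++'), PHP_EOL; // d1
```

The priors play no part in this choice, which is what separates it from reading off the posterior distribution in Step 5.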

Another interesting point to note is that the values in the above likelihood distribution sum to more than 1. The likelihoods for each disease sum to 1 across the four evidence keys (up to rounding), so the values as a whole sum to 3, one per disease, and the likelihoods of a single evidence key across the three hypotheses need not sum to 1. What this means is that the likelihood, viewed as a function of the hypothesis, is not really a probability distribution, because it lacks the defining property that its values sum to 1. This summation property is not essential for the purposes of evaluating the relative support for different hypotheses. What is important for this purpose is that the "likelihood supplies a natural order of preference among the possibilities under consideration" (from R.A. Fisher's Statistical Methods and Scientific Inference, p. 68).
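
The difference between the two ways of summing a likelihood matrix can be checked directly. This PHP sketch uses a hypothetical likelihoodSums() helper and made-up values in which each hypothesis's likelihoods sum to 1 over the evidence keys, while the likelihoods of one evidence key across hypotheses do not.

```php
<?php
// Sum a likelihood matrix $likelihoods[evidenceKey][hypothesis] two ways.
function likelihoodSums(array $likelihoods): array
{
    $perKey = [];        // sum over hypotheses for one evidence key
    $perHypothesis = []; // sum over evidence keys for one hypothesis
    foreach ($likelihoods as $key => $row) {
        $perKey[$key] = array_sum($row);
        foreach ($row as $h => $p) {
            $perHypothesis[$h] = ($perHypothesis[$h] ?? 0.0) + $p;
        }
    }
    return [$perKey, $perHypothesis];
}

// Illustrative values only: each hypothesis's likelihoods sum to 1 over the keys.
$likelihoods = [
    '++' => ['d1' => 0.6, 'd2' => 0.1],
    '+-' => ['d1' => 0.2, 'd2' => 0.1],
    '-+' => ['d1' => 0.1, 'd2' => 0.4],
    '--' => ['d1' => 0.1, 'd2' => 0.4],
];
[$perKey, $perHypothesis] = likelihoodSums($likelihoods);
printf("P(++|d1) + P(++|d2) = %.2f\n", $perKey['++']);        // 0.70
printf("sum over keys for d1 = %.2f\n", $perHypothesis['d1']); // 1.00
```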

You may not fully understand the concept of likelihood from this brief discussion, but I do hope that you appreciate its importance to the overall Bayes theorem calculation and its importance as the foundation for another system of inference. The likelihood system of inference is preferred by many statisticians because you don't have to resort to the dubious practice of trying to estimate the prior probability of each hypothesis.

Maximum likelihood estimators also have many desirable mathematical properties that make them nice to work with (the properties include transitivity, additivity, asymptotic unbiasedness, and invariance under transformations, among others). For these reasons, it is often a good idea to examine your likelihood distribution closely, in addition to your posterior distribution, when making inferences from your data.

Bayes Wizard: Step 5

The final step of the process involves displaying the posterior distribution of the diseases given the symptoms P(H | E):

Figure 8. Probability of each disease given symptoms

The section of the script that was used to compute and display the posterior distribution looks like this:

Listing 4. Computing and displaying the posterior distribution
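<?php
// Load the Bayes class; require_once stops with an error if the file is missing.
require_once "Bayes.php";

// Values collected by the earlier wizard steps and posted with this form.
$disease_labels = $_POST["disease_labels"];
$symptom_labels = $_POST["symptom_labels"];
$priors         = $_POST["priors"];      // P(H) for each disease (Step 3)
$likelihoods    = $_POST["likelihoods"]; // P(E | H) values (Step 4)

// Compute the posterior P(H | E) from the priors and likelihoods.
$bayes = new Bayes($priors, $likelihoods);
$bayes->getPosterior();
$bayes->setRowLabels($symptom_labels);    // aka evidence labels
$bayes->setColumnLabels($disease_labels); // aka hypothesis labels
$bayes->toHTML();                         // output the posterior to the browser
?>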

You begin by passing the priors and likelihoods obtained in the previous wizard steps to the Bayes constructor. Using this information, you compute the posterior with the $bayes->getPosterior() method. To output the posterior distribution to the browser, you first set the row and column labels to display, then output the posterior distribution using the $bayes->toHTML() method.
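
Bayes.php itself is not shown in this section, so the following is not its implementation. It is a minimal standalone PHP sketch of the calculation getPosterior() is described as performing, assuming priors keyed by disease label and likelihoods keyed as [evidence key][disease label]; posteriorDistribution() and the numbers used are hypothetical.

```php
<?php
// Standalone sketch of Bayes' theorem, not the code inside Bayes.php:
// P(h | e) = P(e | h) P(h) / sum over h' of P(e | h') P(h')
function posteriorDistribution(array $priors, array $likelihoods): array
{
    $posterior = [];
    foreach ($likelihoods as $key => $row) {
        $numerators = [];
        foreach ($priors as $h => $prior) {
            $numerators[$h] = $row[$h] * $prior; // P(e | h) P(h)
        }
        $evidence = array_sum($numerators);      // P(e), the normalizing constant
        foreach ($numerators as $h => $value) {
            // Leave a zero-probability evidence key as all zeros rather than divide by 0.
            $posterior[$key][$h] = $evidence > 0 ? $value / $evidence : 0.0;
        }
    }
    return $posterior; // rows are evidence keys, columns are hypotheses
}

// Illustrative numbers only (not the article's priors or likelihoods).
$priors = ['d1' => 0.5, 'd2' => 0.5];
$likelihoods = ['++' => ['d1' => 0.6, 'd2' => 0.1]];
$post = posteriorDistribution($priors, $likelihoods);
printf("%.3f %.3f\n", $post['++']['d1'], $post['++']['d2']); // 0.857 0.143
```

Each row of the result sums to 1 because it is divided by P(e); that denominator is what turns likelihood-times-prior values into a proper probability distribution over the hypotheses.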




The content shown in this page was first published by IBM developerWorks and is reprinted with permission from Paul Meagher (www.datavore.com)

