Dr House, MD – ICD10 diagnosis and a case of F42.2

Differential diagnosis procedures are hard and painful. As someone who has been through them, I would say that “House, MD” is the most accurate show from the patient’s point of view.

On the doctor’s side, the show is, like all shows about TV doctors, over the top. But that is what keeps us watching.

I have long had a love for diagnostics: the challenge, the grey zones, the chase, the prize. I’ve studied the ICD10 as if it were a beautiful song. And trust me, it is far from logical in many respects, but my obsessive thoughts and acts (code F42.2) find the non-logical logic of the ICD10, and the universal medical language that it is, intriguing. So when Dr House, MD comes on, I watch with excitement (ignoring the obvious clinical fallacies) and compete with Gregory for the final diagnosis. I almost always beat him, which has made my husband stop watching the show; I’ve annoyed him with my constant spoilers. So I shared a formula on Twitter, as I perceived it.

But being a scientist, and a lover of the ICD10 (however incomplete or overwhelming it may feel at times), I had to test my theory. As my friends on Twitter pointed out, there were some diseases I had not accounted for; even worse, after some research I found that nobody had compiled the frequency of diagnoses in the show. So I took on the task.


I found a site listing every diagnosis featured in the show, episode by episode. I printed the list and first assigned each disease to a category from memory, to train my own ability. Then I looked each one up in the online ICD10 and noted which category (A-Z) it belonged to.

Then I entered all the categories into Excel (Stata, R or SPSS felt like overkill) and used Excel’s built-in features to count the prevalence and make the table.
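For readers who prefer code to spreadsheets, the same tally can be sketched in a few lines of Python. This is a minimal stand-in for a COUNTIF-style count, not the spreadsheet used here; it assumes each diagnosis has already been given an ICD10 code and that grouping by the code’s first letter matches the A-Z categories. The function name and the sample codes are hypothetical.

```python
from collections import Counter


def prevalence_by_letter(codes: list[str]) -> dict[str, float]:
    """Percentage of diagnoses falling under each ICD10 letter (A-Z).

    Assumes every code is a non-empty string such as "B50" or "G35".
    """
    if not codes:
        raise ValueError("need at least one code")
    # Count diagnoses by the first letter of their (hypothetical) ICD10 code
    counts = Counter(code.strip()[0].upper() for code in codes)
    total = len(codes)
    # Prevalence = count in category / all counted diagnoses * 100
    return {letter: 100 * n / total for letter, n in counts.items()}


# Hypothetical codes for illustration only, not the show's data
sample = ["B50", "B58", "G35", "C34", "E27"]
print(prevalence_by_letter(sample))
# {'B': 40.0, 'G': 20.0, 'C': 20.0, 'E': 20.0}
```

Percentages computed this way always sum to 100% over the categories present, which is a quick sanity check on a hand-made chart.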

Non-specific diagnoses (e.g. ectopy, without specification of the organ affected) were removed from the analyses (n = 48).

ICD10 codes C00-D99 were merged into one group; however, all diagnostic labels were kept in the chart text.

Approximate total number of diagnoses accounted for in the document (n = 284).
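The two analysis rules above (dropping non-specific diagnoses and merging C00-D99 into one group) can be written down as a small Python sketch. The data layout is an assumption: each diagnosis is a (label, code) pair, with the code set to None when no organ is specified. The entries are hypothetical and only show the mechanics. With the figures above, 284 − 48 = 236, which matches the N reported for Chart 1.

```python
from collections import Counter
from typing import Optional


def group_label(code: str) -> str:
    """Map an ICD10 code to a chart group; all C and D codes share one group."""
    letter = code.strip()[0].upper()
    return "C00-D99" if letter in ("C", "D") else letter


def tally(diagnoses: list[tuple[str, Optional[str]]]) -> tuple[Counter, int, int]:
    """Drop non-specific entries (code is None), then count the rest per group.

    Returns (counts per group, number excluded, number analysed).
    """
    kept = [code for _, code in diagnoses if code is not None]
    excluded = len(diagnoses) - len(kept)
    counts = Counter(group_label(code) for code in kept)
    return counts, excluded, len(kept)


# Hypothetical entries, not taken from the show
example = [
    ("malaria", "B54"),
    ("lung cancer", "C34.9"),
    ("benign brain tumour", "D33.2"),
    ("ectopy, organ unspecified", None),
]
counts, excluded, n = tally(example)
print(dict(counts), excluded, n)  # {'B': 1, 'C00-D99': 2} 1 3
```

Marking non-specific diagnoses with None, rather than deleting them from the list, keeps the number of exclusions countable, so the reported totals can be checked against each other.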


I have to reject my hypothesis: neurology was not favored, but parasites were.

Neurological diseases accounted for only 9% of the diagnoses.

Oncology: 8.5%

Endocrine: 7.6%

Parasitic and viral: 29.6%

Genetic: 1.2%

The results suggest that the show’s writers are particularly interested in parasitic and viral diseases. My own by-hand categorization matched the actual ICD10 in almost 90% of cases (it’s beside the point, but since it is my only party trick, I have to at least mention it).
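The “almost 90%” figure can be read as a simple percent agreement: the share of diagnoses where the category assigned by heart equals the one looked up in the ICD10. A minimal sketch, assuming that is how it was computed and that both sets of categories are stored as parallel lists; the function name and the ten categories below are hypothetical.

```python
def percent_agreement(by_hand: list[str], reference: list[str]) -> float:
    """Share (%) of diagnoses where the by-hand category equals the looked-up one."""
    if len(by_hand) != len(reference):
        raise ValueError("both lists must cover the same diagnoses")
    if not by_hand:
        raise ValueError("no diagnoses to compare")
    # Count positions where the two categorizations agree
    matches = sum(a == b for a, b in zip(by_hand, reference))
    return 100 * matches / len(by_hand)


# Hypothetical categories for ten diagnoses; only the ninth disagrees
guessed = ["B", "G", "C", "E", "B", "A", "G", "Q", "B", "J"]
lookup = ["B", "G", "C", "E", "B", "A", "G", "Q", "A", "J"]
print(percent_agreement(guessed, lookup))  # 90.0
```

Percent agreement does not correct for matches expected by chance; Cohen’s kappa is the usual statistic when that matters.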


Chart 1: Prevalence (%) of diagnoses throughout the show (N = 236 diagnoses).


The original data may not cover all of the diseases, but it is a reasonably solid source and at least points to a trend in the diagnostics. Although Dr House is American, and ICD9 would therefore be used at his office, I chose for convenience to classify the diagnoses according to ICD10, which may inflate or attenuate the prevalence of certain diseases. The diagnoses removed were predominantly associated with infectious, dermatological or respiratory codes; had they been included, those categories would probably show a higher prevalence.

So, in light of these findings, the prevalence of diagnoses suggests that parasites and viruses are what interest us most, or are the easiest to write scripts around.


A scientist with obsessive thoughts who spends too much time watching House, MD and working with ICD10 criteria is bound to make statistics out of any available data.

Future directions

The next step is to evaluate to what extent the prevalence of Dr House’s diagnoses corresponds to real-life prevalence.

Author: Almira Osmanovic Thunström almira@almiraosmanovicthunstrom.com

ICD10 Diagnosis (The entire F-section, minus non-organic psychosis)

Appendix: Yes, I did it by hand.



  1. I love this article and the spirit in which it was written. The project you describe sounds like something I might have done… just never thought of it. Fun!… But I do have some questions… (I am deeply and passionately immersed in the conversion to ICD-10 in the States…) Did you graph only final diagnosis or include differentials in your graph? (I don’t know how many episodes there have been.) And your analysis?

    I don’t know what this sentence means: “The diagnosis removed were predominantly associated with either infectious, dermatological or respiratory codes, hence if included these might find more statistical power. ” Can you explain or put this in some context for me?

    Yes, my brain brought me here – and any or all of the tag words would have worked too. Love your blog!!

    1. Hi! Thanks for the wonderful comment!

      -Did you graph only final diagnosis or include differentials in your graph?
      I only included the final diagnoses, and only those disclosed on Wikipedia; I think the last two seasons are not included. I have to redo the analyses now that the last episode has aired :).

      “The diagnosis removed were predominantly associated with either infectious, dermatological or respiratory codes, hence if included these might find more statistical power. ”

      – ICD-9 has just over 14,000 diagnosis codes and almost 4,000 procedural codes. In contrast, ICD-10 contains over 68,000 diagnosis codes (clinical modification codes) and over 72,000 procedural codes. The major differences between ICD-9 and ICD-10 are in the infectious, dermatological and respiratory codes (as far as I’ve understood; perhaps in even more areas, as I’m not fully familiar with version 9). When one doctor uses ICD-9 and another uses ICD-10, the statistics may differ, as some diagnoses may not be quantifiable in the same way in both; hence my coding may over- or under-sample some code groups. Since I also removed non-specific codes related to exactly those organ systems where ICD-9 and ICD-10 differ, this is particularly worth noting.

      A follow-up of this data should be: compare data from the series to the US population, redo the analyses using both ICD-9 and ICD-10, and then compare! The comparison should be fun for many reasons, particularly to show how much the prevalence may differ just by switching versions. It will be fun to see how it maps out, and maybe a good example of why ICD-10 should be implemented everywhere.

      1. Thanks for the additional info!! Some people love jigsaw puzzles, some crossword puzzles… this is my kind of fun!
        In my epi research I worked extensively with US and international data, including ICD-9 and ICD-10 (the latter also in US mortality data). My commitment to accurate, reliable data drew me to the ICD-10 conversion project in the US, and I am now extremely well versed in the mapping/conversion process between the two manuals – “forward and backward”, as the lingo goes here.
        I would be more than glad to fit the dx on the ICD-9 side and map them to ICD-10 – the conversion tool I use was developed by experts, and the mappings I have approved are documented with peer-reviewed articles and/or common medical practice in major US teaching institutions (e.g. Mayo, Johns Hopkins).
        I think it sounds like a fun project!
        I think demo’ing the difference would be very keen in both of our worlds – a nice collaborative effort. I do like efficiency and hate to reinvent the wheel if I don’t have to.

      2. Awesome! That would be great. Fun collaboration! My personal website is http://www.almiraosmanovicthunstrom.com, and there you can find my email. Mail me (so I have your email as well) and we can discuss how to share files :).
