top of page

Healthcare Analysis (SQL)

  • Writer: Brandon Hopkins
    Brandon Hopkins
  • Jan 18, 2023
  • 5 min read

Updated: Jan 20, 2023

Introduction


The idea for this project was to imagine that I have just been hired by a local hospital as a healthcare data analyst. The hospital itself is very busy (as they all are) and my direct supervisor is swamped and has given me a list of tasks to complete. As a data analyst, I have access to the hospital data and decided to complete my analysis using SQL.


Tasks to Complete:

  1. Visualize the distribution of time spent in the hospital.

  2. Provide a list of all medical specialties and identify which specialties on average are performing the highest number of procedures.

  3. Determine if there may be any bias in how the hospital is treating patients based on race, gender, or age, specifically when looking at the number of lab procedures done.

  4. Provide insight into how the number of procedures a patient receives might correlate to the number of days spent in the hospital.

  5. Ad-hoc patient list request

  6. Provide a list of instances when patients came to the hospital with an emergency but ended up staying less than the average hospital stay.

  7. Provide a written summary for the 50 patients with the highest number of medications and include the patient ID, patient race, whether they were readmitted or not, and provide their number of medications and lab procedures

In this project, I will walk you through each step I took along the way starting with a brief description of the data and IDE I used, and finishing with providing insights into how the hospital is operating and answering questions proposed by my supervisor.


The Data


Prior to any analysis, I first downloaded the MySQL Workbench and a MySQL server - this would allow me to complete the project using SQL and a large dataset. Once I had the IDE all set up, it was time to add in the data. The dataset I used originated from Kaggle and consisted of two tables - one that included data on patient demographics and the other consisting of data relating to the care/procedures that the patient had. I was also given two scripts that were used to write both the tables into the MySQL workbench.


Analysis


Task 1

The first task I wanted to tackle to visualizing the distribution of time spent in the hospital. Hospitals are limited by their size (number of beds) - so the longer a patient stays in the hospital the longer they are using one of the hospital beds, which leads to less opportunities to treat more patients (and make more money). So it is certainly helpful to understand how the hospital is operating with respect to the distribution of time being spent.

SQL is not the go-to tool for data visualization, however there is a way to create a basic histogram to visualize distribution as shown above. The histogram does show that the majority of patients are staying 1-4 days and a much small portion end up staying over a week - this is good news for the hospital as any given bed in use is not typical being used (tied down) for long periods of time.


Task 2

The next task was to inform the new Hospital Director what medical specialties are typically doing the most number of procedures - they requested to be provided with a list of all specialties and which specialties on average are performing the highest number of procedures. I first created the medical specialty list by using the DISTINCT statement (shown below).

I then took this list and added in the corresponding value for average procedures using the AVG and ROUND functions, additional I added in the COUNT of how many patients each specialty was dealing with. To get the final list (below) that the Director was looking for, I included two filters using the HAVING clause to narrow down to specialties that see at least 50 patients and perform more than 2.5 procedures on average.


Task 3

Next, the Chief of Nursing wanted to know if there may be any bias in how the hospital may be treating patients based on race, gender, or age, specifically when looking at the number of lab procedures done.


To complete this task, I needed to use a JOIN clause to create a new table that included both demographic and health information. I wrote 3 different queries (see below), each one is essentially the same but targeting a different demographic column. As the data suggests, the number of lab procedures is pretty similar between different demographic groups which suggests that the hospital is doing well in preventing bias in patient care, at least when it comes to the number of lab procedures.




Task 4

My supervisor also asked for insight into how the number of procedures a patient receives might correlate to the number of days spent in the hospital. To do so, I had to categorize the number of procedures which I did using a CASE statement, then compared it side by side with the AVG of time spent in the hospital. As you can see below, there is a difference in time spent in the hospital of about a day on average depending on whether the patient requires more than 2 procedures, however there is not a significant time difference when comparing anything over 2 procedures.

Task 5

I also received a quick, "ad-hoc" request to provide a co-worker from the research group with a list of patients that are African-American and had an "Up" for metformin. A UNION operator in SQL was the perfect solution for this problem, and allowed me to quickly send a patient list back to the research group.


Task 6

The Hospital Administrator wanted to highlight some of the hospital's successes. They requested a list of times when patients came to the hospital with an emergency but ended up staying less than the average hospital stay. To write this query as requested, I needed to filter by admission type (emergency) and the average time in the hospital. Since you are unable to filter an aggregate in the WHERE clause, I needed a different solution. This could have been done using a subquery, however I chose to use a Common Table Expression (CTE) instead which I've included below. This gave the Administrator exactly what they were looking for!


Task 7

Finally, the last task consisted of providing a written summary for the 50 patients with the highest number of medications and include the patient ID, patient race, whether they were readmitted or not, and provide their number of medications and lab procedures. To write this query I knew I had to use the CONCAT function to write a string and incorporate values from the table columns. In addition, I also used a CASE function to categorize whether or not they were admitted and a JOIN clause because I needed data from both patient tables. The efficiency and time saved from creating this summary using SQL instead of writing out manually is incredible!


Closing Remarks


I had a great time doing this project! There was a lot of different tasks to complete but I feel like it was a great representation of all the various requests that a data analyst might receive. It also was a perfect way to practice and demonstrate a variety of SQL commands and queries, including Joins, Unions, CTEs, etc.


Please do not hesitate to contact me on LinkedIn and let me know your thoughts on this project. Thank you!


 
 
 

Comments


  • alt.text.label.LinkedIn

Thanks for visiting!

©2022 by Brandon Hopkins | Data Enthusiast.

Proudly created with Wix.com

bottom of page