Tackling Tornado Trends

STA/ISS 313 - Project 1


Team Two

Sources for data cleaning:


For this project, our aim is to explore the characteristics of tornadoes within the U.S., how the nature of tornadoes changes with their environmental conditions, and how that in turn harms populations in different areas of the country. Our data suggest a relationship between the magnitude and frequency of tornadoes for states across longitudinal lines within the U.S.: states in the longitudinal partitions central to the Southwest and Southeast tend to have both higher average tornado magnitudes and higher tornado frequencies. Our data also show a seasonal trend in tornado frequency and magnitude. Specifically, an increasing proportion of tornadoes occur in the spring, which is also consistently the season with the most tornado fatalities. We also found that tornado fatalities track the change in the average magnitude of tornadoes over a given time period, and we hope this finding can inform policymakers and raise public awareness of damage prevention and damage control.


The dataset we have chosen to explore is from TidyTuesday and was sourced from the NOAA National Weather Service Storm Prediction Center’s Severe Weather Maps, Graphics, and Data Page, which contains data on tornadoes, hail, and damaging wind from 1950 to 2022. This tornadoes dataset describes tornadoes that have occurred in the United States and U.S. territories, including their date and time of occurrence, magnitude, number of injuries and fatalities caused, estimated property loss, path length, and width. Specifically, there are 32 variables and 67870 observations, with each observation representing one tornado. The variables are a mix of numerical and categorical variables with a range of types, including character, double, date, and time. More information on the contents of our dataset can be found in our data dictionary, located in data/README.md in this repository. For the purposes of our project, we analyze only tornadoes in the contiguous U.S., which comprises 48 states (excluding Alaska and Hawaii) and the District of Columbia.

Additionally, to create a map as part of our visualization for Question 1, we also use the us dataset from the maps package, which contains the geometry of each state and therefore allows us to draw a map of the United States.
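The restriction to the contiguous U.S. described above amounts to a simple filter on the state abbreviation column (st in the dataset). The following is a minimal Python sketch of that idea, not the project's own code: the set of excluded abbreviations and the sample rows are assumptions made for illustration.

```python
# Assumed abbreviations for areas outside the contiguous U.S.
# (Alaska, Hawaii, and two territories); the real dataset may use others.
NON_CONTIGUOUS = {"AK", "HI", "PR", "VI"}


def keep_contiguous(rows):
    """Keep only rows whose state abbreviation is in the contiguous U.S."""
    return [row for row in rows if row["st"] not in NON_CONTIGUOUS]


# Made-up rows for illustration, not records from the real dataset.
sample = [{"st": "TX"}, {"st": "HI"}, {"st": "DC"}, {"st": "PR"}]
contiguous = keep_contiguous(sample)
```

Filtering by exclusion rather than by an explicit list of 49 abbreviations keeps the sketch short, but an explicit allow-list would be safer if the data contain unexpected codes.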

Question 1: How may the nature of tornadoes vary by the location they occur in?


We first investigate how the characteristics of tornadoes may be influenced by where they occur in the contiguous U.S. and whether there are any related trends. In this way, we hope to understand the geographical trends that underlie tornado characteristics, such as their magnitude and distance travelled. Such an understanding helps policymakers as well as local and national entities, ranging from healthcare systems to NGOs, identify which states and areas in the U.S. are most vulnerable to the impact of tornadoes and where infrastructure and initiatives to protect populations against tornadoes should be implemented.

To investigate this question, we use information from the dataset on the states in which tornadoes occur, their magnitude (measured on the Enhanced Fujita scale, with discrete ratings from 0 to 5), and their path length (an indicator of the distance travelled by a tornado and of its severity). These correspond to the st, mag, and len variables in the original dataset. From these, we can derive variables that count the number of tornadoes that occurred in each state from 1950 to 2022, average the magnitudes of tornadoes by the state they originated in, and classify the states in which tornadoes occurred into regions. This allows us to explore various aspects of geographical trends in tornado frequency and severity.
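The derived state-level variables (tornado count, average magnitude, and region) can be sketched as a group-and-summarize step. This is a minimal Python illustration under stated assumptions, not the project's actual pipeline: the rows are made-up values, the region lookup is a hypothetical partial table, and only the st, mag, and len names come from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Made-up illustrative rows, NOT values from the real dataset.
# Keys mirror the st, mag, and len variables named in the text.
tornadoes = [
    {"st": "OK", "mag": 3, "len": 12.5},
    {"st": "OK", "mag": 1, "len": 2.0},
    {"st": "AL", "mag": 2, "len": 7.5},
    {"st": "NY", "mag": 0, "len": 0.5},
]

# Hypothetical, partial state-to-region lookup (an assumption for illustration).
REGION_BY_STATE = {"OK": "Southwest", "AL": "Southeast", "NY": "Northeast"}


def summarize_by_state(rows):
    """Return {state: {"n": count, "avg_mag": mean magnitude, "region": region}}."""
    magnitudes = defaultdict(list)
    for row in rows:
        magnitudes[row["st"]].append(row["mag"])
    return {
        st: {
            "n": len(values),
            "avg_mag": mean(values),
            "region": REGION_BY_STATE.get(st, "Unknown"),
        }
        for st, values in magnitudes.items()
    }


state_summary = summarize_by_state(tornadoes)
```

If the real data contain missing magnitudes, those rows would need to be dropped or handled separately before averaging; the sketch assumes every row has a rating.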


Figure 1

For our first plot, Figure 1, we wanted to visualize how the path length of tornadoes relates to their magnitude in each of the 5 regions of the U.S. (Midwest, Northeast, Southeast, Southwest, West) to get a high-level understanding of potential geographical patterns in tornado characteristics. We chose a ridgeline plot because it is designed to show the distribution of a continuous numeric variable across categories, in our case the distribution of tornado path length (len) across the 6 categories of magnitude (mag). We initially planned to overlay density curves for each region within each magnitude category, but the curves overlapped so much that it was hard to tell them apart and identify regional trends. Therefore, we faceted the ridgeline plot by region (region) instead, so that we could see how the magnitudes and path lengths of tornadoes vary by the region of the U.S. in which they occurred. We also filled the area under each density curve with a colorblind-friendly color keyed to its magnitude, to make magnitude categories easier to compare across regions.

Figure 2

For our second plot, Figure 2, we wanted to zoom in on individual states in each region to see whether regions are actually representative of trends in tornado severity and frequency, or whether other groupings better connect states based on the nature of the tornadoes that occur in them. We therefore used geom_sf() to visualize the spatial data as a map, since such a plot lets viewers quickly grasp how tornado characteristics compare across all 48 states of the contiguous U.S. and the District of Columbia.

To make the us dataset work with our existing tornadoes data frame, we created a converter from state names to their abbreviations, which is how states are recorded in the tornadoes dataset. We then joined the us dataset with the state-level summary statistics of average magnitude from the state dataset, which we created from the tornadoes dataset, and used the result to plot the average magnitude of tornadoes that occurred in each state on the map. We used plotly to combine the two maps and add hover labels to the states, giving quick access to the state name, region, and average tornado magnitude on the left map and to tornado frequency on the right map.
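The name-to-abbreviation conversion and the join can be sketched as a lookup followed by a left join keyed on the abbreviation. The Python below is a minimal illustration, not the converter used in the project: the lookup table is a hypothetical partial one, the field name "name" is an assumed stand-in for the map data's state identifier, and the summary values are made up.

```python
# Hypothetical, partial name-to-abbreviation lookup; a full version would cover
# all 48 contiguous states plus the District of Columbia.
ABBREVIATION_BY_NAME = {
    "alabama": "AL",
    "new york": "NY",
    "oklahoma": "OK",
    "district of columbia": "DC",
}


def to_abbreviation(state_name):
    """Convert a full state name to its abbreviation, or None if unknown."""
    return ABBREVIATION_BY_NAME.get(state_name.strip().lower())


# Stand-ins for the map data (one row per state) and the state-level summary.
map_rows = [{"name": "oklahoma"}, {"name": "New York"}, {"name": "district of columbia"}]
avg_mag_by_state = {"OK": 2.0, "NY": 0.0}  # made-up values for illustration


def join_summary(rows, summary):
    """Left join: keep every map row and attach avg_mag where a summary exists."""
    joined = []
    for row in rows:
        abbr = to_abbreviation(row["name"])
        joined.append({**row, "st": abbr, "avg_mag": summary.get(abbr)})
    return joined


map_with_stats = join_summary(map_rows, avg_mag_by_state)
```

A left join keeps states with no recorded tornadoes on the map (here with avg_mag set to None), so they can still be drawn rather than silently disappearing.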


Figure 1

Figure 1: How do tornado magnitude and path length relate and vary by region?

Figure 2