Teardrops on my GitR: A statistical exploration of musicality in Taylor Swift’s discography

Proposal

library(tidyverse)

Dataset

taylor_all_songs <- read_csv("data/taylor_all_songs.csv")
taylor_album_songs <- read_csv("data/taylor_album_songs.csv")
taylor_albums <- read_csv("data/taylor_albums.csv")

The taylor R package (the source of this dataset) was curated by W. Jake Thompson using data collected from Genius, an American digital media company, and from the Spotify API.

The R package contains three datasets:

  • taylor_album_songs - 29 variables with 194 observations

  • taylor_all_songs - 29 variables with 274 observations

  • taylor_albums - 5 variables with 14 observations

We chose this dataset because, as avid Taylor Swift listeners, we knew that her extensive song catalog would provide a robust dataset to analyze. We expect the diversity of her songs, and the variety between her albums, to be reflected across the dataset’s numerical and categorical variables, which we can use to build an engaging narrative. We are excited about the opportunity to quantify how her music’s features relate to its critical reception and to release and publishing decisions, using detailed, thorough, and captivating visualizations.

Questions

Question 1: Is there a relationship between an album’s musical features and its critical reception? Do the “speechiness” and “instrumentalness” of songs vary across albums, and do these values correlate with Metacritic score?

This question will explore how the features of Taylor Swift’s music relate to the critical reception of her albums. For each album in the dataset, we will visualize the instrumentalness and speechiness scores of its songs and plot them against the album’s Metacritic score. Metacritic scores are a weighted average of a curated group of reviews on metacritic.com; they attempt to summarize an album’s critical reception in a single number. The speechiness of Taylor Swift’s songs has varied over time: some albums feel “speechier” than others, and listeners have differing preferences for speechy versus instrumental songs. Our question is really: “Do critics prefer a speechier or a more instrumental Taylor Swift?” We will look for connections between these song-level features and each album’s overall critical success to see whether a consistent trend emerges.

Question 2: How do bonus tracks differ from album tracks? Do the two groups show different trends in duration, tempo, and energy as a whole, and do bonus tracks differ from the other songs on their respective albums?

In answering this question, we will highlight musical differences between the songs released on Taylor Swift’s original albums and the “bonus tracks” released with her rerecordings. These differences may offer insight into why Swift left the tracks off the original cut and into how her style has evolved. The available datasets record many variables for each song. For this question, we considered what might make bonus tracks different from album tracks and concluded that features such as duration and energy could set a track apart from the main album, making it a better fit as a bonus track. We also wanted to explore the breadth of the available variables by investigating the combination of duration, tempo, and energy in bonus versus album tracks.

Analysis plan

For each question, this plan lists the variables involved, any variables to be created, and any external data to be merged in.

Question 1: Answering question 1 will require merging two of the TidyTuesday datasets, taylor_albums and taylor_album_songs. Joining them on album_name associates the features of individual songs in taylor_album_songs with the critic scores in taylor_albums, so that we can use the variables speechiness, instrumentalness, album_name, and metacritic_score together. We propose visualizing this question in multiple ways. We can represent each album by the distribution of its songs’ speechiness scores, ordering the albums from lowest to highest Metacritic score, and do the same with instrumentalness scores. If speechier albums are associated with better (or worse) critical reception, we might expect the opposite association for more instrumental albums; however, Spotify scores the two features separately, so they need not be inversely related, and we will check this directly.
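The merge and ordering described above could look roughly like the sketch below. It is a minimal example under stated assumptions: the two small tibbles hold made-up placeholder albums and values, not the real data, and in the actual analysis they would be replaced by taylor_album_songs and taylor_albums as read in earlier. Selecting only album_name and metacritic_score before joining avoids duplicated columns in case the two tables share other columns, and songs from albums without a score are dropped.

```r
library(dplyr)
library(ggplot2)

# Made-up placeholder rows standing in for taylor_album_songs (not real values)
album_songs <- tibble(
  album_name       = c("Album A", "Album A", "Album B", "Album B",
                       "Album C", "Album C", "Album D"),
  track_name       = c("a1", "a2", "b1", "b2", "c1", "c2", "d1"),
  speechiness      = c(0.03, 0.05, 0.04, 0.10, 0.06, 0.08, 0.04),
  instrumentalness = c(0.00, 0.01, 0.00, 0.00, 0.02, 0.00, 0.00)
)

# Made-up placeholder rows standing in for taylor_albums; Album D has no score
albums <- tibble(
  album_name       = c("Album A", "Album B", "Album C", "Album D"),
  metacritic_score = c(85, 70, 78, NA)
)

songs_scored <- album_songs |>
  # attach each album's score to every song on it; keep only the needed columns
  left_join(select(albums, album_name, metacritic_score), by = "album_name") |>
  # songs from unscored albums cannot be placed on the score axis
  filter(!is.na(metacritic_score)) |>
  # factor levels ordered from lowest to highest Metacritic score
  mutate(album_name = reorder(album_name, metacritic_score))

# One box per album, left to right by critical reception
speech_plot <- ggplot(songs_scored, aes(x = album_name, y = speechiness)) +
  geom_boxplot() +
  labs(x = "Album (ordered by Metacritic score)", y = "Speechiness")
```

The instrumentalness version is the same plot with `y = instrumentalness`.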

As for the type of visualization, we will explore various plots, such as scatterplots of each album’s average speechiness or instrumentalness against its Metacritic score.
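A sketch of the album-level scatterplot, again on made-up placeholder values, so the correlation it produces says nothing about the real albums. It collapses songs to one row per album with `summarise()` and computes a Pearson correlation with base R’s `cor()`. With at most 14 albums in taylor_albums, any such correlation would rest on very few points and should be read cautiously.

```r
library(dplyr)
library(ggplot2)

# Made-up placeholder data standing in for the joined songs/albums table
songs_scored <- tibble(
  album_name       = rep(c("Album A", "Album B", "Album C", "Album D"), each = 2),
  metacritic_score = rep(c(70, 75, 80, 85), each = 2),
  speechiness      = c(0.03, 0.05, 0.08, 0.06, 0.04, 0.06, 0.05, 0.07),
  instrumentalness = c(0.00, 0.02, 0.01, 0.01, 0.00, 0.00, 0.01, 0.03)
)

# One row per album: mean feature values alongside the album's score
album_means <- songs_scored |>
  group_by(album_name, metacritic_score) |>
  summarise(
    mean_speechiness      = mean(speechiness),
    mean_instrumentalness = mean(instrumentalness),
    .groups = "drop"
  )

# Pearson correlation between album-level speechiness and Metacritic score
speech_cor <- cor(album_means$mean_speechiness, album_means$metacritic_score)

# Scatterplot with one point per album
mean_plot <- ggplot(album_means, aes(x = mean_speechiness, y = metacritic_score)) +
  geom_point() +
  labs(x = "Mean speechiness", y = "Metacritic score")
```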

An additional angle from which we will analyze this question is to plot speechiness against instrumentalness for each song, effectively representing a speechiness index. If we map the Metacritic score of each song’s album to a color gradient, the plot should show how the balance of speechiness to instrumentalness relates to the critical reception of the album.
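One way to sketch this song-level view, assuming made-up placeholder values: each point is a song, positioned by instrumentalness and speechiness and colored by its album’s Metacritic score. Because a raw speechiness : instrumentalness ratio is undefined when instrumentalness is 0, the sketch also computes `speech_share`, a hypothetical bounded alternative that is not a variable in the dataset.

```r
library(dplyr)
library(ggplot2)

# Made-up placeholder data standing in for the joined songs/albums table
songs_scored <- tibble(
  track_name       = c("a1", "a2", "b1", "b2"),
  metacritic_score = c(70, 70, 85, 85),
  speechiness      = c(0.04, 0.10, 0.03, 0.05),
  instrumentalness = c(0.00, 0.02, 0.01, 0.05)
)

# Hypothetical index: share of (speechiness + instrumentalness) due to speechiness.
# Stays between 0 and 1 even when instrumentalness is 0 (NaN only if both are 0).
songs_scored <- songs_scored |>
  mutate(speech_share = speechiness / (speechiness + instrumentalness))

# One point per song; color encodes the album's Metacritic score
index_plot <- ggplot(songs_scored,
                     aes(x = instrumentalness, y = speechiness,
                         color = metacritic_score)) +
  geom_point() +
  scale_color_gradient(low = "grey80", high = "darkred") +
  labs(x = "Instrumentalness", y = "Speechiness", color = "Metacritic score")
```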

Question 2: Question 2 will also involve merging data from taylor_albums and taylor_album_songs based on album_name. We will separate songs into two groups based on the logical variable bonus_track and visualize the distributions of three additional variables: duration_ms, tempo, and energy. This will allow us to compare the distribution of these features between all bonus tracks and all album tracks.
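The bonus-versus-album comparison could be sketched as follows, assuming the logical column is named `bonus_track` (as in the taylor package) and using made-up placeholder rows. Reshaping the three features into long format with `tidyr::pivot_longer()` lets a single faceted plot compare both groups on every feature, and a small table of group medians gives a numeric companion to the plot.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up placeholder rows standing in for taylor_album_songs (not real values)
album_songs <- tibble(
  track_name  = c("s1", "s2", "s3", "s4", "s5"),
  bonus_track = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  duration_ms = c(200000, 230000, 215000, 260000, 245000),
  tempo       = c(120, 96, 140, 110, 128),
  energy      = c(0.7, 0.5, 0.8, 0.4, 0.6)
)

# Long format: one row per song per feature, labeled by track type
features_long <- album_songs |>
  filter(!is.na(bonus_track)) |>  # drop songs with a missing flag, if any
  mutate(track_type = if_else(bonus_track, "Bonus track", "Album track")) |>
  pivot_longer(c(duration_ms, tempo, energy),
               names_to = "feature", values_to = "value")

# Free y scales because the features use different units
# (milliseconds, beats per minute, and a 0-1 score)
bonus_plot <- ggplot(features_long, aes(x = track_type, y = value)) +
  geom_boxplot() +
  facet_wrap(~ feature, scales = "free_y")

# Median of each feature within each group
group_medians <- features_long |>
  group_by(feature, track_type) |>
  summarise(median_value = median(value), .groups = "drop")
```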

To observe trends within individual albums, we can filter to only show songs from rereleased albums, then facet the visualization by album_name. Faceting will provide insight into the more specific differences between the musical features of bonus tracks and the original songs on their respective albums.
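A sketch of the per-album view, under two assumptions: the album names below are placeholders, and rerecorded albums are identified by “Taylor's Version” appearing in album_name. That pattern is worth checking against the real titles, including whether they use a straight or curly apostrophe. `facet_wrap()` then gives each album its own panel, so bonus tracks are compared only with songs from the same album.

```r
library(dplyr)
library(ggplot2)

# Made-up placeholder rows; album names are stand-ins, not real titles
album_songs <- tibble(
  album_name  = c("Album A (Taylor's Version)", "Album A (Taylor's Version)",
                  "Album A (Taylor's Version)", "Album B (Taylor's Version)",
                  "Album B (Taylor's Version)", "Album C"),
  bonus_track = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE),
  energy      = c(0.6, 0.7, 0.4, 0.5, 0.8, 0.9)
)

# Keep only rerecorded albums (assumed naming pattern, matched literally)
rerecorded <- album_songs |>
  filter(grepl("Taylor's Version", album_name, fixed = TRUE)) |>
  mutate(track_type = if_else(bonus_track, "Bonus track", "Album track"))

# One panel per album; horizontal jitter only, so energy values are not distorted
rerecorded_plot <- ggplot(rerecorded, aes(x = track_type, y = energy)) +
  geom_jitter(width = 0.1, height = 0) +
  facet_wrap(~ album_name)
```

The same faceting works for duration_ms and tempo by changing the `y` aesthetic.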