Lesson 1.3: Calculating descriptive statistics

Recall that the maximum screening value for mercury depends on how often fish are eaten and on a person's body weight. According to the EPA, the maximum screening value for mercury is around 0.46 ug/g/day for people who do not eat fish frequently.

Tribal fishery managers are interested in whether the mercury levels in local fish belly fat exceed this amount. They are also interested in whether the amount of mercury in fish is connected to fish size.
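To make the first question concrete: in R, comparing values to a threshold with > returns TRUE or FALSE for each value, and taking mean() of those TRUE/FALSE values gives the share of values above the threshold. The sketch below is a minimal illustration only. It uses a small made-up vector, mercury_example and screening_value are hypothetical names, and the numbers are not the 1998 fish data.

```r
# Made-up mercury levels for five fish (hypothetical; NOT the 1998 data)
mercury_example <- c(0.21, 0.35, 0.47, 0.52, 0.30)

# Screening value quoted in the lesson text
screening_value <- 0.46

# TRUE for each fish above the screening value, FALSE otherwise
above <- mercury_example > screening_value
above                # FALSE FALSE TRUE TRUE FALSE

# R treats TRUE as 1 and FALSE as 0, so:
sum(above)           # number of fish above the value: 2
mean(above)          # share of fish above the value: 0.4

# Is the average level above the screening value?
mean(mercury_example) > screening_value   # FALSE (the mean is 0.37)
```

The two checks answer different questions: the mean says whether a typical fish is above the value, while the share says how many individual fish are.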

Below, we will analyze data points from the “fish” data set to try to answer these questions.

In Lesson 1, we learned that we can reference a column in a data frame using $. Next, we will apply functions to the mercury column, starting with mean().

# if running in Google Colab, uncomment and run the next line; otherwise, ignore this
# fish_1998 <- read.csv("https://raw.githubusercontent.com/rachtorr/IndigenousEnvDataSci.github.io/refs/heads/main/MOD1/fish_1998.csv")

# reload data 
fish_1998 <- read.csv("fish_1998.csv", header = TRUE, sep = ',')

# get the mean of mercury column 
mean(fish_1998$mercury)
0.448117680713238

🧠✍️ Class Questions

  • What is the mean (average) amount of mercury found in local fish in 1998?

  • Why would knowing the mean amount of mercury in fish be useful for fishery managers and the community?

Descriptive statistics: mean v. median

The “median” is the middle value of the mercury data: half the fish in the dataset have more mercury and half have less.

Means and medians are often similar. However:

  • They will differ if the data are skewed, that is, if there are some unusually large or unusually small values that pull the mean toward them.

  • Sometimes you are asked to use one or the other, so it’s good to be careful about which you are calculating.
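A minimal sketch of how skew separates the two, using made-up values rather than the fish data (typical and skewed are hypothetical names). Adding one unusually large value pulls the mean up a lot but moves the median only a little. With an even number of values, median() averages the two middle values.

```r
# Made-up mercury values (hypothetical; NOT the 1998 data)
typical <- c(0.30, 0.35, 0.40, 0.45)

# Same fish plus one fish with an unusually high value (right-skewed)
skewed <- c(typical, 1.90)

mean(typical)    # 0.375
median(typical)  # 0.375 (average of the two middle values, 0.35 and 0.40)

mean(skewed)     # 0.68, pulled up by the single large value
median(skewed)   # 0.40, barely moves
```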

# get the median of mercury column
median(fish_1998$mercury)
0.413539593842157

🧠✍️ Class Question:

  • How would we interpret these median and mean values? Should the K’avi community be concerned?

Your turn

How can we find the minimum and maximum amounts of mercury in the fish? Work in groups to see if you can find these values using the functions max() and min().

# find the min and max mercury amount 

🧠✍️ Class Question:

  • What are the minimum and maximum levels of mercury?

  • How would we interpret these maximum and minimum values? Are they as important as the average?

Summarizing data

The mean, median, minimum and maximum are all “descriptive statistics.” This means that they provide key information about the distribution of data. Here, we want to know the distribution of mercury in fish collected in 1998.

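The function summary() calculates all of these descriptive statistics and more. 1st Qu. and 3rd Qu. are the 1st and 3rd “quartiles.” The 1st quartile is the value below which 25% of the data fall, and the 3rd quartile is the value below which 75% of the data fall. Together with the Min., Median, and Max., they split the data into four parts that each hold about 25% of the values: Min. to 1st Qu., 1st Qu. to Median, Median to 3rd Qu., and 3rd Qu. to Max.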

summary(fish_1998$mercury)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.007764 0.278103 0.413540 0.448118 0.557474 1.894938 
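One way to see where the quartiles come from is to compute them directly with quantile(), which by default uses the same calculation that summary() uses for a numeric vector. The sketch uses nine made-up, evenly spaced values (x is a hypothetical name; these are not the fish data) so the quartiles are easy to check by hand.

```r
# Made-up, evenly spaced values (hypothetical; NOT the 1998 data)
x <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)

# Min., 1st quartile, median, 3rd quartile, and max.
quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))   # 0.1, 0.3, 0.5, 0.7, 0.9

# summary() reports the same five values, plus the mean (0.5)
summary(x)
```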

🧠✍️ Class Question:

  • Looking at all of these values, are there unsafe levels of mercury in local fish? Are there dangerous levels of mercury in most fish, or only some?

Recap Lesson 1.3

In Lesson 1.3 we learned how to:

  • get descriptive statistics for a vector

  • interpret the summary statistics for our data set

How could we communicate these values?

Summary statistics can’t answer every question, and as a list of numbers they would be difficult to communicate to the K’avi community. It would be much more helpful to visualize the data in a graph, which we will do in the next part.