Simulating different configurations of field plots for modeling aboveground biomass from airborne laser scanning (ALS)
Journals: Forest Ecology and Management? Biogeosciences? Methods in Ecology and Evolution?
Aim: This analysis aims to inform GEO-TREES by evaluating whether proposed sites have sufficient plot data and by guiding the placement of new plots. We investigate the following research questions:
(1) What is the minimum quantity of field plot data needed to represent landscape-level variation in ALS-derived forest structure?
(2) How does this minimum differ for random sampling compared to optimized sampling?
(3) What is the minimum quantity of field plot data needed to fit robust predictive models of aboveground biomass density within a subset of the landscape?
Methods: We analyze high-quality ALS datasets (> 1000 ha) and very large field-plot measurement datasets (> 80 ha of stem measurements) at five data “hotspots” covering a range of tropical vegetation types:
- Barro Colorado Island, Panama
- Mpala, Kenya
- Huai Kha Khaeng, Thailand
- Paracou, French Guiana
- Danum Valley, Malaysia
For question 1, we simulate two random sampling scenarios, representing clustered and distributed configurations of plot data:
- A single, large, rectangular plot (common in ForestGEO)
- Distributed, non-contiguous 1-ha plots (common in ForestPlots.net)
For each of these scenarios, we vary the total amount of plot data in the simulations from 3 to 80 ha.
We then calculate the Mahalanobis distance of each pixel to the plots in relative structural space, using four structural metrics:
- Mean height of canopy returns,
- Canopy cover above 2 m,
- Variance in height of canopy returns,
- The coefficient of variation of the leaf-area density profile.
Mahalanobis distances are multidimensional Z-scores, so each pixel's distance can be expressed as a number of standard deviations from the sample distribution of the plot data. To assess how well each simulated plot configuration represents structure across the landscape, we quantify the relationship between total plot area and the proportion of pixels more than 2 or 3 standard deviations from the plots. We plot this proportion against simulated plot area to identify the minimum viable amount of plot data for each landscape.
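The core of this calculation is base R's `mahalanobis()`. A minimal standalone sketch of the distance and thresholding step is below; `plot_sample` (the simulated plot pixels) and `landscape` (all pixels) are hypothetical stand-ins for the train and test folds built later in this notebook, each holding the four scaled metrics.

```r
# Minimal sketch: distance of every pixel to the plot sample, in SD units.
# `landscape` and `plot_sample` are hypothetical data frames; the full
# implementation appears in the Question 1 code below.
vars = c("zmean.scale", "pzabove2.scale", "zvar.scale", "lad_cv.scale")

# squared Mahalanobis distance of every pixel to the plot sample
d2 = mahalanobis(x = landscape[, vars],
                 center = colMeans(plot_sample[, vars]),
                 cov = cov(plot_sample[, vars]))
m_d = sqrt(d2)  # distance expressed in standard deviations of the plot sample

# proportion of the landscape poorly represented by the plots
mean(m_d >= 2)
mean(m_d >= 3)
```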
For question 2, we compare the random sampling scenarios from question 1 with two realistic optimized sampling scenarios, in which plots are placed to maximize structural representativeness (a sketch of one possible greedy approach follows this list):
- Iterative, optimized placement of 1-ha plots (realistic for establishing a new site),
- A single, randomly placed rectangular plot (10 or 25 ha) with optimized placement of additional 1-ha plots (realistic for adding supplemental plots to established ForestGEO sites).
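One possible greedy implementation of the iterative optimized placement is sketched below; this is an illustration, not the final optimization. For simplicity it treats candidate locations as single pixels of `domain_landscape` (defined later, with the scaled metrics), whereas a full version would aggregate the four 0.25-ha subplots of each 1-ha plot.

```r
# Hypothetical greedy sketch: at each step, add the candidate location whose
# inclusion most reduces the mean Mahalanobis distance of all pixels to the
# sampled structural space. `n_candidates` limits the search for speed.
place_plots_greedy = function(domain, n_plots, n_candidates = 200) {
  vars = c("zmean.scale", "pzabove2.scale", "zvar.scale", "lad_cv.scale")
  X = as.matrix(domain[, vars])
  selected = integer(0)
  for (k in seq_len(n_plots)) {
    pool = setdiff(seq_len(nrow(X)), selected)
    candidates = sample(pool, min(n_candidates, length(pool)))
    score = vapply(candidates, function(i) {
      s = c(selected, i)
      # the covariance matrix needs more samples than variables; until then,
      # selection among the random candidates is effectively arbitrary
      if (length(s) <= length(vars)) return(Inf)
      mean(mahalanobis(X, colMeans(X[s, , drop = FALSE]),
                       cov(X[s, , drop = FALSE])))
    }, numeric(1))
    selected = c(selected, candidates[which.min(score)])
  }
  domain[selected, ]
}
```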
For question 3, we use random spatial subsets of existing plot data with known AGBD estimates to fit biomass models.
As in question 1, we repeat this analysis under two scenarios: first using a single large rectangular plot, and second using distributed 1-ha plots.
We vary the total area of simulated plot data from 3 ha to the maximum area that fits within the existing plot data while still leaving test data. For each random simulation of plots (training fold), we apply a 50-m buffer (one subplot) and use the remaining unsampled subplots as test data (implemented in the resampling functions below). We fit predictive models of AGBD to each training fold and assess each model against its test fold. We then examine the relationship between the total area of 1-ha or single plots and the goodness-of-fit of the fitted biomass model, including bias (mean residual error), RMSE, and MAE.
Results:
We use changepoint analysis to identify sills in the relationships between plot area and structural representativeness, and between plot area and model performance.
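As a sketch of this step, assuming the {changepoint} package (not among the packages loaded above) and the `df_eval_q1` summary built later for Question 1; the exact changepoint method is still to be decided:

```r
# Hypothetical sketch: locate the sill in the representativeness curve.
library(changepoint)

curve_q1 = df_eval_q1 |>
  ungroup() |>
  filter(metric == "Pixels ≥ 2 SD (%)", scenario == "1-ha plots") |>
  group_by(size_ha) |>
  summarize(value = mean(value), .groups = "drop") |>
  arrange(size_ha)

# detect a single change in mean (AMOC); method = "PELT" would allow several
fit = cpt.mean(curve_q1$value, method = "AMOC")

# the plot area (ha) at which the curve levels off
curve_q1$size_ha[cpts(fit)]
```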
Recommendations for minimum plot sizes under different scenarios.
Proof of concept of optimized plot placement for new or supplemental plots at AGBD calibration and validation sites.
```r
# site-specific information, defined by the user:
# specify metadata names
#-------------------------------------------------------
site = "PanamaCanal"
type = "ALS"
platform = "Aircraft"
date = "20230526"

# set the coordinate reference system for the site
CRS_site = 32617
```
```sh
conda install r-ggh4x r-hmisc
conda install ./environments/r-lasr-0.17.0-r43hecca717_0.conda
```
```r
# these packages couldn't be found by conda, so install them with R temporarily
# install.packages("remotes")
# remotes::install_github("njtierney/broomstick")
# install.packages("ggh4x")
# remotes::install_github("tidymodels/corrr")
# remotes::install_github("jrnold/resamplr")
# remotes::install_github("ptompalski/lidRmetrics")
# remotes::install_github("r-lidar/lasR", ref = "pdt")
# remotes::install_github("DavisVaughan/furrr")
# install.packages("anticlust")
# install.packages("numbers")
```
```r
# load in the required R packages
suppressMessages({
  # library(resamplr)
  library(broom)
  # library(broomstick)
  library(ggpubr)
  library(sf)
  library(future)
  library(terra)
  library(lidR)
  library(RCSF)
  library(RStoolbox)
  library(BIOMASS)
  library(httr2)
  library(randomForest)
  library(tidyterra)
  library(ggh4x)
  library(exactextractr)
  # library(furrr)
  library(tidyverse)
  # library(lidRmetrics)
  # library(spatialsample)
  # library(anticlust)
  library(modelr)
  library(spatstat)
  library(lasR)
})
```
```r
options(repr.matrix.max.cols = 400, repr.matrix.max.rows = 100)

# define some variables for mapping in ggplot
theme_map = theme(panel.border = element_rect(color = "black", fill = NA),
                  axis.title = element_blank(),
                  axis.text.x = element_text(angle = 90),
                  axis.ticks = element_blank(),
                  legend.position = "inside",
                  legend.position.inside = c(.98, .02),
                  legend.justification = c(1, 0),
                  strip.background = element_rect(fill = "white"),
                  legend.key = element_rect(fill = "white", color = "white"),
                  legend.title = element_text(hjust = 0.5),
                  legend.direction = "horizontal",
                  legend.key.width = unit(.5, "in"),
                  legend.key.height = unit(.1, "in"),
                  legend.background = element_rect(color = "black", fill = "white", linewidth = .3),
                  legend.margin = ggplot2::margin(7, 12, 7, 12))

cols_gradient <- c(low = "#002070", mid = "#DD6677", high = "#F1F1A1")
cols_gradient_greyscale <- c(low = "#111111", mid = "#666666", high = "#FFFFFF")
cols_gradient_biomass <- c(low = "#002070", mid = "#309040", high = "#D0E358")
cols_gradient_metrics <- c(low = "#00333A", mid = "#637F8A", high = "beige")
```
Table 1. Potential study sites
| Site | ALS acquisition date | Plot | Size (ha) | Census date | DBH cutoff (cm) | Forest type |
|---|---|---|---|---|---|---|
| Barro Colorado Nature Monument | May 2023 | ForestGEO plot | 50 | 2022 | 1 | Old forest |
| | | Gigante fertilization plot | 38.4 | 2023 | 20 (10, 1 in places) | Old forest, partially fertilized |
| | | P12 | 1 | Nov 2023 | 1 | Old forest |
| | | P14 | 1 | Oct/Nov 2023 | 1 | Old forest |
| | | Secondary forest plots | ? | 2024? | ? | Young forest |
| | | Secondary forest plots | ? | 2024? | ? | Young forest |
| Mpala | Feb 2025 | ForestGEO plot | 100 | 2025 | 1 | Structurally distinct savannas on black cotton soil, transition zone, and red soil |
| Paracou | ? | 25-ha plot | 25 | Every 5 years | 10 | Old forest, mix of upland and lowland |
| | | Biodiversity plots | 18.75 | Every 2 years? | 10 | Old forest |
| | | Disturbance experiment | 75 (12 × 6.25) | Every 2 years | 10 | Selective logging, thinning, and control plots |
| Huai Kha Khaeng | Oct 2025? | ForestGEO plot | 50 | ? | 1 | Seasonal evergreen and deciduous forest |
| | | Kapook Kapieng | 16 | ? | 1 | Seasonal deciduous forest |
| | | Huai Krading | 16 | ? | 1 | Seasonal evergreen and deciduous forest |
```r
resample_1ha_plots = function (data, size_ha, drop_buffer) {
  # if (!is.numeric(p) || length(p) < 2 ) {
  #   stop("`p` must be a named numeric vector with at least two values.")
  # }
  # if (abs(sum(p) - 1) > 1e-06) {
  #   message("Rescaling `p` to sum to 1.")
  # }
  data = data |>
    mutate(p_x = as.numeric(str_split_i(subplot_id, "_", -2)),
           p_y = as.numeric(str_split_i(subplot_id, "_", -1))) |>
    group_by(plot_id) |>
    mutate(plot_size_ha = n_distinct(subplot_id) / 4,
           # the max dimensions of each plot, needed later to prevent failure
           # add one because we label from 0
           plot_max_y = max(p_y + 1),
           plot_max_x = max(p_x + 1))

  n <- nrow(data)
  idx = seq_len(n)
  n_subplots = size_ha * 4
  dim_x = 2
  dim_y = 2

  # empty vectors to start the loop
  train_merge = c()
  buffer_merge = c()
  exclude_merge = c()
  n_restart = 1

  while (length(train_merge) < n_subplots) {
    subplots_p = which(data$p_x >= (dim_x - 1) &
                         data$p_y >= (dim_y - 1) &
                         data$plot_size_ha >= 1) |>
      setdiff(exclude_merge)

    # if subplots_p is empty, there is no solution that achieves the desired
    # number of plots, so start over; if size_ha is too large, a solution may
    # be impossible, so allow at most 100,000 restarts
    if (is.empty(subplots_p)) {
      if (n_restart > 100000) {
        stop("100000 failed attempts; size_ha is probably too large for the plot data provided")
      }
      n_restart = n_restart + 1
      train_merge = c()
      buffer_merge = c()
      next
    }

    # random seed points
    # sample() does not produce a random sample from a single element
    if (length(subplots_p) == 1) {
      seed = subplots_p
    } else {
      seed = sample(subplots_p, 1)
    }
    seed_x = data$p_x[seed]
    seed_y = data$p_y[seed]
    seed_plot_id = data$plot_id[seed]

    # the indices of the 2 x 2 block of subplots anchored at the seed
    train = which((data$p_x > (seed_x - dim_x) & data$p_x <= seed_x) &
                    (data$p_y > (seed_y - dim_y) & data$p_y <= seed_y) &
                    data$plot_id == seed_plot_id)
    if (length(train) < 4) {
      next
    }
    buffer = which((data$p_x > (seed_x - dim_x - 1) & data$p_x <= (seed_x + 1)) &
                     (data$p_y > (seed_y - dim_y - 1) & data$p_y <= (seed_y + 1)) &
                     data$plot_id == seed_plot_id)
    exclude = which((data$p_x > (seed_x - dim_x - 1) & data$p_x <= (seed_x + 2)) &
                      (data$p_y > (seed_y - dim_y - 1) & data$p_y <= (seed_y + 2)) &
                      data$plot_id == seed_plot_id)

    # merge with past cycles
    train_merge = c(train_merge, train)
    buffer_merge = c(buffer_merge, buffer)
    exclude_merge = c(exclude_merge, exclude)
  }

  if (isFALSE(drop_buffer)) {
    test = resample(data, setdiff(idx, train_merge))
  } else {
    test = resample(data, setdiff(idx, buffer_merge))
  }
  train = resample(data, train_merge)
  list(test = test, train = train)
}

crossv_1ha_plots = function (data, n, size_ha, id = ".id", drop_buffer = TRUE) {
  if (!is.numeric(n) || length(n) != 1) {
    stop("`n` must be a single integer.", call. = FALSE)
  }
  if (!is.numeric(size_ha) || size_ha < 1) {
    stop("`size_ha` must be a single number of at least 1.", call. = FALSE)
  }
  runs <- purrr::map(seq_len(n), ~resample_1ha_plots(data = data, size_ha = size_ha,
                                                     drop_buffer = drop_buffer))
  cols <- purrr::transpose(runs)
  cols[[id]] <- vctrs::vec_group_id(seq_len(n))
  tibble::as_tibble(cols) |>
    mutate(size_ha = size_ha,
           scenario = "1-ha plots")
}
```
```r
# set.seed(1)
resample_plot = function (data, size_ha, drop_buffer) {
  # if (!is.numeric(p) || length(p) < 2 ) {
  #   stop("`p` must be a named numeric vector with at least two values.")
  # }
  # if (abs(sum(p) - 1) > 1e-06) {
  #   message("Rescaling `p` to sum to 1.")
  # }
  data = data |>
    mutate(p_x = as.numeric(str_split_i(subplot_id, "_", -2)),
           p_y = as.numeric(str_split_i(subplot_id, "_", -1))) |>
    group_by(plot_id) |>
    mutate(plot_size_ha = n_distinct(subplot_id) / 4,
           # the max dimensions of each plot, needed later to prevent failure
           # add one because we label from 0
           plot_max_y = max(p_y + 1),
           plot_max_x = max(p_x + 1))

  n <- nrow(data)
  idx = seq_len(n)
  n_subplots = size_ha * 4

  # get the integers by which the number of subplots is divisible
  divisors = numbers::divisors(n_subplots)
  # plots must be at least 100 m wide, so drop division by 1
  divisors = head(tail(divisors, -1), -1)
  # avoid super long, skinny plots; drop division by 2 if there are other options
  if (length(divisors) > 2) {
    divisors = head(tail(divisors, -1), -1)
  }
  # plots also can't be longer than either dimension of the plot data
  divisors = divisors[divisors <= max(data$plot_max_x) &
                        (n_subplots / divisors) <= max(data$plot_max_y)]

  # empty vector to start the while loop
  train = c()

  while (length(train) < n_subplots) {
    if (is.empty(divisors)) {
      stop("impossible to fit such a large plot in the given dataset")
    }

    # x dimension of the plot
    # sample() does not produce a random sample from a single element
    # for a plot size of 1 ha, there is only a single divisor, 2
    if (length(divisors) == 1) {
      dim_x = divisors
    } else {
      dim_x = sample(divisors, 1)
    }
    # y dimension of the plot
    dim_y = n_subplots / dim_x

    subplots_p = which(data$p_x >= (dim_x - 1) & data$plot_max_x >= dim_x &
                         data$p_y >= (dim_y - 1) & data$plot_max_y >= dim_y &
                         data$plot_size_ha >= size_ha)

    # random seed points
    # sample() does not produce a random sample from a single element
    if (length(subplots_p) == 1) {
      seed = subplots_p
    } else {
      seed = sample(subplots_p, 1)
    }
    seed_x = data$p_x[seed]
    seed_y = data$p_y[seed]
    seed_plot_id = data$plot_id[seed]

    train = which((data$p_x > (seed_x - dim_x) & data$p_x <= seed_x) &
                    (data$p_y > (seed_y - dim_y) & data$p_y <= seed_y) &
                    data$plot_id == seed_plot_id)
    # the while condition retries if the full rectangle could not be carved out
  }

  exclude = which((data$p_x > (seed_x - dim_x - 1) & data$p_x <= (seed_x + 1)) &
                    (data$p_y > (seed_y - dim_y - 1) & data$p_y <= (seed_y + 1)) &
                    data$plot_id == seed_plot_id)

  if (isFALSE(drop_buffer)) {
    test = resample(data, setdiff(idx, train))
  } else {
    test = resample(data, setdiff(idx, exclude))
  }
  train = resample(data, train)
  list(test = test, train = train)
}

crossv_plot = function (data, n, size_ha, id = ".id", drop_buffer = TRUE) {
  if (!is.numeric(n) || length(n) != 1) {
    stop("`n` must be a single integer.", call. = FALSE)
  }
  if (!is.numeric(size_ha) || size_ha < 1) {
    stop("`size_ha` must be a single number of at least 1.", call. = FALSE)
  }
  runs <- purrr::map(seq_len(n), ~resample_plot(data = data, size_ha = size_ha,
                                                drop_buffer = drop_buffer))
  cols <- purrr::transpose(runs)
  cols[[id]] <- vctrs::vec_group_id(seq_len(n))
  tibble::as_tibble(cols) |>
    mutate(size_ha = size_ha,
           scenario = "Single plot")
}

# test = subs |>
#   # filter(plot_id == "Panama Canal:BCI 50 ha plot") |>
#   mutate(p_x = as.numeric(str_split_i(subplot_id, "_", -2)),
#          p_y = as.numeric(str_split_i(subplot_id, "_", -1))) |>
#   crossv_test()

# ggplot() +
#   geom_sf(data = subs, fill = "grey95") +
#   geom_sf(data = as_tibble(test$train[[1]]), fill = "red4", alpha = .7) +
#   geom_sf(data = as_tibble(test$test[[1]]), fill = "steelblue", alpha = .5) +
#   # geom_sf(data = as_tibble(test$train[[3]]), fill = "forestgreen", alpha = .5) +
#   # geom_sf(data = as_tibble(test$train[[4]]), fill = "orchid4", alpha = .5) +
#   theme_bw()
```
```r
# resample_random = function (data, size_ha) {
#   # if (!is.numeric(p) || length(p) < 2 ) {
#   #   stop("`p` must be a named numeric vector with at least two values.")
#   # }
#   # if (abs(sum(p) - 1) > 1e-06) {
#   #   message("Rescaling `p` to sum to 1.")
#   # }
#   data = data |>
#     mutate(p_x = as.numeric(str_split_i(subplot_id, "_", -2)),
#            p_y = as.numeric(str_split_i(subplot_id, "_", -1))) |>
#     group_by(plot_id) |>
#     mutate(plot_size_ha = n_distinct(subplot_id) / 4,
#            # the max dimensions of each plot, needed later to prevent failure
#            # add one because we label from 0
#            plot_max_y = max(p_y + 1),
#            plot_max_x = max(p_x + 1))
#   n <- nrow(data)
#   idx = seq_len(n)
#   n_subplots = size_ha * 4
#   # empty vectors to start the loop
#   train_merge = c()
#   buffer_merge = c()
#   while (length(train_merge) < n_subplots) {
#     subplots_p = idx |>
#       setdiff(train_merge)
#     # random seed points
#     # sample() does not produce a random sample from a single element
#     if (length(subplots_p) == 1) {
#       seed = subplots_p
#     } else {
#       seed = sample(subplots_p, 1)
#     }
#     seed_x = data$p_x[seed]
#     seed_y = data$p_y[seed]
#     seed_plot_id = data$plot_id[seed]
#     # the index of the selected subplot
#     # requires adjacent plots to be merged!
#     train = which(data$p_x == seed_x &
#                   data$p_y == seed_y &
#                   data$plot_id == seed_plot_id)
#     buffer = which(data$p_x %in% c(seed_x, seed_x - 1, seed_x + 1) &
#                    data$p_y %in% c(seed_y, seed_y - 1, seed_y + 1) &
#                    data$plot_id == seed_plot_id)
#     # merge with past cycles
#     train_merge = c(train_merge, train)
#     buffer_merge = c(buffer_merge, buffer)
#   }
#   # train = seed + seq_len(n_subplots)
#   list(test = resample(data, setdiff(idx, buffer_merge)),
#        train = resample(data, train_merge))
# }

# crossv_random = function (data, n, size_ha, id = ".id") {
#   if (!is.numeric(n) || length(n) != 1) {
#     stop("`n` must be a single integer.", call. = FALSE)
#   }
#   if (!is.numeric(size_ha) || size_ha < 1) {
#     stop("`size_ha` must be a single number of at least 1.", call. = FALSE)
#   }
#   runs <- purrr::map(seq_len(n), ~resample_random(data = data, size_ha = size_ha))
#   cols <- purrr::transpose(runs)
#   cols[[id]] <- vctrs::vec_group_id(seq_len(n))
#   tibble::as_tibble(cols) |>
#     mutate(size_ha = size_ha,
#            scenario = "Random")
# }
```
Prepare the data for Question 1:
- `metrics_landscape` — 0.25-ha ALS metrics raster for all usable pixels of the acquisition, from STAGE 1.
- `domain_landscape` — ALS landscape metrics as a dataframe, with predictors scaled and cell coordinates as integers.
- `runs_q1` — a nested dataframe with randomly generated plots as train-test splits, stored as list-columns.
- `mdist_q1` — a nested dataframe with Mahalanobis distances and thresholds for each split.
- `df_plot_q1` — one random split selected as an example to be plotted in the figures.
```r
metrics_landscape = rast("../ALS_output/PanamaCanal/ALS/Aircraft_20230526/BCI_Gigante/output/_06_metrics_raster/predictors_50m.tif")
# ALS_coords = as_coordinates(ALS_metrics, as.raster = TRUE)

domain_landscape = metrics_landscape |>
  as_tibble(xy = TRUE) |>
  mutate(plot_id = "BCI") |>
  mutate(p_x = x / 50,
         p_y = y / 50) |>
  unite("subplot_id", p_x:p_y, remove = FALSE) |>
  drop_na(zmean.canopy.vox) |>
  filter(zq99.canopy.vox < 100 & lad_cv.aboveground.vox < 3) |>
  mutate(pzabove2.aboveground.vox = if_else(pzabove2.aboveground.vox < 100,
                                            pzabove2.aboveground.vox,
                                            (n.canopy.vox / (n.canopy.vox + 1)) * 100),
         Pf.aboveground.vox = 100 - pzabove2.aboveground.vox) %>%
  mutate(zmean.scale = scale(zmean.canopy.vox),
         zq99.scale = scale(zq99.canopy.vox),
         pzabove2.scale = scale(pzabove2.aboveground.vox),
         zvar.scale = scale(zvar.canopy.vox),
         lad_cv.scale = scale(lad_cv.aboveground.vox))
```
```r
# ALS data
set.seed(2)

run_list = list()
x_ha = seq(3, 80, 1)
# x_test = x_test[!(x_test %in% c(23, 29, 31, 37, 41))]
n_repeat = 12

for (i in seq_along(x_ha)) {
  x = x_ha[i]

  subs_plot = domain_landscape |>
    crossv_plot(n = n_repeat, size_ha = x, drop_buffer = FALSE)

  subs_1ha_plots = domain_landscape |>
    crossv_1ha_plots(n = n_repeat, size_ha = x, drop_buffer = FALSE)

  df = bind_rows(subs_plot, subs_1ha_plots)
  run_list[[i]] = df
}

runs_q1 = do.call("rbind", run_list)
```
```r
# calculate Mahalanobis distances for each train-test split
mdist_q1 = runs_q1 %>%
  mutate(train = map(train, ~as_tibble(.x))) %>%
  mutate(train = map(train, ungroup)) %>%
  mutate(test = map(test, ~as_tibble(.x))) %>%
  mutate(test = map(test, ungroup)) %>%
  # Mahalanobis distance of the training pixels to their own distribution
  mutate(train = map(train, ~mutate(.x,
           m_d = mahalanobis(
             # the variables used for the distance calculation
             x = select(., zmean.scale, # zq99.scale,
                        pzabove2.scale, zvar.scale, lad_cv.scale),
             # mean vector of the selected variables
             center = colMeans(select(.x, zmean.scale, # zq99.scale,
                                      pzabove2.scale, zvar.scale, lad_cv.scale)),
             # covariance matrix of the selected variables
             cov = cov(select(.x, zmean.scale, # zq99.scale,
                              pzabove2.scale, zvar.scale, lad_cv.scale))),
           m_d = sqrt(m_d),
           threshold = case_when(m_d < 2 ~ "< 2 SD",
                                 m_d < 3 ~ "< 3 SD",
                                 m_d >= 3 ~ "≥ 3 SD")))) %>%
  # Mahalanobis distance of the test pixels to the training distribution (.y)
  mutate(test = map2(test, train, ~mutate(.x,
           m_d = mahalanobis(
             x = select(., zmean.scale, # zq99.scale,
                        pzabove2.scale, zvar.scale, lad_cv.scale),
             center = colMeans(select(.y, zmean.scale, # zq99.scale,
                                      pzabove2.scale, zvar.scale, lad_cv.scale)),
             cov = cov(select(.y, zmean.scale, # zq99.scale,
                              pzabove2.scale, zvar.scale, lad_cv.scale))),
           m_d = sqrt(m_d),
           threshold = case_when(m_d < 2 ~ "< 2 SD",
                                 m_d < 3 ~ "< 3 SD",
                                 m_d >= 3 ~ "≥ 3 SD"))))
```
```r
df_plot_q1 = mdist_q1 |>
  filter(size_ha == 16 & .id == 11) |>
  mutate(train = map(train, ~mutate(as_tibble(.x), subset = "train"))) |>
  mutate(test = map(test, ~mutate(as_tibble(.x), subset = "test"))) |>
  mutate(data = map2(train, test, ~bind_rows(.x, .y))) |>
  unnest(data) |>
  dplyr::select(-.id, -test, -train)
```
```r
# data frame for plotting trends in structural representativeness
df_eval_q1 = mdist_q1 %>%
  dplyr::select(-train) %>%
  unnest(test) %>%
  group_by(.id, size_ha, scenario) |>
  # add the total number of pixels, to calculate percentages later
  mutate(n_total = n(),
         `Mean Mahalanobis distance` = mean(m_d)) |>
  group_by(.id, size_ha, scenario, threshold, n_total, `Mean Mahalanobis distance`) |>
  tally() |>
  mutate(n = n / n_total) |>
  pivot_wider(names_from = "threshold", values_from = "n") |>
  # change to the proportion of pixels greater than 2 or 3 SD
  mutate(`Pixels ≥ 2 SD (%)` = 1 - `< 2 SD`,
         `Pixels ≥ 3 SD (%)` = `≥ 3 SD`) |>
  pivot_longer(c(`Mean Mahalanobis distance`, `Pixels ≥ 2 SD (%)`, `Pixels ≥ 3 SD (%)`),
               values_to = "value", names_to = "metric") |>
  dplyr::select(.id:n_total, metric, value)
```
Prepare the data for Question 3:
- `subplots_summ` — 0.25-ha summary metrics from field measurements, from John's script.
- `metrics_subplots` — combined field metrics and ALS metrics for all subplots, with predictors scaled.
- `runs_q3` — a nested dataframe with randomly generated plots as train-test splits, stored as list-columns.
- `models_q3` — a nested dataframe with predictions and residuals from the AGBD models for each split.
- `df_plot_q3` — one random split selected as an example to be plotted in the figures.
- `df_eval_q3` — a dataframe with each split summarized by model goodness-of-fit (bias, MAE, RMSE).
```r
# read in the ALS metrics for subplots

# path for the subplot-level field measurement outputs
subplots_summ = read.csv("/projects/my-private-bucket/GEO-TREES/data/PanamaCanal/plots/processed/sub_summ.csv")

# paths for the subplot-level ALS metrics
dir_subplots_list = Sys.glob(paste0("../ALS_output/", site, "/", type, "/",
                                    platform, "_", date,
                                    "/*/output/_08_metrics_subplots_dataframe/subplots_metrics.rds"))

# combine the subplot metrics into a single data frame
for (i in seq_along(dir_subplots_list)) {
  subplots_i = readRDS(dir_subplots_list[i]) |>
    mutate(subplot_id = as.character(subplot_id))
  if (i == 1) {
    subplots_df = subplots_i
  } else {
    subplots_df = bind_rows(subplots_df, subplots_i)
  }
}

# join the field measurements to the ALS metrics
metrics_subplots = subplots_df %>%
  # create a new variable, gap fraction, Pf
  mutate(pzabove2.aboveground.vox = if_else(pzabove2.aboveground.vox < 100,
                                            pzabove2.aboveground.vox,
                                            (n.canopy.vox / (n.canopy.vox + 1)) * 100),
         Pf.aboveground.vox = 100 - pzabove2.aboveground.vox) %>%
  left_join(subplots_summ, by = c("plot_id", "subplot_id")) |>
  st_sf() |>
  drop_na(agb_sum_ha) |>
  # select only plots within BCI_Gigante, for now
  filter(plot_id %in% c("Panama Canal:BCI 50 ha plot",
                        "Panama Canal:Gigante fertilization plot",
                        "Panama Canal:P12",
                        "Panama Canal:P14"))
```
```r
set.seed(5)

run_list = list()
x_ha = 3:25
x_ha = x_ha[!(x_ha %in% c(23, 29, 31, 37, 41))]
n_repeat = 50

for (i in seq_along(x_ha)) {
  x = x_ha[i]

  subs_plot = metrics_subplots |>
    crossv_plot(n = n_repeat, size_ha = x)

  subs_1ha_plots = metrics_subplots |>
    crossv_1ha_plots(n = n_repeat, size_ha = x)

  # subs_random = subs |> crossv_random(n = n_repeat, size_ha = x)

  df = bind_rows(subs_plot, subs_1ha_plots)
  run_list[[i]] = df
}

runs_q3 = do.call("rbind", run_list)
```
```r
# fit the AGBD models for each train-test split
models_q3 = runs_q3 %>%
  # multiple models to compare
  mutate(model_zmean = purrr::map(train, ~lm(log(agb_sum_ha) ~ log(zmean.canopy.vox),
                                             data = .)),
         model_Coomes = purrr::map(train, ~lm(log(agb_sum_ha) ~ log(zmean.canopy.vox) +
                                                log(Pf.aboveground.vox), data = .)),
         model_Bouvier = purrr::map(train, ~lm(log(agb_sum_ha) ~ log(zmean.canopy.vox) +
                                                 log(Pf.aboveground.vox) +
                                                 log(zvar.canopy.vox) +
                                                 log(lad_cv.aboveground.vox), data = .))) %>%
  pivot_longer(model_zmean:model_Bouvier, names_to = "model_name", values_to = "model") %>%
  # extract the parameters, for the full models
  mutate(tidy = map(model, broom::tidy),
         B0 = tidy %>% map_dbl(function(x) x$estimate[1]),
         B1 = tidy %>% map_dbl(function(x) x$estimate[2]),
         B2 = tidy %>% map_dbl(function(x) x$estimate[3]),
         B3 = tidy %>% map_dbl(function(x) x$estimate[4]),
         B4 = tidy %>% map_dbl(function(x) x$estimate[5]),
         p = tidy %>% map_dbl(function(x) nrow(x))) %>%
  # convert from a resample object to a data frame
  mutate(train = map(train, ~as_tibble(.x))) %>%
  # add predictions for each model
  mutate(train = map2(train, model, ~modelr::add_predictions(.x, .y))) %>%
  # add residuals for each model
  mutate(train = map2(train, model, ~modelr::add_residuals(.x, .y))) %>%
  # correct the back-transformation bias of the log-log models: exp(pred) is
  # multiplied by exp(SEE^2 / 2), where SEE is the standard error of the
  # estimate and p is the number of parameters
  mutate(train = map(train, ungroup)) %>%
  mutate(train = map2(train, p, ~mutate(.x,
                                        resid2 = resid^2,
                                        p = .y,
                                        SEE = sqrt((sum(resid2)) / (length(resid2) - p)),
                                        pred2 = exp(pred) * (exp(SEE^2 / 2)),
                                        residual = agb_sum_ha - pred2))) %>%
  # extract the SEE for each model, to be applied to the test data
  mutate(SEE = map(train, ~first(.$SEE))) %>%
  # apply each model to the testing data and produce predictions and residuals
  mutate(test = map2(test, model, ~modelr::add_predictions(as_tibble(.x), .y))) %>%
  mutate(test = map2(test, SEE, ~mutate(.x,
                                        SEE = .y,
                                        pred2 = exp(pred) * (exp(SEE^2 / 2)),
                                        residual = agb_sum_ha - pred2)))
```
```r
df_plot_q3 = runs_q3 |>
  filter(size_ha == 16 & .id == 8) |>
  mutate(train = map(train, ~mutate(as_tibble(.x), subset = "train"))) |>
  mutate(test = map(test, ~mutate(as_tibble(.x), subset = "test"))) |>
  mutate(data = map2(train, test, ~bind_rows(.x, .y))) |>
  unnest(data) |>
  dplyr::select(-.id, -test, -train) |>
  st_as_sf()
```
```r
# data frame for plotting trends in model accuracy
df_eval_q3 = models_q3 %>%
  dplyr::select(-train, -model, -tidy, -SEE) %>%
  unnest(test) %>%
  group_by(.id, size_ha, scenario, model_name) |>
  summarize(Bias = mean(residual, na.rm = TRUE),
            MAE = mean(abs(residual), na.rm = TRUE),
            RMSE = sqrt(mean(residual^2, na.rm = TRUE)),
            .groups = "drop") |>
  pivot_longer(Bias:RMSE, values_to = "value", names_to = "metric")
```
Figures.
```r
options(repr.plot.width = 12, repr.plot.height = 12)

fig_landscape = ggplot() +
  geom_tile(data = domain_landscape, aes(x = x, y = y), color = "grey70", fill = NA) +
  geom_tile(data = df_plot_q1, aes(x = x, y = y, fill = subset), alpha = .8) +
  scale_fill_manual(values = c("steelblue3", "indianred")) +
  theme_bw(base_size = 15) +
  scale_x_continuous(expand = expansion(mult = c(.01, .01))) +
  scale_y_continuous(expand = expansion(mult = c(.01, .01))) +
  theme_map +
  coord_equal() +
  theme(legend.position.inside = c(.02, .98),
        plot.background = element_blank(),
        legend.justification = c(0, 1),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.key = element_rect(fill = "white", color = "white"),
        legend.title = element_blank(),
        axis.text = element_blank(),
        axis.text.x = element_blank(),
        legend.direction = "vertical",
        legend.key.width = unit(.1, "in"),
        legend.key.height = unit(.1, "in"),
        legend.key.spacing.y = unit(.05, "in")) +
  facet_grid(. ~ scenario)

fig_plot = ggplot() +
  geom_tile(data = domain_landscape, aes(x = x, y = y), fill = "grey75", color = NA) +
  geom_sf(data = metrics_subplots, fill = "grey75") +
  geom_sf(data = df_plot_q3, aes(fill = subset)) +
  scale_fill_manual(values = c("steelblue3", "indianred")) +
  theme_bw(base_size = 15) +
  scale_x_continuous(expand = expansion(mult = c(.01, .01))) +
  scale_y_continuous(expand = expansion(mult = c(.01, .01))) +
  theme_map +
  # coord_equal() +
  theme(legend.position.inside = c(.02, .98),
        plot.background = element_blank(),
        legend.justification = c(0, 1),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.key = element_rect(fill = "white", color = "white"),
        legend.title = element_blank(),
        axis.text = element_blank(),
        axis.text.x = element_blank(),
        legend.direction = "vertical",
        legend.key.width = unit(.1, "in"),
        legend.key.height = unit(.1, "in"),
        legend.key.spacing.y = unit(.05, "in")) +
  facet_grid(. ~ scenario)

fig = ggarrange(fig_landscape, fig_plot, ncol = 1)

ggsave(plot = fig, filename = "./figures/Figure_2.jpeg", device = jpeg,
       width = 12, height = 24, units = "in", dpi = 150, bg = "white", create.dir = TRUE)

IRdisplay::display_jpeg(file = "./figures/Figure_2.jpeg")
```
Figure 2. Conceptual framework for this paper: an example of random 1-ha plots and a single random large plot used to sample (a) variation across the landscape (Questions 1 and 2) and (b) real field plot data (Question 3).
```r
pred_names = c("zmean.canopy.vox", "pzabove2.aboveground.vox",
               "lad_cv.aboveground.vox", "zvar.canopy.vox")
metrics_list = list()
cols_gradient_greyscale <- c(low = "#122222", mid = "#666666", high = "snow2")
title_list = c("Mean of canopy heights (m)", "Canopy cover (%)",
               "CV of the LAD profile", "Variance of canopy heights (m²)")

df_plot_i = df_plot_q1 |>
  arrange(subset) |>
  filter(scenario == "Single plot")

for (i in seq_along(pred_names)) {
  metrics_list[[i]] = ggplot() +
    geom_tile(data = df_plot_q1, aes(x = x, y = y, fill = .data[[pred_names[i]]])) +
    geom_tile(data = filter(df_plot_i, subset == "train"),
              aes(x = x, y = y), fill = "indianred", alpha = .8) +
    coord_equal() +
    scale_fill_gradientn(colors = cols_gradient_greyscale, na.value = "grey70",
                         name = title_list[[i]]) +
    scale_x_continuous(expand = expansion(mult = c(.02, .02))) +
    scale_y_continuous(expand = expansion(mult = c(.15, .01))) +
    theme_bw(base_size = 12) +
    theme_map +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          plot.title = element_text(hjust = 0.5),
          axis.text.x = element_blank(),
          axis.text.y = element_blank()) +
    guides(fill = guide_colourbar(title.position = "top"))
}

# aes_string() is deprecated; use tidy evaluation with .data[[ ]] instead
fig_cor1 = df_plot_i |>
  ggplot(aes(x = .data[[pred_names[1]]], y = .data[[pred_names[2]]])) +
  geom_point(alpha = .2, color = cols_gradient_greyscale[2], size = 3, shape = 19) +
  geom_point(data = filter(df_plot_i, subset == "train"), alpha = .9,
             aes(color = subset), size = 3, shape = 19) +
  stat_boxplot(aes(x = .data[[pred_names[1]]], fill = subset,
                   y = (max(.data[[pred_names[2]]]) - min(.data[[pred_names[2]]])) * .1 +
                     max(.data[[pred_names[2]]])),
               width = (max(df_plot_q1[[pred_names[2]]]) - min(df_plot_q1[[pred_names[2]]])) * .1,
               position = position_dodge(width = (max(df_plot_i[[pred_names[2]]]) -
                                                    min(df_plot_i[[pred_names[2]]])) * .12),
               outlier.shape = NA) +
  stat_boxplot(aes(y = .data[[pred_names[2]]], fill = subset,
                   x = (max(.data[[pred_names[1]]]) - min(.data[[pred_names[1]]])) * .1 +
                     max(.data[[pred_names[1]]])),
               width = (max(df_plot_q1[[pred_names[1]]]) - min(df_plot_q1[[pred_names[1]]])) * .1,
               position = position_dodge(width = (max(df_plot_i[[pred_names[1]]]) -
                                                    min(df_plot_i[[pred_names[1]]])) * .12),
               outlier.shape = NA) +
  coord_cartesian(clip = "off") +
  labs(x = title_list[[1]], y = title_list[[2]]) +
  scale_x_continuous(expand = expansion(mult = c(.02, .02)), breaks = seq(0, 100, 10)) +
  scale_y_continuous(expand = expansion(mult = c(.02, .02)), breaks = seq(0, 100, 25)) +
  scale_color_manual(values = c("test" = cols_gradient_greyscale[2], "train" = "indianred")) +
  scale_fill_manual(values = c("test" = cols_gradient_greyscale[2], "train" = "indianred")) +
  theme_bw(base_size = 15) +
  theme(aspect.ratio = 1, legend.position = "none")

fig_cor2 = df_plot_i |>
  ggplot(aes(x = .data[[pred_names[3]]], y = .data[[pred_names[4]]])) +
  geom_point(alpha = .2, color = cols_gradient_greyscale[2], size = 3, shape = 19) +
  geom_point(data = filter(df_plot_i, subset == "train"), alpha = .9,
             aes(color = subset), size = 3, shape = 19) +
  stat_boxplot(aes(x = .data[[pred_names[3]]], fill = subset,
                   y = (max(.data[[pred_names[4]]]) - min(.data[[pred_names[4]]])) * .1 +
                     max(.data[[pred_names[4]]])),
               width = (max(df_plot_q1[[pred_names[4]]]) - min(df_plot_q1[[pred_names[4]]])) * .1,
               position = position_dodge(width = (max(df_plot_i[[pred_names[4]]]) -
                                                    min(df_plot_i[[pred_names[4]]])) * .12),
               outlier.shape = NA) +
  stat_boxplot(aes(y = .data[[pred_names[4]]], fill = subset,
                   x = (max(.data[[pred_names[3]]]) - min(.data[[pred_names[3]]])) * .1 +
                     max(.data[[pred_names[3]]])),
               width = (max(df_plot_q1[[pred_names[3]]]) - min(df_plot_q1[[pred_names[3]]])) * .1,
               position = position_dodge(width = (max(df_plot_i[[pred_names[3]]]) -
                                                    min(df_plot_i[[pred_names[3]]])) * .12),
               outlier.shape = NA) +
  coord_cartesian(clip = "off") +
  labs(x = title_list[[3]], y = title_list[[4]]) +
  scale_x_continuous(expand = expansion(mult = c(.02, .02))) +
  scale_y_continuous(expand = expansion(mult = c(.02, .02))) +
  scale_color_manual(values = c("test" = cols_gradient_greyscale[2], "train" = "indianred")) +
  scale_fill_manual(values = c("test" = cols_gradient_greyscale[2], "train" = "indianred")) +
  theme_bw(base_size = 15) +
  theme(aspect.ratio = 1, legend.position = "none")

fig_1 = ggarrange(metrics_list[[1]], metrics_list[[2]], fig_cor1,
                  widths = c(.5, .5, 1), ncol = 3, nrow = 1)
fig_2 = ggarrange(metrics_list[[3]], metrics_list[[4]], fig_cor2,
                  widths = c(.5, .5, 1), ncol = 3, nrow = 1)
fig = ggarrange(fig_1, fig_2, ncol = 1, nrow = 2)

ggsave(plot = fig, filename = "./figures/Figure_3.jpeg", device = jpeg,
       width = 15, height = 15, units = "in", dpi = 150, bg = "white", create.dir = TRUE)

IRdisplay::display_jpeg(file = "./figures/Figure_3.jpeg")
```
Figure 3. Landscape-level variation in forest structural metrics on BCI. (a, b, d, e) Landscape-level variation in mean height of canopy returns, canopy cover, coefficient of variation of the leaf-area density profile, and variance of canopy return heights at BCI, with one random 16-ha plot shown. (c, f) Scatter plots showing the distribution of all pixels and the plot in “structural space”.
```r
# df_plot = df_mdist |>
#   mutate(c = factor(if_else(sqrt(m_d) > 3, 1, 0))) |>
#   drop_na(c)

plots_train = df_plot_q1 |>
  filter(subset == "train")

fig_hist = ggplot() +
  # geom_tile(data = df_plot, aes(x = x, y = y), color = "grey70", fill = NA) +
  geom_histogram(data = df_plot_q1, aes(x = m_d, fill = factor(threshold)),
                 alpha = .8, bins = 100) +
  scale_fill_manual(values = c("steelblue3", "darkseagreen", "grey80")) +
  scale_y_continuous(expand = expansion(mult = c(.01, .03))) +
  theme_bw(base_size = 14) +
  labs(x = "Mahalanobis distance") +
  # theme_map +
  # coord_equal() +
  scale_x_log10(breaks = c(.2, 2, 20, 200)) +
  theme(aspect.ratio = .4,
        legend.position = "none",
        legend.justification = c(0, 1),
        panel.background = element_rect(fill = "white"),
        legend.key = element_rect(fill = "white", color = "white"),
        legend.title = element_blank(),
        axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        strip.background = element_rect(fill = "white"),
        # plot.background = element_rect(fill = "white", color = "blue"),
        legend.direction = "vertical",
        legend.key.width = unit(.1, "in"),
        legend.key.height = unit(.1, "in"),
        legend.key.spacing.y = unit(.05, "in")) +
  facet_grid(. ~ scenario)

fig_map = ggplot() +
  geom_tile(data = df_plot_q1, aes(x = x, y = y, fill = factor(threshold)), alpha = .8) +
  geom_tile(data = filter(df_plot_q1, subset == "train"),
            aes(x = x, y = y), fill = "indianred", alpha = .8) +
  scale_fill_manual(values = c("steelblue3", "darkseagreen", "grey80")) +
  scale_x_continuous(expand = expansion(mult = c(.02, .02))) +
  scale_y_continuous(expand = expansion(mult = c(.01, .01))) +
  theme_bw(base_size = 14) +
  theme_map +
  coord_equal() +
  theme(legend.position.inside = c(.02, .985),
        legend.justification = c(0, 1),
        legend.direction = "vertical",
        legend.key.width = unit(.1, "in"),
        legend.key.height = unit(.1, "in"),
        legend.key.spacing.y = unit(.05, "in"),
        panel.background = element_rect(fill = "white"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.key = element_rect(fill = "white", color = "white"),
        legend.title = element_blank(),
        # axis.text = element_text(size = 8),
        axis.text.x = element_blank(),
        axis.text.y = element_blank()) +
  facet_grid(. ~ scenario)

fig = ggarrange(fig_hist, fig_map, ncol = 1, heights = c(.3075, 1))

ggsave(plot = fig, filename = "./figures/Figure_4.jpeg", device = jpeg,
       width = 10, height = 12, units = "in", dpi = 150, bg = "white", create.dir = TRUE)

IRdisplay::display_jpeg(file = "./figures/Figure_4.jpeg")
```
Figure 4. Mahalanobis distance and thresholding for plot representativeness. (a) The distribution of Mahalanobis distances generated from a random selection of 16-ha plots. (b) Mahalanobis distances across the landscape. (c) Pixels below the threshold of acceptable distance (2 SD) from the distribution of plot data.
```r
options(repr.plot.width = 12, repr.plot.height = 5)

fig = df_eval_q1 |>
  ggplot(aes(x = size_ha, y = value, color = scenario, fill = scenario)) +
  labs(x = "Area of simulated plot data (ha)") +
  stat_summary(aes(group = scenario), fun.data = "mean_cl_normal",
               geom = "ribbon", color = NA, alpha = 0.5) +
  stat_summary(aes(group = scenario), fun = "mean", geom = "line", alpha = 0.8) +
  scale_color_manual(values = c("forestgreen", "orchid4")) +
  scale_fill_manual(values = c("forestgreen", "orchid4")) +
  theme_bw(base_size = 15) +
  theme_map +
  theme(legend.position.inside = c(.88, .95),
        legend.justification = c(0, 1),
        legend.direction = "vertical",
        legend.key.width = unit(.1, "in"),
        legend.key.height = unit(.1, "in"),
        legend.key.spacing.y = unit(.05, "in"),
        aspect.ratio = 1,
        axis.title.x = element_text()) +
  facet_grid2(. ~ metric, independent = "all", scales = "free") +
  facetted_pos_scales(
    y = list(
      metric == "Mean Mahalanobis distance" ~ scale_y_continuous(limits = c(1, 7)),
      metric == "Pixels ≥ 2 SD (%)" ~ scale_y_continuous(limits = c(0, 1),
                                                         breaks = seq(0, 1, .2),
                                                         labels = seq(0, 100, 20)),
      metric == "Pixels ≥ 3 SD (%)" ~ scale_y_continuous(limits = c(0, 1),
                                                         breaks = seq(0, 1, .2),
                                                         labels = seq(0, 100, 20))
    )
  )

ggsave(plot = fig, filename = "./figures/Figure_5.jpeg", device = jpeg,
       width = 14, height = 7, units = "in", dpi = 150, bg = "white", create.dir = TRUE)

IRdisplay::display_jpeg(file = "./figures/Figure_5.jpeg")
```
Figure 5. Relationship between plot area (ha) and the percentage of pixels beyond the acceptable Mahalanobis distance for (a) Question 1 and (b) Question 2.
```r
options(repr.plot.width = 12, repr.plot.height = 5)

fig = df_eval_q3 |>
  ggplot(aes(x = size_ha, y = value, color = scenario, fill = scenario)) +
  labs(x = "Area of simulated plot data (ha)") +
  stat_summary(aes(group = scenario), fun.data = "mean_cl_normal",
               geom = "ribbon", color = NA, alpha = 0.5) +
  stat_summary(aes(group = scenario), fun = "mean", geom = "line", alpha = 0.8) +
  scale_color_manual(values = c("forestgreen", "orchid4")) +
  scale_fill_manual(values = c("forestgreen", "orchid4")) +
  theme_bw(base_size = 15) +
  theme_map +
  theme(legend.position.inside = c(.88, .95),
        legend.justification = c(0, 1),
        legend.direction = "vertical",
        legend.key.width = unit(.1, "in"),
        legend.key.height = unit(.1, "in"),
        legend.key.spacing.y = unit(.05, "in"),
        aspect.ratio = 1,
        axis.title.x = element_text()) +
  facet_grid2(. ~ metric, independent = "all", scales = "free")
  # facetted_pos_scales(
  #   y = list(
  #     metric == "Mean Mahalanobis distance" ~ scale_y_continuous(limits = c(1, 7)),
  #     metric == "Pixels ≥ 2 SD (%)" ~ scale_y_continuous(limits = c(0, 1),
  #                                                        breaks = seq(0, 1, .2),
  #                                                        labels = seq(0, 100, 20)),
  #     metric == "Pixels ≥ 3 SD (%)" ~ scale_y_continuous(limits = c(0, 1),
  #                                                        breaks = seq(0, 1, .2),
  #                                                        labels = seq(0, 100, 20))
  #   )
  # )

ggsave(plot = fig, filename = "./figures/Figure_6.jpeg", device = jpeg,
       width = 14, height = 7, units = "in", dpi = 150, bg = "white", create.dir = TRUE)

IRdisplay::display_jpeg(file = "./figures/Figure_6.jpeg")
```
Figure 6. Relationship between plot area (ha) and model goodness-of-fit (bias, MAE, RMSE) for Question 3.