About

📌 Project Information

📚 Course Details

Course Code MIS029
Course Name Data Visualization
Project Type Final Project Dashboard

👤 Student Information

Name & Surname Mert Efe Kurt
Student Number 2307071061
Submission Date 15 January 2026

📊 Dataset Overview

Dataset Chocolate Bar Ratings
Source TidyTuesday 2022-01-18
Observations 2,530
Variables 10 original + 3 derived
Period 2006 - 2021

📖 About This Dataset

Expert ratings of over 2,500 chocolate bars from around the world, compiled by the Manhattan Chocolate Society via Flavors of Cacao.

🌍 67 countries | 🏭 580 manufacturers | 🌱 62 bean origins

Rating Scale:

Score Category
4.0 - 5.0 🏆 Outstanding
3.5 - 3.99 ⭐ Highly Recommended
3.0 - 3.49 ✓ Recommended
2.5 - 2.99 ⚠️ Disappointing
1.0 - 2.49 ✗ Unpleasant
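The bands above can be reproduced with a few lines of R. The sketch below uses base R's `cut()` rather than the dashboard's own `case_when()` logic; the helper name `categorise_rating` and the example values are illustrative assumptions, not part of the project.

```r
# Hypothetical helper: map numeric ratings to the five bands listed above.
categorise_rating <- function(rating) {
  cut(
    rating,
    breaks = c(-Inf, 2.5, 3.0, 3.5, 4.0, Inf),
    labels = c("Unpleasant", "Disappointing", "Recommended",
               "Highly Recommended", "Outstanding"),
    right = FALSE  # intervals are closed on the left, e.g. [3.5, 4.0)
  )
}

example_ratings <- c(1.0, 2.5, 3.25, 3.5, 4.0)  # illustrative values only
categorise_rating(example_ratings)
```

Because the intervals are closed on the left, a rating of exactly 3.5 lands in "Highly Recommended" and exactly 4.0 in "Outstanding".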

📋 Variable Names and Types

# Variable Type Description
1 ref Numeric Unique reference ID
2 company_manufacturer Character Manufacturer name
3 company_location Character Manufacturer country
4 review_date Numeric Review year
5 country_of_bean_origin Character Bean origin country
6 specific_bean_origin_or_bar_name Character Bean variety/bar name
7 cocoa_percent Character Cocoa % (text)
8 ingredients Character Ingredient codes
9 most_memorable_characteristics Character Flavor notes
10 rating Numeric Rating (1-5)
11 cocoa_percent_num Numeric Cocoa % [Derived]
12 num_ingredients Numeric Ingredient count [Derived]
13 rating_category Factor Rating label [Derived]

🔍 Data Structure Preview

Metric Value
Rows (Observations) 2,530
Columns (Variables) 13
Memory Size 611.5 Kb
Complete Cases 2,443
Missing Values 174

Summary

📊 Summary Statistics for Numeric Variables

Variable N Mean Median SD Min Max
Cocoa Percentage (%) 2530 71.64 70.00 5.62 42 100
Number of Ingredients 2443 3.04 3.00 0.91 1 6
Rating Score 2530 3.20 3.25 0.45 1 4
Review Year 2530 2014.37 2015.00 3.97 2006 2021

📍 Frequency Table: Top Manufacturing Countries

Country Count Pct Cum.Pct
U.S.A. 1136 44.9 44.9
Canada 177 7.0 51.9
France 176 7.0 58.9
U.K. 133 5.3 64.1
Italy 78 3.1 67.2
Belgium 63 2.5 69.7
Ecuador 58 2.3 72.0
Australia 53 2.1 74.1
Switzerland 44 1.7 75.8
Germany 42 1.7 77.5
Spain 36 1.4 78.9
Denmark 31 1.2 80.1

🌱 Frequency Table: Top Bean Origin Countries

Bean Origin Count Pct Cum.Pct
Venezuela 253 10.0 10.0
Peru 244 9.6 19.6
Dominican Republic 226 8.9 28.6
Ecuador 219 8.7 37.2
Madagascar 177 7.0 44.2
Blend 156 6.2 50.4
Nicaragua 100 4.0 54.3
Bolivia 80 3.2 57.5
Colombia 79 3.1 60.6
Tanzania 79 3.1 63.8
Brazil 78 3.1 66.8
Belize 76 3.0 69.8

⭐ Frequency Table: Rating Categories

Category Count Pct Cum.Pct
Outstanding 112 4.4 4.4
Highly Recommended 865 34.2 38.6
Recommended 987 39.0 77.6
Disappointing 499 19.7 97.4
Unpleasant 67 2.6 100.0
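Cumulative percentages are sensitive to where the rounding happens. The sketch below, which uses only the category counts from the table above, compares summing already-rounded percentages with rounding the cumulative share once per row; the variable names are illustrative.

```r
counts <- c(Outstanding = 112, `Highly Recommended` = 865,
            Recommended = 987, Disappointing = 499, Unpleasant = 67)

pct <- round(counts / sum(counts) * 100, 1)

# Summing rounded percentages lets the rounding error accumulate.
cum_from_rounded <- cumsum(pct)

# Cumulating the counts first and rounding once keeps each row exact.
cum_from_counts <- round(cumsum(counts) / sum(counts) * 100, 1)

data.frame(Count = counts, Pct = pct,
           Cum.Rounded = cum_from_rounded, Cum.Exact = cum_from_counts)
```

With these counts the first approach ends at 99.9, while the second ends at 100, so the count-based version is the safer choice for a table whose last row covers every observation.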

🔢 Quick Stats

2,530 Total Reviews

Histogram

📊 Histogram: Distribution of Expert Chocolate Ratings

📝 Histogram Interpretation

The histogram reveals that chocolate ratings follow an approximately normal distribution with a slight left skew. The majority of bars (over 60%) receive ratings between 3.0 and 3.5, a range that covers the “Recommended” band and the bottom of “Highly Recommended”. The mean (3.2) sits just below the median (3.25), which is consistent with that mild left skew.

🍫 Cocoa Percentage Distribution

📈 Key Statistics

Statistic Rating Cocoa %
Mean 3.2 71.6%
Median 3.25 70%
Std Dev 0.45 5.6%
Range 1 - 4 42 - 100%

Key Insights:

  • Mode rating is 3.25
  • 75% of ratings fall between 2.75-3.5
  • 70% cocoa is the most common formulation
  • Only about 4.4% achieve “Outstanding” (4.0+)

Boxplot

📦 Multiple Boxplot: Rating Distribution by Manufacturing Country

📝 Boxplot Interpretation

This boxplot compares rating distributions across the top 10 chocolate-manufacturing countries. Australia shows the highest median rating (3.50) with comparatively low variability (SD 0.41). Belgium (SD 0.66) and Ecuador (SD 0.55) display the widest spreads. All other countries have median ratings of 3.0-3.25. Gold diamonds (means) stay within 0.15 points of the medians.

📊 Rating by Cocoa Range

📋 Country Statistics

Country N Mean Median SD
Australia 53 3.36 3.50 0.41
Canada 177 3.30 3.25 0.42
France 176 3.26 3.25 0.52
Germany 42 3.21 3.25 0.47
Italy 78 3.23 3.25 0.47
Switzerland 44 3.32 3.25 0.45
U.S.A. 1136 3.19 3.25 0.42
Belgium 63 3.10 3.00 0.66
Ecuador 58 3.04 3.00 0.55
U.K. 133 3.07 3.00 0.47
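A per-country table like the one above is typically built with a grouped summary. The following is a minimal sketch under the assumption that dplyr is used; the toy data frame and its values are made up, and only the column names `company_location` and `rating` come from the dataset.

```r
library(dplyr)

# Toy data standing in for the real reviews (values are made up).
toy <- tibble::tibble(
  company_location = c("A", "A", "A", "B", "B", "B"),
  rating           = c(3.0, 3.5, 4.0, 2.5, 3.0, 3.25)
)

country_stats <- toy %>%
  group_by(company_location) %>%
  summarise(
    N      = n(),                      # reviews per country
    Mean   = round(mean(rating), 2),
    Median = median(rating),
    SD     = round(sd(rating), 2),
    .groups = "drop"
  )

country_stats
```

Reporting N next to the mean and SD matters here: a country with 42 reviews supports much weaker conclusions than one with 1,136.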

Scatterplot

🔵 Scatterplot: Cocoa Percentage vs Rating by Bean Origin Region

📝 Scatterplot Interpretation

This scatterplot explores the relationship between cocoa percentage and expert ratings. The LOESS curve reveals that ratings peak around 65-75% cocoa, then decline at higher percentages. The weak negative correlation (r = -0.147) indicates cocoa content alone doesn’t determine quality.

📊 Correlation Analysis

r = -0.147

⚠️ Weak Negative Correlation

Cocoa % explains only 2.2% of rating variance
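The 2.2% figure follows directly from squaring the correlation coefficient. A short check, using only the value of r reported above:

```r
r <- -0.147                 # Pearson correlation reported above
r_squared <- r^2            # share of rating variance explained by a linear fit
round(r_squared * 100, 1)   # expressed as a percentage
```

Squaring removes the sign, so the result says how much variance is explained but not the direction of the relationship.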

🌍 Statistics by Bean Region

Region Count Avg Rating Avg Cocoa
Central Am. & Caribbean 683 3.22 72%
Africa 357 3.21 71%
Asia-Pacific 231 3.21 71%
South America 953 3.20 72%
Other / Blend 306 3.08 71%

💡 Key Findings

Insights:

  • Weak Correlation (r = -0.147): Cocoa % has minimal impact
  • Sweet Spot: 65-75% cocoa achieves highest scores
  • Diminishing Returns: Very dark chocolate (>85%) scores lower
  • Regional Consistency: All bean regions show similar patterns

Interactive

🖱️ Interactive Visualization: Explore Each Chocolate Bar (ggplotly)

📝 Interactive Features

Native Plotly with WebGL for optimal performance. Stratified sample (40% per category). Hover for details, use toolbar to zoom/pan/download.
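A stratified sample of this kind can be drawn by sampling within each rating category. The sketch below is one way to do it, assuming dplyr's `slice_sample()`; the toy data and the seed are illustrative and not taken from the dashboard.

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the sample is repeatable

# Toy data: 10 bars in each of two categories (made-up values).
toy <- tibble::tibble(
  rating_category = rep(c("Recommended", "Disappointing"), each = 10),
  rating          = c(rep(3.25, 10), rep(2.75, 10))
)

sampled <- toy %>%
  group_by(rating_category) %>%
  slice_sample(prop = 0.4) %>%   # keep 40% of the rows within each category
  ungroup()

count(sampled, rating_category)
```

Sampling within groups keeps the category proportions of the full data, so rare categories are not lost when the plot is thinned for performance.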

📖 How to Use This Chart

Hover over points to see: Manufacturer name, location, bean origin, cocoa %, rating & category.

Toolbar Options: Download PNG, Zoom in/out, Pan, Reset view.

Color Legend:

Color Category
🟢 Green Outstanding / Highly Recommended
🟡 Orange Recommended
🔴 Red Disappointing / Unpleasant

⭐ Rating Distribution

Data

🔍 Complete Dataset Explorer

ℹ️ Usage Tips

How to use this table:

  • 🔍 Filter: Use the search boxes under each column header
  • ↕️ Sort: Click on column headers to sort
  • 📋 Export: Use Copy, CSV, or Excel buttons
  • 📜 Scroll: Scroll within the table to see all 2,530 records

Column Guide:

Column Description
Manufacturer Company that made the chocolate
Location Country where company is based
Year When the review was conducted
Bean Origin Where cocoa beans came from
Cocoa Percentage of cocoa content
Rating Expert score (1.0 - 5.0)
Category Rating classification
Flavors Tasting notes from experts

References

📚 Data Sources & Methodology

Primary Data Source:

📊 TidyTuesday - Chocolate Bar Ratings (2022-01-18)

Attribute Details
Repository github.com/rfordatascience/tidytuesday
Direct CSV chocolate.csv
Records 2,530 chocolate bar reviews
Time Span 2006 - 2021
Variables 10 original columns

Original Data Provider:

🍫 Flavors of Cacao - flavorsofcacao.com

  • Compiled by: Manhattan Chocolate Society
  • Rating methodology: Blind tasting by certified chocolate experts
  • Scale: 1.0 (unpleasant) to 5.0 (elite/outstanding)

Data Collection Method:

Expert tasters evaluate chocolate bars on texture, flavor complexity, finish, and overall impression. Each bar is rated independently without brand knowledge.

📦 R Packages Used

Package Version Purpose Citation
flexdashboard 0.6.2 Dashboard framework & layout Iannone et al. (2024)
tidyverse 2.0.0 Data wrangling (dplyr, tidyr, readr) Wickham et al. (2019)
ggplot2 4.0.1 Grammar of graphics visualizations Wickham (2016)
plotly 4.11.0 Interactive charts with WebGL Sievert (2020)
DT 0.34.0 Interactive searchable data tables Xie et al. (2024)
knitr 1.49 Dynamic report generation Xie (2024)
kableExtra 1.4.0 Advanced table formatting Zhu (2024)
scales 1.4.0 Axis & label formatting Wickham & Seidel (2022)

🔗 Documentation & Tutorials

Official Documentation:

Resource URL Purpose
📖 Flexdashboard pkgs.rstudio.com/flexdashboard Dashboard layouts & components
📊 Plotly R plotly.com/r Interactive visualizations
🎨 ggplot2 ggplot2.tidyverse.org Static graphics reference
📋 DT Package rstudio.github.io/DT DataTables integration
📚 kableExtra haozhu233.github.io/kableExtra Table styling

Books & Learning Resources:

  1. Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.). r4ds.hadley.nz

  2. Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman & Hall/CRC. plotly-r.com

  3. Wilke, C. O. (2019). Fundamentals of Data Visualization. O’Reilly. clauswilke.com/dataviz

  4. Healy, K. (2018). Data Visualization: A Practical Introduction. Princeton University Press.

Course Materials:

🔄 Session Information

Property Value
R Version 4.4.3
Platform aarch64-apple-darwin20
Operating System Darwin 25.1.0
Locale en_US.UTF-8
Date Generated 2026-01-15 19:09:27
Timezone Europe/Istanbul

📋 Reproducibility & License

To reproduce this analysis:

# 1. Install required packages
install.packages(c("flexdashboard", "tidyverse", 
                   "plotly", "DT", "knitr", 
                   "kableExtra", "scales"))

# 2. Render the dashboard
rmarkdown::render("MertEfeKurt_2307071061_Final.Rmd")

⚠️ Requirements:

  • R version ≥ 4.0.0
  • Internet connection (data fetched from GitHub)
  • ~500 MB RAM for rendering
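Before rendering, it can help to confirm that the environment meets these requirements. The following is a minimal sketch using base R only; the package list mirrors the install command above, and the variable names are illustrative.

```r
required <- c("flexdashboard", "tidyverse", "plotly", "DT",
              "knitr", "kableExtra", "scales")

# requireNamespace() returns FALSE instead of erroring when a package is absent.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

# Compare the running R version against the stated minimum.
r_ok <- getRversion() >= "4.0.0"

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
message("R version check passed: ", r_ok)
```

This only checks that the packages are present; it does not verify the specific versions listed in the package table.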

Data License: TidyTuesday data is released under CC0 1.0 Universal license.


Dashboard Author: Mert Efe Kurt (2307071061)

Course: MIS029 - Data Visualization

Institution: Final Project Submission

Generated: January 15, 2026 at 19:09

---
title: "🍫 Chocolate Bar Ratings Analysis"
author: "Mert Efe Kurt | 2307071061"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: scroll
    theme: 
      version: 4
      bootswatch: cosmo
    navbar:
      - { title: "MIS029 Final Project", align: right }
    source_code: embed
---

```{css, echo=FALSE}
/* ===== PREMIUM CHOCOLATE THEME ===== */

/* Import elegant fonts */
@import url('https://fonts.googleapis.com/css2?family=Playfair+Display:wght@400;600;700&family=Source+Sans+Pro:wght@300;400;600&display=swap');

/* Root variables for consistent theming */
:root {
  --chocolate-dark: #2C1810;
  --chocolate-medium: #5D4037;
  --chocolate-light: #8D6E63;
  --chocolate-cream: #D7CCC8;
  --chocolate-milk: #EFEBE9;
  --gold-accent: #D4AF37;
  --gold-light: #F4E4BC;
  --success-green: #4CAF50;
  --warning-orange: #FF9800;
  --danger-red: #E53935;
}

/* Prevent horizontal overflow - CRITICAL */
html, body {
  overflow-x: hidden !important;
  max-width: 100% !important;
}

/* Global body styling */
body {
  font-family: 'Source Sans Pro', -apple-system, BlinkMacSystemFont, sans-serif;
  background: linear-gradient(135deg, #FAFAFA 0%, #F5F5F5 100%);
  color: var(--chocolate-dark);
  width: 100%;
  box-sizing: border-box;
}

/* Ensure all containers respect boundaries */
* {
  box-sizing: border-box;
}

/* Main container */
.container-fluid, .row {
  max-width: 100% !important;
  overflow-x: hidden !important;
}

/* Navbar styling */
.navbar {
  background: linear-gradient(135deg, var(--chocolate-dark) 0%, var(--chocolate-medium) 100%) !important;
  border-bottom: 3px solid var(--gold-accent) !important;
  box-shadow: 0 4px 20px rgba(44, 24, 16, 0.3);
}

.navbar-brand {
  font-family: 'Playfair Display', serif !important;
  font-weight: 700 !important;
  font-size: 1.5rem !important;
  color: var(--gold-light) !important;
  text-shadow: 1px 1px 2px rgba(0,0,0,0.3);
}

.navbar-nav > li > a {
  color: var(--chocolate-cream) !important;
  font-weight: 500;
  transition: all 0.3s ease;
}

.navbar-nav > li > a:hover {
  color: var(--gold-accent) !important;
  background: rgba(212, 175, 55, 0.15) !important;
}

.navbar-nav > .active > a {
  background: rgba(212, 175, 55, 0.25) !important;
  color: var(--gold-accent) !important;
  border-bottom: 2px solid var(--gold-accent);
}

/* Page titles */
.section.level3 > h3 {
  font-family: 'Playfair Display', serif;
  color: var(--chocolate-dark);
  font-weight: 600;
  border-bottom: 2px solid var(--gold-accent);
  padding-bottom: 8px;
  margin-bottom: 15px;
}

/* Chart containers */
.chart-wrapper {
  background: white;
  border-radius: 12px;
  box-shadow: 0 4px 15px rgba(44, 24, 16, 0.08);
  padding: 15px;
  transition: transform 0.3s ease, box-shadow 0.3s ease;
  max-width: 100% !important;
  overflow: hidden !important;
}

.chart-wrapper:hover {
  transform: translateY(-2px);
  box-shadow: 0 8px 25px rgba(44, 24, 16, 0.12);
}

/* Chart stage - allow vertical scroll when needed */
.chart-stage {
  max-width: 100% !important;
  overflow-x: hidden !important;
  overflow-y: auto !important;
  width: 100% !important;
}

/* Plotly and ggplot containers */
.plotly, .plotly-container, .html-widget {
  max-width: 100% !important;
  overflow: hidden !important;
}

/* Value boxes */
.value-box {
  border-radius: 12px !important;
  box-shadow: 0 4px 15px rgba(44, 24, 16, 0.15) !important;
  transition: transform 0.3s ease !important;
}

.value-box:hover {
  transform: scale(1.02);
}

.value-box .value {
  font-family: 'Playfair Display', serif !important;
  font-size: 2.2rem !important;
  font-weight: 700 !important;
}

.value-box .caption {
  font-family: 'Source Sans Pro', sans-serif !important;
  font-size: 0.85rem !important;
  font-weight: 500 !important;
  text-transform: uppercase;
  letter-spacing: 0.5px;
}

/* Tables - prevent horizontal scroll and fix backgrounds */
.dataTable {
  font-size: 0.9rem !important;
  max-width: 100% !important;
  width: 100% !important;
  table-layout: auto !important;
  background-color: white !important;
}

.dataTables_wrapper {
  max-width: 100% !important;
  overflow-x: hidden !important;
  background-color: white !important;
}

.dataTables_scroll {
  max-width: 100% !important;
  overflow-x: hidden !important;
  background-color: white !important;
}

/* Fix white space issue in Data table - CRITICAL */
.dataTables_scrollBody {
  background-color: white !important;
  overflow-y: auto !important;
}

.dataTables_scrollHead {
  background-color: white !important;
}

table.dataTable {
  background-color: white !important;
}

table.dataTable tbody tr {
  background-color: white !important;
}

table.dataTable tbody tr:nth-child(odd) {
  background-color: #FAFAFA !important;
}

table.dataTable tbody tr:hover {
  background-color: var(--gold-light) !important;
}

/* Data page specific - fill container */
.chart-wrapper.html-fill-container {
  height: 100% !important;
  min-height: calc(100vh - 200px) !important;
}

/* Ensure DataTable fills its container */
.html-widget.html-fill-item {
  height: 100% !important;
  min-height: calc(100vh - 250px) !important;
}

.dataTable thead th {
  background: linear-gradient(135deg, var(--chocolate-dark), var(--chocolate-medium)) !important;
  color: white !important;
  font-weight: 600 !important;
  text-transform: uppercase;
  font-size: 0.8rem;
  letter-spacing: 0.5px;
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
}

.dataTable tbody tr:hover {
  background-color: var(--gold-light) !important;
}

.dataTable tbody td {
  word-wrap: break-word;
  max-width: 200px;
  overflow: hidden;
  text-overflow: ellipsis;
}

/* Kable tables */
.table-striped > tbody > tr:nth-of-type(odd) {
  background-color: var(--chocolate-milk);
}

/* Custom card styling */
.info-card {
  background: linear-gradient(145deg, #FFFFFF 0%, var(--chocolate-milk) 100%);
  border-radius: 16px;
  padding: 25px;
  box-shadow: 0 8px 30px rgba(44, 24, 16, 0.1);
  border-left: 5px solid var(--gold-accent);
  margin-bottom: 20px;
}

.premium-card {
  background: linear-gradient(135deg, var(--chocolate-dark) 0%, var(--chocolate-medium) 100%);
  color: white;
  border-radius: 16px;
  padding: 30px;
  box-shadow: 0 10px 40px rgba(44, 24, 16, 0.25);
}

.premium-card h4 {
  color: var(--gold-accent);
  font-family: 'Playfair Display', serif;
  margin-bottom: 15px;
}

/* Interpretation boxes - MUST BE VISIBLE */
.interpretation-box {
  background: linear-gradient(135deg, var(--chocolate-milk) 0%, #FFFFFF 100%);
  border-radius: 12px;
  padding: 18px 20px;
  margin-top: 20px;
  margin-bottom: 15px;
  border-left: 4px solid var(--gold-accent);
  font-size: 0.92rem;
  line-height: 1.65;
  box-shadow: 0 2px 8px rgba(44, 24, 16, 0.08);
  position: relative;
  z-index: 10;
}

/* Ensure chart wrappers don't overflow */
.chart-shim {
  overflow: hidden !important;
  max-width: 100% !important;
}

.section.level3 {
  overflow-x: hidden !important;
  overflow-y: auto !important;
  padding-bottom: 10px;
  max-width: 100% !important;
}

/* Column sections */
.section {
  max-width: 100% !important;
  overflow-x: hidden !important;
}

/* Flexdashboard columns */
.flexdashboard-column {
  max-width: 100% !important;
  overflow-x: hidden !important;
}

/* Stats highlight */
.stat-highlight {
  background: linear-gradient(135deg, var(--gold-light) 0%, #FFFFFF 100%);
  border-radius: 10px;
  padding: 15px 20px;
  text-align: center;
  box-shadow: 0 4px 15px rgba(212, 175, 55, 0.2);
}

.stat-highlight .number {
  font-family: 'Playfair Display', serif;
  font-size: 2.5rem;
  font-weight: 700;
  color: var(--chocolate-dark);
}

.stat-highlight .label {
  font-size: 0.85rem;
  color: var(--chocolate-light);
  text-transform: uppercase;
  letter-spacing: 1px;
}

/* Gauge styling */
.gauge-container {
  background: white;
  border-radius: 12px;
  padding: 20px;
  text-align: center;
}

/* Custom scrollbar */
::-webkit-scrollbar {
  width: 8px;
  height: 8px;
}

::-webkit-scrollbar-track {
  background: var(--chocolate-milk);
  border-radius: 4px;
}

::-webkit-scrollbar-thumb {
  background: var(--chocolate-light);
  border-radius: 4px;
}

::-webkit-scrollbar-thumb:hover {
  background: var(--chocolate-medium);
}

/* Animation for page load */
@keyframes fadeInUp {
  from {
    opacity: 0;
    transform: translateY(20px);
  }
  to {
    opacity: 1;
    transform: translateY(0);
  }
}

.chart-stage {
  animation: fadeInUp 0.6s ease-out;
}

/* Responsive adjustments */
@media (max-width: 768px) {
  .value-box .value {
    font-size: 1.8rem !important;
  }
  .navbar-brand {
    font-size: 1.2rem !important;
  }
  .dataTable {
    font-size: 0.75rem !important;
  }
  .premium-card, .info-card {
    padding: 15px !important;
  }
}

/* Kable tables - responsive */
table.kable-table, .table {
  max-width: 100% !important;
  width: 100% !important;
  table-layout: auto !important;
  font-size: 0.9rem !important;
}

table.kable-table td, table.kable-table th,
.table td, .table th {
  word-wrap: break-word;
  overflow: hidden;
  text-overflow: ellipsis;
  padding: 6px 8px !important;
}

/* Ensure variable table fits all content */
.table-condensed td, .table-condensed th {
  padding: 4px 6px !important;
  font-size: 11px !important;
  line-height: 1.3 !important;
}

/* Ensure all images and plots fit */
img, svg, canvas {
  max-width: 100% !important;
  height: auto !important;
}

/* Info cards and premium cards */
.info-card, .premium-card {
  max-width: 100% !important;
  word-wrap: break-word;
  overflow-wrap: break-word;
  overflow-y: auto !important;
}

/* Ensure tables in cards are compact */
.premium-card table, .info-card table {
  margin-bottom: 8px !important;
  font-size: 0.9rem !important;
}

.premium-card h4, .info-card h4 {
  margin-bottom: 8px !important;
  font-size: 1rem !important;
}

/* Value boxes container */
.value-box-container {
  max-width: 100% !important;
}

/* Ensure proper spacing and no overflow in sections */
.section {
  padding-left: 10px !important;
  padding-right: 10px !important;
}

/* Grid alignment - ensure columns fill properly */
.flexdashboard-page > .dashboard-row {
  display: flex !important;
  flex-wrap: nowrap !important;
  width: 100% !important;
}

.flexdashboard-page > .dashboard-row > .dashboard-column {
  display: flex !important;
  flex-direction: column !important;
}

/* Ensure panels stretch to fill available height */
.chart-wrapper {
  display: flex !important;
  flex-direction: column !important;
  flex: 1 1 auto !important;
}

.chart-stage {
  flex: 1 1 auto !important;
  display: flex !important;
  flex-direction: column !important;
}

/* No empty gaps between panels */
.section.level3 {
  margin-bottom: 0 !important;
  flex: 1 1 auto !important;
}

/* Fix any potential flexdashboard overflow */
.flexdashboard-content {
  max-width: 100% !important;
  overflow-x: hidden !important;
}
```

```{r setup, include=FALSE}
# ===== SETUP AND DATA LOADING =====
knitr::opts_chunk$set(
  echo = FALSE, 
  message = FALSE, 
  warning = FALSE,
  fig.retina = 2
)

# Load required libraries
library(flexdashboard)
library(tidyverse)
library(plotly)
library(DT)
library(scales)
library(knitr)
library(kableExtra)

# Load the chocolate dataset from TidyTuesday
chocolate <- read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-18/chocolate.csv",
  show_col_types = FALSE
)

# ===== DATA PREPARATION =====
chocolate <- chocolate %>%
  mutate(
    # Convert cocoa_percent to numeric
    cocoa_percent_num = as.numeric(gsub("%", "", cocoa_percent)),
    # Extract number of ingredients
    num_ingredients = as.numeric(str_extract(ingredients, "^[0-9]")),
    # Create rating categories
    rating_category = case_when(
      rating >= 4 ~ "Outstanding",
      rating >= 3.5 ~ "Highly Recommended",
      rating >= 3 ~ "Recommended",
      rating >= 2.5 ~ "Disappointing",
      TRUE ~ "Unpleasant"
    ),
    rating_category = factor(rating_category, levels = c(
      "Outstanding", "Highly Recommended", "Recommended", "Disappointing", "Unpleasant"
    ))
  )

# ===== SUMMARY STATISTICS =====
total_reviews <- nrow(chocolate)
avg_rating <- round(mean(chocolate$rating, na.rm = TRUE), 2)
num_countries <- n_distinct(chocolate$company_location)
num_manufacturers <- n_distinct(chocolate$company_manufacturer)
avg_cocoa <- round(mean(chocolate$cocoa_percent_num, na.rm = TRUE), 1)
num_origins <- n_distinct(chocolate$country_of_bean_origin)

# ===== PREMIUM GGPLOT THEME =====
theme_premium <- function() {
  theme_minimal(base_family = "sans") +
    theme(
      # Title styling
      plot.title = element_text(
        face = "bold", 
        size = 16, 
        color = "#2C1810",
        margin = margin(b = 10)
      ),
      plot.subtitle = element_text(
        size = 11, 
        color = "#5D4037",
        margin = margin(b = 15)
      ),
      plot.caption = element_text(
        size = 9, 
        color = "#8D6E63",
        hjust = 0,
        margin = margin(t = 10)
      ),
      # Axis styling
      axis.title = element_text(
        face = "bold", 
        size = 11, 
        color = "#5D4037"
      ),
      axis.text = element_text(
        size = 10, 
        color = "#5D4037"
      ),
      axis.line = element_line(color = "#D7CCC8", linewidth = 0.5),
      # Legend styling
      legend.title = element_text(face = "bold", size = 10, color = "#2C1810"),
      legend.text = element_text(size = 9, color = "#5D4037"),
      legend.background = element_rect(fill = "white", color = NA),
      legend.key = element_rect(fill = "white", color = NA),
      # Panel styling
      panel.grid.minor = element_blank(),
      panel.grid.major = element_line(color = "#EFEBE9", linewidth = 0.4),
      panel.background = element_rect(fill = "white", color = NA),
      plot.background = element_rect(fill = "white", color = NA),
      # Margins
      plot.margin = margin(15, 15, 15, 15)
    )
}

# Premium color palettes
chocolate_palette <- c("#2C1810", "#4E342E", "#5D4037", "#6D4C41", "#795548", 
                       "#8D6E63", "#A1887F", "#BCAAA4", "#D7CCC8", "#EFEBE9")
rating_colors <- c("Outstanding" = "#2E7D32", "Highly Recommended" = "#689F38",
                   "Recommended" = "#FFA000", "Disappointing" = "#F57C00", 
                   "Unpleasant" = "#D32F2F")
```

About {data-icon="fa-info-circle"}
=====================================

Column {data-width=450}
-----------------------------------------------------------------------

### 📌 Project Information {data-height=500}

<div class="premium-card" style="padding: 20px;">

<h4 style="margin-top: 0;">📚 Course Details</h4>

| | |
|:--|:--|
| **Course Code** | MIS029 |
| **Course Name** | Data Visualization |
| **Project Type** | Final Project Dashboard |

<h4 style="margin-top: 15px;">👤 Student Information</h4>

| | |
|:--|:--|
| **Name & Surname** | Mert Efe Kurt |
| **Student Number** | 2307071061 |
| **Submission Date** | `r format(Sys.Date(), "%d %B %Y")` |

<h4 style="margin-top: 15px;">📊 Dataset Overview</h4>

| | |
|:--|:--|
| **Dataset** | Chocolate Bar Ratings |
| **Source** | TidyTuesday 2022-01-18 |
| **Observations** | `r format(total_reviews, big.mark = ",")` |
| **Variables** | 10 original + 3 derived |
| **Period** | 2006 - 2021 |

</div>

### 📖 About This Dataset {data-height=400}

<div class="info-card" style="padding: 15px;">

**Expert ratings of over 2,500 chocolate bars** from around the world, compiled by the **Manhattan Chocolate Society** via [Flavors of Cacao](http://flavorsofcacao.com/).

🌍 **`r num_countries`** countries | 🏭 **`r num_manufacturers`** manufacturers | 🌱 **`r num_origins`** bean origins

**Rating Scale:**

| Score | Category |
|:------|:---------|
| 4.0 - 5.0 | 🏆 Outstanding |
| 3.5 - 3.99 | ⭐ Highly Recommended |
| 3.0 - 3.49 | ✓ Recommended |
| 2.5 - 2.99 | ⚠️ Disappointing |
| 1.0 - 2.49 | ✗ Unpleasant |

</div>

Column {data-width=550}
-----------------------------------------------------------------------

### 📋 Variable Names and Types {data-height=650}

```{r}
# REQUIRED: List of variable names and types using glimpse/str approach
var_info <- tibble(
  `#` = 1:ncol(chocolate),
  Variable = names(chocolate),
  Type = sapply(chocolate, function(x) {
    type <- class(x)[1]
    case_when(
      type == "character" ~ "Character",
      type == "numeric" ~ "Numeric",
      type == "factor" ~ "Factor",
      TRUE ~ type
    )
  }),
  Description = c(
    "Unique reference ID",
    "Manufacturer name",
    "Manufacturer country",
    "Review year",
    "Bean origin country",
    "Bean variety/bar name",
    "Cocoa % (text)",
    "Ingredient codes",
    "Flavor notes",
    "Rating (1-5)",
    "Cocoa % [Derived]",
    "Ingredient count [Derived]",
    "Rating label [Derived]"
  )
)

# Use kable for compact display - fits all content without scrolling
kable(var_info, align = c('c', 'l', 'l', 'l')) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  row_spec(0, bold = TRUE, background = "#2C1810", color = "white") %>%
  column_spec(1, width = "30px", bold = TRUE) %>%
  column_spec(2, width = "140px", color = "#5D4037") %>%
  column_spec(3, width = "80px", color = "#1565C0", bold = TRUE) %>%
  column_spec(4, width = "auto") %>%
  row_spec(11:13, background = "#FFF8E1")  # Highlight the three derived variables (rows 11-13)
```

### 🔍 Data Structure Preview {data-height=250}

```{r}
# Summarise dataset dimensions, memory footprint, and missingness
structure_df <- tibble(
  Metric = c("Rows (Observations)", "Columns (Variables)", 
             "Memory Size", "Complete Cases", "Missing Values"),
  Value = c(
    format(nrow(chocolate), big.mark = ","),
    ncol(chocolate),
    format(object.size(chocolate), units = "KB"),
    format(sum(complete.cases(chocolate)), big.mark = ","),
    sum(is.na(chocolate))
  )
)

kable(structure_df, align = c('l', 'r')) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = TRUE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2C1810", color = "white") %>%
  column_spec(1, bold = TRUE, color = "#5D4037")
```


Summary {data-icon="fa-calculator"}
=====================================

Column {data-width=550}
-----------------------------------------------------------------------

### 📊 Summary Statistics for Numeric Variables {data-height=280}

```{r}
# REQUIRED: Summary statistics (mean, median, SD, min, max)
numeric_summary <- chocolate %>%
  select(review_date, cocoa_percent_num, rating, num_ingredients) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Value") %>%
  group_by(Variable) %>%
  summarise(
    N = sum(!is.na(Value)),
    Mean = round(mean(Value, na.rm = TRUE), 2),
    Median = round(median(Value, na.rm = TRUE), 2),
    SD = round(sd(Value, na.rm = TRUE), 2),
    Min = round(min(Value, na.rm = TRUE), 2),
    Max = round(max(Value, na.rm = TRUE), 2),
    .groups = 'drop'
  ) %>%
  mutate(Variable = case_when(
    Variable == "review_date" ~ "Review Year",
    Variable == "cocoa_percent_num" ~ "Cocoa Percentage (%)",
    Variable == "rating" ~ "Rating Score",
    Variable == "num_ingredients" ~ "Number of Ingredients"
  ))

kable(numeric_summary, align = c('l', rep('c', 6)),
      caption = NULL) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "responsive"),
                full_width = TRUE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2C1810", color = "white") %>%
  column_spec(1, bold = TRUE, color = "#5D4037", width = "180px") %>%
  row_spec(3, bold = TRUE, background = "#FFF8E1")
```

### 📍 Frequency Table: Top Manufacturing Countries {data-height=470}

```{r}
# REQUIRED: Frequency table for categorical variable
country_freq <- chocolate %>%
  count(company_location, sort = TRUE) %>%
  head(12) %>%
  mutate(
    Pct = round(n / nrow(chocolate) * 100, 1),
    # Cumulate counts, then round once, so rounding error does not accumulate
    `Cum.Pct` = round(cumsum(n) / nrow(chocolate) * 100, 1)
  ) %>%
  rename(Country = company_location, Count = n)

kable(country_freq, align = c('l', 'c', 'c', 'c')) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE, font_size = 12) %>%
  row_spec(0, bold = TRUE, background = "#5D4037", color = "white") %>%
  row_spec(1, bold = TRUE, background = "#D4AF37", color = "#2C1810") %>%
  row_spec(2:3, background = "#F4E4BC")
```

Column {data-width=450}
-----------------------------------------------------------------------

### 🌱 Frequency Table: Top Bean Origin Countries {data-height=380}

```{r}
origin_freq <- chocolate %>%
  count(country_of_bean_origin, sort = TRUE) %>%
  head(12) %>%
  mutate(
    Pct = round(n / nrow(chocolate) * 100, 1),
    `Cum.Pct` = round(cumsum(n) / nrow(chocolate) * 100, 1)
  ) %>%
  rename(`Bean Origin` = country_of_bean_origin, Count = n)

kable(origin_freq, align = c('l', 'c', 'c', 'c')) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE, font_size = 12) %>%
  row_spec(0, bold = TRUE, background = "#795548", color = "white") %>%
  row_spec(1, bold = TRUE, background = "#D4AF37", color = "#2C1810") %>%
  row_spec(2:3, background = "#F4E4BC")
```

### ⭐ Frequency Table: Rating Categories {data-height=250}

```{r}
rating_freq <- chocolate %>%
  count(rating_category) %>%
  mutate(
    Pct = round(n / sum(n) * 100, 1),
    `Cum.Pct` = round(cumsum(n) / sum(n) * 100, 1)
  ) %>%
  rename(Category = rating_category, Count = n)

kable(rating_freq, align = c('l', 'c', 'c', 'c')) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE, font_size = 12) %>%
  row_spec(0, bold = TRUE, background = "#6D4C41", color = "white")
```

### 🔢 Quick Stats {data-height=120}

```{r}
valueBox(format(total_reviews, big.mark = ","), 
         caption = "Total Reviews", icon = "fa-chart-bar", color = "#2C1810")
```


Histogram {data-icon="fa-chart-bar"}
=====================================

Column {data-width=600}
-----------------------------------------------------------------------

### 📊 Histogram: Distribution of Expert Chocolate Ratings {data-height=500}

```{r fig.height=5, fig.width=8}
# REQUIRED: Histogram for numerical variable with appropriate labels and minimal theme
# Purpose: Visualize rating distribution to understand chocolate quality spread

p_hist <- ggplot(chocolate, aes(x = rating)) +
  geom_histogram(binwidth = 0.25, fill = "#5D4037", color = "#2C1810", 
                 alpha = 0.85, linewidth = 0.3) +
  geom_vline(xintercept = mean(chocolate$rating), 
             color = "#C62828", linetype = "dashed", linewidth = 1) +
  geom_vline(xintercept = median(chocolate$rating), 
             color = "#1565C0", linetype = "solid", linewidth = 1) +
  annotate("label", x = 3.7, y = 480, 
           label = paste0("Mean = ", round(mean(chocolate$rating), 2)),
           fill = "#FFEBEE", color = "#C62828", fontface = "bold", size = 3,
           label.padding = unit(0.35, "lines")) +
  annotate("label", x = 2.8, y = 420,
           label = paste0("Median = ", round(median(chocolate$rating), 2)),
           fill = "#E3F2FD", color = "#1565C0", fontface = "bold", size = 3,
           label.padding = unit(0.35, "lines")) +
  labs(
    title = "Distribution of Expert Chocolate Bar Ratings",
    subtitle = "Most chocolates receive ratings between 3.0 and 3.5 (Recommended to Highly Recommended)",
    x = "Expert Rating Score",
    y = "Number of Chocolate Bars",
    caption = "Dashed red = Mean | Solid blue = Median | Data: TidyTuesday 2022"
  ) +
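  scale_x_continuous(breaks = seq(1, 5, 0.5)) +
  # Zoom with coord_cartesian(): scale limits of c(1, 4.5) would drop the bar centred on 1.0
  coord_cartesian(xlim = c(0.875, 4.5)) +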
  scale_y_continuous(labels = comma) +
  theme_premium()

p_hist
```

### 📝 Histogram Interpretation {data-height=250}

The histogram shows a roughly bell-shaped distribution of ratings with a **slight left skew**: a small tail of low-rated bars pulls the mean (`r round(mean(chocolate$rating), 2)`) just below the median (`r round(median(chocolate$rating), 2)`). About `r round(mean(chocolate$rating >= 3 & chocolate$rating <= 3.5) * 100)`% of bars receive ratings between **3.0 and 3.5**, a range that covers the "Recommended" category and the lower edge of "Highly Recommended".
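
The direction of the skew can also be checked numerically. The sketch below is a minimal base R illustration on a small made-up vector of ratings, not the real data; `sample_skewness()` is a hypothetical helper, and with the dashboard data one would pass `chocolate$rating` instead.

```r
# Toy ratings (illustrative values only, not taken from the dataset)
ratings <- c(1.5, 2.5, 2.75, 3, 3, 3.25, 3.25, 3.5, 3.5, 3.5, 3.75, 4)

# Hypothetical helper: moment-based sample skewness
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

skew <- sample_skewness(ratings)

# A negative value indicates a longer left tail; the mean then
# usually sits below the median, as it does for this toy vector
c(mean = mean(ratings), median = median(ratings), skewness = skew)
```

A value near zero would support the "roughly symmetric" reading, while a clearly negative value would support the left-skew reading.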

Column {data-width=400}
-----------------------------------------------------------------------

### 🍫 Cocoa Percentage Distribution {data-height=350}

```{r fig.height=3.5, fig.width=6}
p_cocoa_hist <- ggplot(chocolate, aes(x = cocoa_percent_num)) +
  geom_histogram(binwidth = 5, fill = "#8D6E63", color = "#5D4037", alpha = 0.85) +
  geom_vline(xintercept = 70, color = "#D4AF37", linetype = "dashed", linewidth = 0.8) +
  annotate("label", x = 82, y = 650, label = "70% = Mode", 
           fill = "#FFF8E1", color = "#D4AF37", size = 2.5, fontface = "bold") +
  labs(title = "Cocoa Percentage Distribution",
       subtitle = "70% is the most common formulation",
       x = "Cocoa %", y = "Count") +
  scale_x_continuous(breaks = seq(40, 100, 10)) +
  theme_premium() +
  theme(plot.title = element_text(size = 12),
        plot.subtitle = element_text(size = 9))

p_cocoa_hist
```

### 📈 Key Statistics {data-height=400}

| Statistic | Rating | Cocoa % |
|:----------|-------:|--------:|
| **Mean** | `r round(mean(chocolate$rating), 2)` | `r round(mean(chocolate$cocoa_percent_num, na.rm = TRUE), 1)`% |
| **Median** | `r median(chocolate$rating)` | `r median(chocolate$cocoa_percent_num, na.rm = TRUE)`% |
| **Std Dev** | `r round(sd(chocolate$rating), 2)` | `r round(sd(chocolate$cocoa_percent_num, na.rm = TRUE), 1)`% |
| **Range** | `r min(chocolate$rating)` - `r max(chocolate$rating)` | `r min(chocolate$cocoa_percent_num, na.rm = TRUE)` - `r max(chocolate$cocoa_percent_num, na.rm = TRUE)`% |

**Key Insights:** 

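- Mode rating is `r names(which.max(table(chocolate$rating)))`
- `r round(mean(chocolate$rating >= 2.75 & chocolate$rating <= 3.5) * 100)`% of ratings fall between 2.75 and 3.5
- 70% cocoa is the most common formulation
- Only `r round(mean(chocolate$rating >= 4) * 100, 1)`% achieve "Outstanding" (4.0+)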
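
Figures like these can be derived from the data rather than typed in by hand. The following is a minimal base R sketch on a made-up vector; `share_between()` is a hypothetical helper, and the toy values are not from the dataset.

```r
# Toy ratings (illustrative values only, not taken from the dataset)
ratings <- c(2.5, 2.75, 3, 3, 3.25, 3.5, 3.5, 3.5, 3.75, 4)

# Mode: the most frequent value (the first one wins if there is a tie)
rating_counts <- table(ratings)
mode_rating <- as.numeric(names(rating_counts)[which.max(rating_counts)])

# Hypothetical helper: share of values inside a closed interval, in percent
share_between <- function(x, lo, hi) mean(x >= lo & x <= hi) * 100

pct_mid <- share_between(ratings, 2.75, 3.5)  # 2.75 to 3.5 inclusive
pct_top <- mean(ratings >= 4) * 100           # "Outstanding" (4.0+)

c(mode = mode_rating, pct_mid = pct_mid, pct_top = pct_top)
```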


Boxplot {data-icon="fa-boxes-stacked"}
=====================================

Column {data-width=600}
-----------------------------------------------------------------------
 
### 📦 Multiple Boxplot: Rating Distribution by Manufacturing Country {data-height=500}

```{r fig.height=5, fig.width=8}
# REQUIRED: Multiple boxplot - numeric variable grouped by categorical variable
# Purpose: Compare rating distributions across countries (categorical grouping)

top_countries <- chocolate %>%
  count(company_location, sort = TRUE) %>%
  head(10) %>%
  pull(company_location)

chocolate_top <- chocolate %>%
  filter(company_location %in% top_countries) %>%
  mutate(company_location = fct_reorder(company_location, rating, .fun = median, .desc = TRUE))

p_box <- ggplot(chocolate_top, aes(x = company_location, y = rating, fill = company_location)) +
  geom_boxplot(alpha = 0.85, outlier.shape = 21, outlier.fill = "#D32F2F", 
               outlier.color = "#B71C1C", outlier.size = 1.8, outlier.alpha = 0.6,
               width = 0.7, linewidth = 0.4) +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3.5, 
               color = "#D4AF37") +
  scale_fill_manual(values = rev(chocolate_palette[1:10])) +
  labs(
    title = "Chocolate Rating Distribution by Manufacturing Country",
    subtitle = "Top 10 countries | Gold diamonds = mean | Red dots = outliers",
    x = NULL,
    y = "Expert Rating Score",
    caption = "Countries ordered by median rating (descending)"
  ) +
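  scale_y_continuous(breaks = seq(1, 5, 0.5)) +
  # Zoom with coord_cartesian(): scale limits would drop ratings below 1.5 before the box statistics are computed
  coord_cartesian(ylim = c(1, 4.5)) +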
  theme_premium() +
  theme(
    axis.text.x = element_text(angle = 40, hjust = 1, size = 11, face = "bold"),
    legend.position = "none",
    panel.grid.major.x = element_blank()
  )

p_box
```

### 📝 Boxplot Interpretation {data-height=250}

This boxplot compares rating distributions across the **top 10 chocolate-manufacturing countries** by number of reviews, ordered by median rating. The highest (or tied-highest) median belongs to **`r levels(chocolate_top$company_location)[1]`**, the first country on the left. **U.S.A.** and **France** display the widest spreads. All countries have median ratings around **3.0-3.25**, and the gold diamonds (means) align closely with the medians.
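
One detail matters when reading boxplots like this one. In ggplot2, limits set through `scale_y_continuous(limits = ...)` discard values outside the range before the box statistics are computed, whereas `coord_cartesian(ylim = ...)` only zooms the view. The base R sketch below imitates that effect on a made-up vector with `boxplot.stats()`; it illustrates the principle and is not the dashboard's own code.

```r
# Toy ratings with one low value (illustrative, not from the dataset)
ratings <- c(1, 2.5, 2.75, 3, 3, 3.25, 3.5, 3.5, 3.75, 4)

# Five-number summary on all data (what a zoomed view would still use)
stats_all <- boxplot.stats(ratings)$stats

# Same summary after dropping values below 1.5, which mimics what
# limits on the scale do before the statistics are computed
stats_cut <- boxplot.stats(ratings[ratings >= 1.5])$stats

# Rows: whisker, hinge, median, hinge, whisker for each variant
rbind(all_data = stats_all, after_limits = stats_cut)
```

In this toy case the median shifts once the low value is removed, which is why zooming is the safer choice when outliers should still count.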

Column {data-width=400}
-----------------------------------------------------------------------

### 📊 Rating by Cocoa Range {data-height=350}

```{r fig.height=3.2, fig.width=6}
chocolate_cocoa <- chocolate %>%
  # cut() intervals are right-closed: a 70% bar falls in the "Medium" band
  mutate(cocoa_range = cut(cocoa_percent_num, 
                           breaks = c(40, 60, 70, 80, 100),
                           labels = c("Low\n40-60%", "Medium\n>60-70%", 
                                    "High\n>70-80%", "Very High\n>80-100%"),
                           include.lowest = TRUE)) %>%
  filter(!is.na(cocoa_range))

p_box_cocoa <- ggplot(chocolate_cocoa, aes(x = cocoa_range, y = rating, fill = cocoa_range)) +
  geom_boxplot(alpha = 0.85, width = 0.6, outlier.size = 1.5, outlier.alpha = 0.5, linewidth = 0.4) +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 2.5, color = "#D4AF37") +
  scale_fill_manual(values = c("#EFEBE9", "#BCAAA4", "#795548", "#3E2723")) +
  labs(title = "Rating by Cocoa Content Level",
       x = NULL, y = "Rating") +
  theme_premium() +
  theme(legend.position = "none",
        plot.title = element_text(size = 13))

p_box_cocoa
```

### 📋 Country Statistics {data-height=400}

```{r}
country_stats <- chocolate_top %>%
  group_by(company_location) %>%
  summarise(
    N = n(),
    Mean = round(mean(rating), 2),
    Median = median(rating),
    SD = round(sd(rating), 2),
    .groups = 'drop'
  ) %>%
  arrange(desc(Median)) %>%
  rename(Country = company_location)

kable(country_stats, align = c('l', rep('c', 4))) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  row_spec(0, bold = TRUE, background = "#2C1810", color = "white") %>%
  row_spec(1, background = "#F4E4BC", bold = TRUE)
```


Scatterplot {data-icon="fa-chart-scatter-bubble"}
=====================================

Column {data-width=600}
-----------------------------------------------------------------------

### 🔵 Scatterplot: Cocoa Percentage vs Rating by Bean Origin Region {data-height=500}

```{r fig.height=5, fig.width=8}
# REQUIRED: Scatterplot for two numerical variables with categorical coloring
# Purpose: Examine cocoa % vs rating relationship, colored by bean origin

chocolate_scatter <- chocolate %>%
  mutate(
    bean_region = case_when(
      country_of_bean_origin %in% c("Venezuela", "Ecuador", "Peru", "Colombia", 
                                     "Bolivia", "Brazil") ~ "South America",
      country_of_bean_origin %in% c("Madagascar", "Tanzania", "Ghana", "Ivory Coast",
                                     "Cameroon", "Nigeria", "Uganda", "Togo", 
                                     "Congo", "Sao Tome") ~ "Africa",
      country_of_bean_origin %in% c("Dominican Republic", "Nicaragua", "Guatemala",
                                     "Mexico", "Belize", "Costa Rica", "Honduras",
                                     "Haiti", "Jamaica", "Trinidad") ~ "Central Am. & Caribbean",
      country_of_bean_origin %in% c("Papua New Guinea", "Indonesia", "Philippines",
                                     "Vietnam", "India", "Fiji", "Vanuatu") ~ "Asia-Pacific",
      TRUE ~ "Other / Blend"
    )
  ) %>%
  filter(!is.na(cocoa_percent_num))

p_scatter <- ggplot(chocolate_scatter, 
                    aes(x = cocoa_percent_num, y = rating, color = bean_region)) +
  geom_jitter(alpha = 0.5, size = 2, width = 0.8, height = 0.03) +
  geom_smooth(aes(group = 1), method = "loess", se = TRUE, 
              color = "#2C1810", fill = "#D7CCC8", alpha = 0.3, linewidth = 1.2) +
  scale_color_manual(
    values = c(
      "South America" = "#43A047",
      "Africa" = "#FB8C00", 
      "Central Am. & Caribbean" = "#1E88E5",
      "Asia-Pacific" = "#8E24AA",
      "Other / Blend" = "#78909C"
    ),
    name = "Bean Origin Region"
  ) +
  labs(
    title = "Relationship Between Cocoa Percentage and Expert Rating",
    subtitle = "Points colored by bean origin region | LOESS curve shows overall trend",
    x = "Cocoa Percentage (%)",
    y = "Expert Rating Score",
    caption = "Higher cocoa % doesn't guarantee higher ratings"
  ) +
  scale_x_continuous(breaks = seq(40, 100, 10), limits = c(40, 100)) +
  scale_y_continuous(breaks = seq(1, 5, 0.5), limits = c(1, 4.5)) +
  theme_premium() +
  theme(
    legend.position = "bottom",
    legend.box = "horizontal"
  ) +
  guides(color = guide_legend(nrow = 2, override.aes = list(size = 3.5, alpha = 0.9)))

p_scatter
```

### 📝 Scatterplot Interpretation {data-height=250}

This scatterplot explores the relationship between **cocoa percentage** and **expert ratings**. The LOESS curve reveals that ratings **peak around 65-75% cocoa**, then decline at higher percentages. The weak negative correlation (**r = `r round(cor(chocolate$cocoa_percent_num, chocolate$rating, use="complete.obs"), 3)`**) indicates cocoa content alone doesn't determine quality.
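
As a minimal sketch of how the correlation figure is obtained, the base R example below computes Pearson's r, its square, and a significance test on made-up values (not the dataset). With the dashboard data the two vectors would be `chocolate$cocoa_percent_num` and `chocolate$rating`.

```r
# Toy data (illustrative values only, not taken from the dataset)
cocoa  <- c(55, 60, 65, 70, 70, 72, 75, 80, 85, 100)
rating <- c(3.00, 3.25, 3.50, 3.50, 3.25, 3.75, 3.25, 3.00, 2.75, 2.00)

# Pearson correlation; use = "complete.obs" drops rows with missing values
r <- cor(cocoa, rating, use = "complete.obs")

# Squared correlation: share of variance a straight-line fit accounts for
r_squared <- r^2

# cor.test() adds a confidence interval and a p-value for r
test <- cor.test(cocoa, rating)

round(c(r = r, r_squared = r_squared, p_value = test$p.value), 3)
```

Because Pearson's r only measures straight-line association, a curve that rises and then falls, as the LOESS line suggests, can produce a small r even when cocoa content and rating are clearly related.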

Column {data-width=400}
-----------------------------------------------------------------------

### 📊 Correlation Analysis {data-height=200}

```{r}
cor_val <- round(cor(chocolate$cocoa_percent_num, chocolate$rating, use = "complete.obs"), 3)
```

<div class="stat-highlight" style="text-align: center; padding: 15px; background: linear-gradient(135deg, #FFF8E1 0%, #FFFFFF 100%); border-radius: 12px; border-left: 4px solid #FFC107;">
<div style="font-size: 2.5rem; font-weight: bold; color: #2C1810; font-family: 'Playfair Display', serif;">
r = `r cor_val`
</div>
<div style="font-size: 0.9rem; color: #5D4037; margin-top: 5px;">
⚠️ <strong>Weak Negative Correlation</strong>
</div>
<div style="font-size: 0.8rem; color: #8D6E63; margin-top: 8px;">
Cocoa % explains only `r round(cor_val^2 * 100, 1)`% of rating variance
</div>
</div>

### 🌍 Statistics by Bean Region {data-height=280}

```{r}
region_stats <- chocolate_scatter %>%
  group_by(bean_region) %>%
  summarise(
    Count = n(),
    `Avg Rating` = round(mean(rating), 2),
    `Avg Cocoa` = paste0(round(mean(cocoa_percent_num), 0), "%"),
    .groups = 'drop'
  ) %>%
  arrange(desc(`Avg Rating`)) %>%
  rename(Region = bean_region)

kable(region_stats, align = c('l', 'c', 'c', 'c')) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  row_spec(0, bold = TRUE, background = "#5D4037", color = "white")
```

### 💡 Key Findings {data-height=270}

**Insights:**

- **Weak Correlation** (r = `r cor_val`): Cocoa % has minimal impact
- **Sweet Spot**: 65-75% cocoa achieves highest scores
- **Diminishing Returns**: Very dark chocolate (>85%) scores lower
- **Regional Consistency**: All bean regions show similar patterns
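
One way to check the "sweet spot" and "diminishing returns" points is to average ratings within cocoa bands. The sketch below uses base R on made-up values, not the dataset; the band edges are assumptions chosen to match the cocoa ranges used elsewhere in this dashboard.

```r
# Toy data (illustrative values only, not taken from the dataset)
cocoa  <- c(55, 60, 65, 70, 70, 72, 75, 80, 85, 100)
rating <- c(3.00, 3.25, 3.50, 3.50, 3.25, 3.75, 3.25, 3.00, 2.75, 2.00)

# cut() builds right-closed bands by default, so a 70% bar falls in (60,70];
# include.lowest = TRUE keeps a bar at exactly 40% in the first band
band <- cut(cocoa, breaks = c(40, 60, 70, 80, 100), include.lowest = TRUE)

# Mean rating and number of bars in each band
mean_by_band <- tapply(rating, band, mean)
n_by_band    <- table(band)

round(mean_by_band, 2)
```

Band counts matter when reading the means: a band with only a handful of bars gives a much noisier average than one with hundreds.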


Interactive {data-icon="fa-hand-pointer"}
=====================================

Column {data-width=600}
-----------------------------------------------------------------------

### 🖱️ Interactive Visualization: Explore Each Chocolate Bar (plotly) {data-height=600}

```{r fig.height=6, fig.width=8}
# REQUIRED: Interactive plotly object
# Purpose: Interactive scatterplot of cocoa % vs rating for data exploration
# NOTE: Built directly with plot_ly() instead of converting a ggplot via ggplotly();
#       native plotly with WebGL renders large point sets faster

# Sample data for better performance (stratified by rating category)
set.seed(42)
chocolate_sample <- chocolate %>%
  group_by(rating_category) %>%
  slice_sample(prop = 0.4) %>%  # Sample 40% from each category
  ungroup()

# Create optimized plotly chart directly (faster than ggplotly conversion)
p_interactive <- plot_ly(
  data = chocolate_sample,
  x = ~jitter(cocoa_percent_num, amount = 1),
  y = ~jitter(rating, amount = 0.03),
  color = ~rating_category,
  colors = rating_colors,
  type = 'scattergl',  # WebGL for better performance
  mode = 'markers',
  marker = list(size = 7, opacity = 0.6),
  hoverinfo = 'text',
  text = ~paste0(
    "<b>", company_manufacturer, "</b>",
    "<br>📍 ", company_location,
    "<br>🌱 ", country_of_bean_origin,
    "<br>🍫 ", cocoa_percent,
    "<br>⭐ ", rating, " (", rating_category, ")"
  )
) %>%
  layout(
    title = list(
      text = "Interactive Explorer: Cocoa % vs Rating",
      font = list(family = "Playfair Display", size = 16, color = "#2C1810")
    ),
    xaxis = list(
      title = list(text = "Cocoa Percentage (%)", standoff = 15),
      tickvals = seq(40, 100, 10),
      ticktext = paste0(seq(40, 100, 10), "%"),
      tickfont = list(size = 12, color = "#5D4037"),
      gridcolor = "#EFEBE9",
      zerolinecolor = "#D7CCC8",
      range = c(38, 102)
    ),
    yaxis = list(
      title = "Expert Rating",
      range = c(1, 4.5),
      tickfont = list(size = 12, color = "#5D4037"),
      gridcolor = "#EFEBE9",
      zerolinecolor = "#D7CCC8"
    ),
    legend = list(
      orientation = "h", 
      y = -0.18, 
      x = 0.5, 
      xanchor = "center",
      font = list(size = 10),
      bgcolor = "rgba(255,255,255,0.9)"
    ),
    hoverlabel = list(
      bgcolor = "white",
      bordercolor = "#5D4037",
      font = list(family = "Arial", size = 12, color = "#2C1810")
    ),
    paper_bgcolor = "white",
    plot_bgcolor = "white",
    autosize = TRUE,
    margin = list(l = 60, r = 30, t = 40, b = 90)
  ) %>%
  config(
    displayModeBar = TRUE,
    modeBarButtonsToRemove = c("lasso2d", "select2d", "autoScale2d"),
    displaylogo = FALSE
  )

p_interactive
```

### 📝 Interactive Features {data-height=150}

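This chart is built with **native Plotly** and rendered with **WebGL** for performance. To stay responsive, it shows a stratified sample of 40% of the bars from each rating category. **Hover** over a point for details, and use the toolbar to **zoom, pan, or download** the view.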
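
The chart above is drawn with `plot_ly()` directly. For comparison, the sketch below shows the other common route, converting a ggplot with `ggplotly()`, together with the same kind of stratified sampling. It runs on a made-up data frame (`toy` and its column names are hypothetical) and is an illustration, not the dashboard's code.

```r
library(dplyr)
library(ggplot2)
library(plotly)

# Toy data (illustrative values only, not taken from the dataset)
toy <- data.frame(
  cocoa    = c(60, 65, 70, 70, 72, 75, 80, 85, 90, 100,
               55, 64, 68, 71, 77),
  rating   = c(3.00, 3.25, 3.00, 3.25, 3.25, 3.00, 3.25, 3.00, 3.25, 3.00,
               3.50, 3.75, 3.50, 3.75, 3.50),
  category = rep(c("Recommended", "Highly Recommended"), times = c(10, 5))
)

# Stratified sample: 40% of the rows within each category
# (slice_sample() rounds the per-group size down)
set.seed(42)
toy_sample <- toy %>%
  group_by(category) %>%
  slice_sample(prop = 0.4) %>%
  ungroup()

# Static ggplot; ggplot2 itself does not draw the `text` aesthetic,
# but ggplotly() reads it to build the hover label
p <- ggplot(toy_sample,
            aes(x = cocoa, y = rating, color = category,
                text = paste0("Cocoa: ", cocoa, "%<br>Rating: ", rating))) +
  geom_point(size = 2, alpha = 0.7) +
  labs(x = "Cocoa Percentage (%)", y = "Expert Rating")

# Convert to an interactive widget; tooltip = "text" shows only that label
p_ly <- ggplotly(p, tooltip = "text")
p_ly
```

The conversion route reuses an existing ggplot theme and layers, while the native route gives direct control over trace types such as `scattergl`.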

Column {data-width=400}
-----------------------------------------------------------------------

### 📖 How to Use This Chart {data-height=280}

**Hover** over points to see: Manufacturer name, location, bean origin, cocoa %, rating & category.

**Toolbar Options:** Download PNG, Zoom in/out, Pan, Reset view.

**Color Legend:**

| Color | Category |
|:------|:---------|
| 🟢 Green | Outstanding / Highly Recommended |
| 🟠 Orange | Recommended |
| 🔴 Red | Disappointing / Unpleasant |

### ⭐ Rating Distribution {data-height=470}

```{r fig.height=4}
cat_dist <- chocolate %>% count(rating_category) %>% 
  mutate(pct = round(n/sum(n)*100, 1))

plot_ly(cat_dist, labels = ~rating_category, values = ~n, type = 'pie',
        textposition = 'inside', textinfo = 'percent',
        marker = list(colors = unname(rating_colors),
                      line = list(color = '#FFFFFF', width = 2)),
        hoverinfo = 'label+value+percent',
        height = 320) %>%
  layout(showlegend = TRUE,
         legend = list(orientation = 'h', y = -0.1, x = 0.5, xanchor = 'center', 
                       font = list(size = 10)),
         margin = list(t = 10, b = 40, l = 10, r = 10)) %>%
  config(displayModeBar = FALSE)
```


Data {data-icon="fa-table"}
=====================================

Column {data-width=1000 .tabset}
-----------------------------------------------------------------------

### 🔍 Complete Dataset Explorer

```{r}
chocolate_display <- chocolate %>%
  select(
    Manufacturer = company_manufacturer,
    Location = company_location,
    Year = review_date,
    `Bean Origin` = country_of_bean_origin,
    `Bar Name` = specific_bean_origin_or_bar_name,
    `Cocoa` = cocoa_percent,
    Rating = rating,
    Category = rating_category,
    Flavors = most_memorable_characteristics
  )

datatable(chocolate_display,
          filter = 'top',
          extensions = 'Buttons',
          fillContainer = TRUE,
          options = list(
            pageLength = 25,
            scrollY = "calc(100vh - 280px)",
            scrollCollapse = TRUE,
            paging = FALSE,
            scrollX = FALSE,
            autoWidth = TRUE,
            dom = 'Bfrti',
            buttons = c('copy', 'csv', 'excel'),
            columnDefs = list(
              list(width = 'auto', targets = c(0, 1, 3, 4, 8)),
              list(width = '55px', targets = c(2, 5, 6)),
              list(width = 'auto', targets = 7)
            ),
            language = list(
              search = "🔍 Search:",
              info = "Showing _START_ to _END_ of _TOTAL_ chocolate bars"
            )
          ),
          rownames = FALSE,
          class = 'cell-border stripe hover compact',
          width = '100%') %>%
  formatStyle('Rating',
              background = styleColorBar(range(chocolate$rating), '#8D6E63'),
              backgroundSize = '95% 70%',
              backgroundRepeat = 'no-repeat',
              backgroundPosition = 'center') %>%
  formatStyle('Category',
              backgroundColor = styleEqual(
                names(rating_colors),
                c("#C8E6C9", "#DCEDC8", "#FFF9C4", "#FFCCBC", "#FFCDD2")
              ),
              fontWeight = 'bold')
```

### ℹ️ Usage Tips

**How to use this table:**

- 🔍 **Filter:** Use the search boxes under each column header
- ↕️ **Sort:** Click on column headers to sort
- 📋 **Export:** Use Copy, CSV, or Excel buttons
- 📜 **Scroll:** Scroll within the table to see all `r nrow(chocolate)` records

**Column Guide:**

| Column | Description |
|:-------|:------------|
| Manufacturer | Company that made the chocolate |
| Location | Country where company is based |
| Year | When the review was conducted |
| Bean Origin | Where cocoa beans came from |
| Bar Name | Specific bean origin or bar name |
| Cocoa | Percentage of cocoa content |
| Rating | Expert score (1.0 - 5.0) |
| Category | Rating classification |
| Flavors | Tasting notes from experts |


References {data-icon="fa-book"}
=====================================

Column {data-width=500}
-----------------------------------------------------------------------

### 📚 Data Sources & Methodology

<div class="info-card">

**Primary Data Source:**

📊 **TidyTuesday - Chocolate Bar Ratings (2022-01-18)**

| Attribute | Details |
|:----------|:--------|
| Repository | [github.com/rfordatascience/tidytuesday](https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-01-18) |
| Direct CSV | [chocolate.csv](https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-18/chocolate.csv) |
| Records | 2,530 chocolate bar reviews |
| Time Span | 2006 - 2021 |
| Variables | 10 original columns |

**Original Data Provider:**

🍫 **Flavors of Cacao** - [flavorsofcacao.com](http://flavorsofcacao.com/chocolate_database.html)

- Compiled by: **Manhattan Chocolate Society**
- Rating methodology: Blind tasting by certified chocolate experts
- Scale: 1.0 (unpleasant) to 5.0 (elite/outstanding)

**Data Collection Method:**

Expert tasters evaluate chocolate bars on texture, flavor complexity, finish, and overall impression. Each bar is rated independently without brand knowledge.

</div>

### 📦 R Packages Used

```{r}
packages_df <- tibble(
  Package = c("flexdashboard", "tidyverse", "ggplot2", "plotly", "DT", "knitr", "kableExtra", "scales"),
  Version = c(
    as.character(packageVersion("flexdashboard")),
    as.character(packageVersion("tidyverse")),
    as.character(packageVersion("ggplot2")),
    as.character(packageVersion("plotly")),
    as.character(packageVersion("DT")),
    as.character(packageVersion("knitr")),
    as.character(packageVersion("kableExtra")),
    as.character(packageVersion("scales"))
  ),
  Purpose = c(
    "Dashboard framework & layout",
    "Data wrangling (dplyr, tidyr, readr)",
    "Grammar of graphics visualizations",
    "Interactive charts with WebGL",
    "Interactive searchable data tables",
    "Dynamic report generation",
    "Advanced table formatting",
    "Axis & label formatting"
  ),
  Citation = c(
    "Iannone et al. (2024)",
    "Wickham et al. (2019)",
    "Wickham (2016)",
    "Sievert (2020)",
    "Xie et al. (2024)",
    "Xie (2024)",
    "Zhu (2024)",
    "Wickham & Seidel (2022)"
  )
)

kable(packages_df, align = c('l', 'c', 'l', 'l')) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = TRUE, font_size = 11) %>%
  row_spec(0, bold = TRUE, background = "#2C1810", color = "white")
```

Column {data-width=500}
-----------------------------------------------------------------------

### 🔗 Documentation & Tutorials

<div class="info-card">

**Official Documentation:**

| Resource | URL | Purpose |
|:---------|:----|:--------|
| 📖 Flexdashboard | [pkgs.rstudio.com/flexdashboard](https://pkgs.rstudio.com/flexdashboard/) | Dashboard layouts & components |
| 📊 Plotly R | [plotly.com/r](https://plotly.com/r/) | Interactive visualizations |
| 🎨 ggplot2 | [ggplot2.tidyverse.org](https://ggplot2.tidyverse.org/) | Static graphics reference |
| 📋 DT Package | [rstudio.github.io/DT](https://rstudio.github.io/DT/) | DataTables integration |
| 📚 kableExtra | [haozhu233.github.io/kableExtra](https://haozhu233.github.io/kableExtra/) | Table styling |

**Books & Learning Resources:**

1. Wickham, H. & Grolemund, G. (2023). *R for Data Science* (2nd ed.). [r4ds.hadley.nz](https://r4ds.hadley.nz/)

2. Sievert, C. (2020). *Interactive Web-Based Data Visualization with R, plotly, and shiny*. Chapman & Hall/CRC. [plotly-r.com](https://plotly-r.com/)

3. Wilke, C. O. (2019). *Fundamentals of Data Visualization*. O'Reilly. [clauswilke.com/dataviz](https://clauswilke.com/dataviz/)

4. Healy, K. (2018). *Data Visualization: A Practical Introduction*. Princeton University Press.

**Course Materials:**

- MIS029 Data Visualization lecture notes
- Flexdashboard layout examples: [pkgs.rstudio.com/flexdashboard/articles/layouts.html](https://pkgs.rstudio.com/flexdashboard/articles/layouts.html)

</div>

### 🔄 Session Information

```{r}
session_df <- tibble(
  Property = c("R Version", "Platform", "Operating System", "Locale", "Date Generated", "Timezone"),
  Value = c(
    paste(R.version$major, R.version$minor, sep = "."),
    R.version$platform,
    paste(Sys.info()["sysname"], Sys.info()["release"]),
    Sys.getlocale("LC_TIME"),
    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    Sys.timezone()
  )
)

kable(session_df, align = c('l', 'l')) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE, font_size = 11) %>%
  row_spec(0, bold = TRUE, background = "#455A64", color = "white")
```

### 📋 Reproducibility & License

<div class="info-card" style="border-left-color: #FF9800;">

**To reproduce this analysis:**

```r
# 1. Install required packages
install.packages(c("flexdashboard", "tidyverse", 
                   "plotly", "DT", "knitr", 
                   "kableExtra", "scales"))

# 2. Render the dashboard
rmarkdown::render("MertEfeKurt_2307071061_Final.Rmd")
```

⚠️ **Requirements:** 

- R version ≥ 4.0.0
- Internet connection (data fetched from GitHub)
- ~500 MB RAM for rendering
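
Before rendering, these requirements can be checked from the R console. The sketch below is one possible pre-flight check in base R; `missing_packages()` is a hypothetical helper, and the package list is copied from the install step above.

```r
# Packages the dashboard needs (list taken from the install step above)
required_pkgs <- c("flexdashboard", "tidyverse", "plotly", "DT",
                   "knitr", "kableExtra", "scales")

# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Check the R version requirement (4.0.0 or newer)
r_ok <- getRversion() >= "4.0.0"
if (!r_ok) message("R 4.0.0 or newer is required")

# Report anything that still needs install.packages()
to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}
```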

**Data License:** TidyTuesday data is released under the CC0 1.0 Universal license.

---

**Dashboard Author:** Mert Efe Kurt (2307071061)

**Course:** MIS029 - Data Visualization

**Project Type:** Final Project Submission

**Generated:** `r format(Sys.time(), "%B %d, %Y at %H:%M")`

</div>