DSA210 term project
Entertainment Behavior Under Academic Pressure
Hello, my name is Nihat Ömer Karaca. I am a CS & IE double-major student studying at Sabancı University.
For our DSA210 course project, we were supposed to select a dataset and implement what the course required. I started by thinking about my daily life and what I actually do during ordinary days, midterms, and finals.
After brainstorming, I realized that I was spending a lot of time on some platforms, and that this might be affected by academic demands such as midterms and finals. I also thought that the platforms I use (Spotify, YouTube, and the streaming services) and the academic calendar might affect each other. Maybe they pave the way for each other, or maybe they block each other.
Because of that, I downloaded and considered several datasets: ChatGPT, Instagram, Netflix, Spotify, Twitter, and YouTube. After careful consideration, I decided to exclude Instagram and Twitter because they would have increased the complexity of the project too much. I also postponed ChatGPT because it is a completely different domain and would need a heavier privacy and parsing process. So I stuck with YouTube, Spotify, Netflix, Prime Video, and the academic calendar.
I also prefer looking at project content on a webpage rather than only inside notebook files. That is why I created this page as a public-facing explanation of the project. The webpage explanation will be filled in later (TODO).
Raw data and public outputs
Different exports, one daily frame.
The raw exports were not equally clean: some arrived as structured CSV/JSON, while others needed parsing and privacy decisions. The public version keeps only the fields needed for analysis and uses the day as the shared unit.
| Platform | Rows | Date range | Notes |
|---|---|---|---|
| YouTube | 34,317 | 2022-04-17 to 2026-03-15 | Google Takeout activity converted into a public activity table. Channel names are masked. |
| Spotify | 178,202 | 2019-07-27 to 2026-03-14 | Music-focused streaming history with exact timestamps and listening duration. |
| Netflix | 2,493 | 2020-02-25 to 2026-03-10 | Viewing history kept at row level. Titles stay visible; local source paths are not exposed. |
| Prime Video | 719 | 2022-03-30 to 2026-03-19 | Watch history parsed into movie and episode records where possible. |

YouTube
Raw shape: Google Takeout activity export with watch/search-style events and timestamps.
Kept: Action type, target kind, masked channel references, timestamps, and daily activity fields.
Excluded: Real channel names and direct source identifiers were removed or masked.
Spotify
Raw shape: Structured streaming-history JSON with exact timestamps and listening duration.
Kept: Track, artist, album, timestamp, milliseconds played, minutes played, and hours played.
Excluded: Account identifiers, IP-like fields, and raw source-file identifiers were removed.
Netflix
Raw shape: Viewing-history CSV with title and date-level viewing records.
Kept: Original title string and viewing date at row level.
Excluded: No forced show/season/episode split because some Netflix titles are too inconsistent to parse safely.
Prime Video
Raw shape: Watch-history export parsed into movie and episode records.
Kept: Title, record type, series/movie fields where available, and date-level watch records.
Excluded: Raw parse issues and local source details stay out of the public browser app.
Processing scripts
From export to public analysis.
Each platform follows the same logic: normalize the raw export, create shared `fine_*` fields, reduce the public schema, then aggregate later for EDA and testing.
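The shared logic can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual scripts: the raw keys (`ts`, `title`, `ip`), the specific `fine_*` field names, and the `normalize` helper are all hypothetical, and Istanbul time is approximated as a fixed UTC+3 offset, which holds for the years covered by these exports.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: Istanbul is treated as a fixed UTC+3 offset (no DST in the data window).
ISTANBUL = timezone(timedelta(hours=3))

# Hypothetical raw rows; the real exports use different keys per platform.
raw_rows = [
    {"ts": "2024-01-10T21:15:00+00:00", "title": "video a", "ip": "x"},
    {"ts": "2024-01-10T22:30:00+00:00", "title": "video b", "ip": "x"},
    {"ts": "2024-01-11T22:00:00+00:00", "title": "video c", "ip": "x"},
]


def normalize(row, platform):
    """Map one raw row to shared fields (field names are illustrative)."""
    local = datetime.fromisoformat(row["ts"]).astimezone(ISTANBUL)
    return {
        "fine_platform": platform,
        "fine_timestamp": local.isoformat(),
        "fine_date": local.date().isoformat(),
        "title": row["title"],  # private keys such as "ip" are not copied
    }


# Step 1-3: normalize, add shared fields, keep only the reduced public schema.
public_rows = [normalize(row, "youtube") for row in raw_rows]

# Step 4: aggregate later, using the local calendar day as the shared unit.
daily_counts = Counter(row["fine_date"] for row in public_rows)
print(dict(daily_counts))  # {'2024-01-11': 2, '2024-01-12': 1}
```

Note that all three UTC timestamps fall on a later calendar day once shifted to local time, which is why the date is derived after the timezone conversion rather than before it.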
Interactive EDA
Interactive exploratory analysis.
These charts are SVG-based and interactive. Hover over bars, points, and cells to see values and highlight the element being inspected.
YouTube EDA
YouTube activity overview.
YouTube is mostly watch activity. The estimated watch-time line is a conservative proxy: it sums gaps of up to 60 minutes between consecutive watched-video timestamps.
Uses YouTube action counts to show whether the dataset is mostly watch activity, search activity, or other actions.
Uses monthly watched and search counts to show how YouTube activity changes over time.
Uses gaps between consecutive watched-video timestamps to estimate rough monthly watch time.
Uses YouTube timestamps converted to Istanbul time to show which hours have more YouTube activity.
Uses daily watched counts grouped by weekday to inspect whether YouTube use changes by day of week.
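The watch-time proxy described above can be sketched as follows. This is one reading of the rule, under the assumption that gaps longer than 60 minutes are dropped entirely rather than capped at 60; the function name and the sample timestamps are illustrative only.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=60)


def estimate_watch_time(timestamps, max_gap=MAX_GAP):
    """Sum gaps between consecutive watch events, skipping gaps above max_gap."""
    ordered = sorted(timestamps)
    total = timedelta()
    for previous, current in zip(ordered, ordered[1:]):
        gap = current - previous
        if gap <= max_gap:  # assumption: longer gaps mean the session ended
            total += gap
    return total


# Hypothetical watched-video timestamps for one evening.
watched = [
    datetime(2024, 1, 10, 20, 0),
    datetime(2024, 1, 10, 20, 12),
    datetime(2024, 1, 10, 20, 40),
    datetime(2024, 1, 10, 23, 5),   # 145-minute gap before this one is skipped
    datetime(2024, 1, 10, 23, 20),
]
estimate = estimate_watch_time(watched)
print(estimate)  # 0:55:00
```

The last video of each session contributes nothing because no later timestamp follows it within the window, which is one reason the proxy is conservative.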
Spotify EDA
Spotify listening overview.
The main EDA checks monthly listening, time of day, and artist concentration, without using any of them directly as a final hypothesis.
Uses Spotify monthly listening hours and stream counts to show long-term listening patterns.
Uses Spotify timestamps and hours played to show which hours have more listening activity.
Uses total Spotify hours per artist to show the most repeated artists.
Uses total listening minutes per track to show the most repeated individual songs.
Uses daily Spotify hours grouped by weekday to inspect weekday differences.
Netflix + Prime Video EDA
Long-form streaming overview.
Netflix and Prime stay separate public datasets, but the EDA groups them as long-form streaming. Because most days have zero long-form activity, active-day charts are clearer than standard boxplots for this part.
Manual data check: in the common 1,424-day window, 1,160 days have zero Netflix + Prime records and only 264 days are active. Since at least 75% of each weekday group is zero, the standard boxplot has Q1 = median = Q3 = 0. The chart below excludes zero days so it shows which weekdays had actual long-form use. Top-title charts are also shown as EDA summaries; Netflix uses conservative title groups only for display, while the public file still keeps the original title text.
Uses Netflix and Prime Video row counts to compare their total contribution.
Uses monthly Netflix, Prime Video, and combined counts to show long-form viewing over time.
Uses Prime Video record type to show whether the records are mostly episodes or movies.
Uses Netflix title quality checks to show usable titles versus missing or malformed title rows.
Uses grouped Netflix titles to show the most repeated Netflix series or title groups.
Uses Prime Video series and movie titles to show the most repeated Prime Video titles.
Uses active Netflix + Prime days only, excluding zero days, to show which weekdays had actual long-form use.
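The boxplot argument in the manual data check can be verified with a short calculation. The sketch below uses only the two day counts quoted above; the value 1 for active days is a placeholder, since the real per-day counts are not reproduced here, and it checks the pooled window rather than each weekday group separately.

```python
from statistics import quantiles

total_days = 1424
zero_days = 1160
active_days = total_days - zero_days  # 264

zero_share = zero_days / total_days
print(round(zero_share, 3))  # 0.815

# Placeholder series: zeros for inactive days, 1 as a stand-in for any active day.
daily_counts = [0] * zero_days + [1] * active_days
q1, median, q3 = quantiles(daily_counts, n=4)
print(q1, median, q3)  # 0.0 0.0 0.0
```

Because more than 75% of the values are zero, all three quartiles land on zero whatever the active-day values are, so the box collapses to a line and only the outliers remain visible.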
Combined EDA
Combined daily panel.
These views support the hypothesis stage without mixing incompatible raw units.
Uses daily Spotify hours and YouTube watched count to inspect same-day co-usage.
Uses daily Netflix + Prime count and YouTube watched count to inspect whether long-form activity relates to YouTube activity.
Uses daily platform variables to show which variables move together more strongly.
Uses YouTube watched count grouped by academic period to compare ordinary days, finals, and other periods.
Uses Spotify daily hours grouped by academic period to compare listening across periods.
Uses active long-form days grouped by academic period, excluding zero days for readability.
Uses the number of active platforms per day to show whether platform variety changes by period.
Uses YouTube activity after 21:30 to inspect late-evening behavior across academic periods.
Uses Spotify listening after 21:30 to inspect late-evening listening across academic periods.
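A minimal sketch of a late-evening share for one day is shown below, taken here as after-21:30 activity divided by the daily total. Several details are assumptions: events are already in Istanbul local time, 21:30:00 itself counts as late, activity after midnight belongs to the next calendar day, and the helper name and sample data are hypothetical.

```python
from datetime import datetime, time

CUTOFF = time(21, 30)  # assumption: "after 21:30" includes 21:30:00 itself


def share_after_2130(events):
    """Return after-21:30 weight divided by the daily total.

    events: (local_datetime, weight) pairs for one calendar day. The weight
    is 1 per YouTube event, or hours played for Spotify.
    Returns None for days with no activity so they can be excluded.
    """
    daily_total = sum(weight for _, weight in events)
    if daily_total == 0:
        return None
    after_2130 = sum(weight for ts, weight in events if ts.time() >= CUTOFF)
    return after_2130 / daily_total


# Hypothetical Spotify day: 2 listening hours in total, 1 of them after 21:30.
day = [
    (datetime(2024, 1, 10, 20, 0), 1.0),
    (datetime(2024, 1, 10, 21, 45), 0.5),
    (datetime(2024, 1, 10, 23, 10), 0.5),
]
print(share_after_2130(day))  # 0.5
```

Using a share instead of a raw count keeps YouTube events and Spotify hours comparable across days with very different totals.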
Combined EDA overview
Cross-platform checks.
These checks summarize coverage, active days, monthly movement, distributions, academic-period movement, and hourly activity before the formal hypothesis-testing section.
Uses each platform's date range to show the common window where all datasets overlap.
Uses daily activity flags to show how many days each platform appears in the common window.
Uses monthly YouTube watched count, Spotify hours, and Netflix + Prime count as separate trend panels.
Uses daily variables to show skew, zero-heavy behavior, and outliers.
Uses period averages divided by overall averages to compare platforms without mixing raw units.
Uses YouTube and Spotify timestamps to compare time-of-day behavior in Istanbul time.
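The unit-free comparison of period averages can be sketched as a simple ratio. This assumes the overall average is taken over all days pooled together (not as a mean of period means); the function name and the sample values are hypothetical.

```python
def relative_index(values_by_period):
    """Return each period's mean divided by the mean over all days."""
    all_values = [v for values in values_by_period.values() for v in values]
    overall_mean = sum(all_values) / len(all_values)
    return {
        period: (sum(values) / len(values)) / overall_mean
        for period, values in values_by_period.items()
    }


# Hypothetical daily YouTube watched counts, grouped by academic period.
youtube_daily = {
    "ordinary": [30, 30, 30, 30],
    "finals": [20, 20],
}
index = relative_index(youtube_daily)
for period, value in index.items():
    print(period, round(value, 3))
```

A value above 1 means the period sits above the platform's own overall average and a value below 1 means it sits under it, so YouTube counts, Spotify hours, and long-form counts can share one axis.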
Hypothesis testing
Hypothesis tests and current results.
The formal tests use daily variables from the same public-data panel. Final-exam comparisons use one-sided Mann-Whitney U tests; same-day Spotify/YouTube co-usage uses one-sided Spearman correlation.
Hypothesis table
| Hypothesis | Null hypothesis (H0) | Alternative hypothesis (H1) | Method |
|---|---|---|---|
| H1 Entertainment usage during academic pressure | Platform activity does not differ between final-exam days and ordinary-term days. | Final-exam days have lower platform activity. | One-sided Mann-Whitney U |
| H2 Platform diversity during academic pressure | Platform diversity does not differ between final-exam days and ordinary-term days. | Platform diversity is lower during finals. | One-sided Mann-Whitney U |
| H3 Netflix + Prime and YouTube activity | YouTube watched count does not differ between Netflix + Prime active and inactive days. | YouTube watched count is lower on Netflix + Prime active days. | One-sided Mann-Whitney U |
| H4 Spotify and YouTube co-usage | Spotify hours are not associated with YouTube watched count. | Spotify hours are positively associated with YouTube watched count. | One-sided Spearman correlation |
| H5 After-9:30 PM entertainment during finals | Late-evening entertainment share does not differ between final-exam days and ordinary-term days. | Late-evening share is lower during finals. | One-sided Mann-Whitney U |
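
To make the one-sided Mann-Whitney U comparison concrete, the sketch below computes the U statistic from average ranks and a normal-approximation p-value with tie and continuity corrections. It is a hand-written illustration with hypothetical data, not the project's test code; in practice `scipy.stats.mannwhitneyu` with `alternative="less"` is the usual route.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_less(x, y):
    """Test H1 'x tends to be lower than y'; returns (U for x, one-sided p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all values are identical; the test is undefined")
    z = (u1 + 0.5 - mu) / math.sqrt(variance)  # continuity correction
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
    return u1, p_value


# Hypothetical daily watched counts.
finals = [10, 12, 15, 18, 20, 22]
ordinary = [25, 19, 30, 28, 35, 40, 27, 33]
u_stat, p = mann_whitney_less(finals, ordinary)
print(u_stat, p < 0.05)  # 2.0 True
```

The test works on ranks rather than raw values, which suits the skewed, zero-heavy daily variables in this project better than a comparison of means.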
Entertainment usage during academic pressure
Null hypothesis (H0): Platform activity does not differ between final-exam days and ordinary-term days.
Alternative hypothesis (H1): Final-exam days have lower platform activity.
H1: X_final < X_ordinary
Method: One-sided Mann-Whitney U
Platform diversity during academic pressure
Null hypothesis (H0): Platform diversity does not differ between final-exam days and ordinary-term days.
Alternative hypothesis (H1): Platform diversity is lower during finals.
D = count(active platforms per day)
Method: One-sided Mann-Whitney U
Netflix + Prime and YouTube activity
Null hypothesis (H0): YouTube watched count does not differ between Netflix + Prime active and inactive days.
Alternative hypothesis (H1): YouTube watched count is lower on Netflix + Prime active days.
YouTube | long-form active < YouTube | inactive
Method: One-sided Mann-Whitney U
Spotify and YouTube co-usage
Null hypothesis (H0): Spotify hours are not associated with YouTube watched count.
Alternative hypothesis (H1): Spotify hours are positively associated with YouTube watched count.
rho_s = corr(rank(Spotify), rank(YouTube))
Method: One-sided Spearman correlation
After-9:30 PM entertainment during finals
Null hypothesis (H0): Late-evening entertainment share does not differ between final-exam days and ordinary-term days.
Alternative hypothesis (H1): Late-evening share is lower during finals.
share_after_2130 = after_2130 / daily_total
Method: One-sided Mann-Whitney U
Complete result table
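Means are listed as finals / ordinary for H1, H2, and H5, as Netflix + Prime active / inactive days for H3, and as Spotify hours / YouTube watched count for H4.

| Hypothesis | Outcome | Test | Statistic | p-value | Means | Decision |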
|---|---|---|---|---|---|---|
| H1 Entertainment usage during academic pressure | YouTube watched count | Mann-Whitney U | 34,501 | 0.0908 | 24.4 / 30.8 | Do not reject H0 |
| H1 Entertainment usage during academic pressure | Spotify listening hours | Mann-Whitney U | 34,130 | 0.0697 | 1.6 / 1.9 | Do not reject H0 |
| H1 Entertainment usage during academic pressure | Netflix viewing count | Mann-Whitney U | 38,928.5 | 0.8512 | 1 / 0.8994 | Do not reject H0 |
| H1 Entertainment usage during academic pressure | Prime Video viewing count | Mann-Whitney U | 37,300 | 0.3890 | 0.2165 / 0.5587 | Do not reject H0 |
| H2 Platform diversity during academic pressure | Distinct active entertainment platforms | Mann-Whitney U | 35,360 | 0.1305 | 1.8 / 1.9 | Do not reject H0 |
| H3 Long-form streaming and YouTube activity | YouTube watched count | Mann-Whitney U | 159,759.5 | 0.8731 | 26.4 / 21.4 | Do not reject H0 |
| H4 Spotify and YouTube co-usage | Spotify hours and YouTube watched count | Spearman correlation | 0.0934 | < 0.001 | 1.9 / 22.3 | Reject H0 |
| H5 After-9:30 PM entertainment during finals | YouTube after-21:30 activity share | Mann-Whitney U | 32,124 | 0.0059 | 0.1786 / 0.2732 | Reject H0 |
| H5 After-9:30 PM entertainment during finals | Spotify after-21:30 listening-hour share | Mann-Whitney U | 33,480 | 0.0361 | 0.1764 / 0.2168 | Reject H0 |
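
For the H4 row, the Spearman coefficient is the Pearson correlation of the two rank series, as in the rho_s definition. The sketch below shows that computation on hypothetical daily values; the helper names are illustrative, and the one-sided p-value is left out (with SciPy, `scipy.stats.spearmanr` accepts `alternative="greater"`).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(a, b):
    """rho_s = corr(rank(a), rank(b))."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical same-day values for five days.
spotify_hours = [0.5, 2.0, 1.0, 3.5, 0.0]
youtube_watched = [10, 40, 12, 35, 5]
rho = spearman_rho(spotify_hours, youtube_watched)
print(round(rho, 3))  # 0.9
```

Because the p-value depends on the number of days as well as on rho, a coefficient as small as the one reported for H4 can still be statistically detectable over a long daily panel while describing only a weak relationship.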
Visual result summary
Result card summary
| Hypothesis | Outcome | Decision | Interpretation |
|---|---|---|---|
| H1 | YouTube watched count | Do not reject H0 | The current basic result does not provide strong evidence that activity on this platform is lower during finals. |
| H1 | Spotify listening hours | Do not reject H0 | The current basic result does not provide strong evidence that activity on this platform is lower during finals. |
| H1 | Netflix viewing count | Do not reject H0 | The current basic result does not provide strong evidence that activity on this platform is lower during finals. |
| H1 | Prime Video viewing count | Do not reject H0 | The current basic result does not provide strong evidence that activity on this platform is lower during finals. |
| H2 | Distinct active entertainment platforms | Do not reject H0 | Platform diversity is slightly lower in finals, but the current basic test does not reject H0. |
| H3 | YouTube watched count | Do not reject H0 | The current basic test does not support lower YouTube activity on Netflix + Prime active days. |
| H4 | Spotify hours and YouTube watched count | Reject H0 | The association is statistically detectable in this basic test, but the effect is weak. |
| H5 | YouTube after-21:30 activity share | Reject H0 | The YouTube late-evening share is lower during finals in the current basic test. |
| H5 | Spotify after-21:30 listening-hour share | Reject H0 | The Spotify late-evening listening share is lower during finals in the current basic test. |

Appendix
Notebook EDA image gallery.
The main story above uses interactive SVG charts. The appendix keeps the original notebook PNGs for traceability and presentation reuse, with filters by EDA group.