DSA210 term project

Entertainment Behavior Under Academic Pressure

Hello, my name is Nihat Ömer Karaca. I am a CS & IE double-major student studying at Sabancı University.

For our DSA210 course project, we were asked to select a dataset and carry out the analysis steps the course required. I started by thinking about my daily life and what I actually do on ordinary days and during midterms and finals.

After brainstorming, I realized that I spend a lot of time on a few platforms, and that this time might be affected by academic requirements such as midterms and finals. I also suspected that the platforms I use (Spotify, YouTube, and streaming services) might affect each other and interact with the academic calendar: one might pave the way for another, or one might block another.

Because of that, I downloaded and considered several datasets: ChatGPT, Instagram, Netflix, Spotify, Twitter, and YouTube. After careful consideration, I decided to exclude Instagram and Twitter because they would have increased the complexity of the project too much. I also postponed ChatGPT because it is a completely different domain and would need a heavier privacy and parsing process. So I stuck with YouTube, Spotify, Netflix, Prime Video, and the academic calendar.

I also prefer reading project content on a webpage rather than only inside notebook files, so I created this page as a public-facing explanation of the project. The webpage explanation will be completed later (TODO).

1,424 daily rows in the shared analysis window
4 public behavior datasets
50 EDA appendix figures
9 basic result rows

Raw data and public outputs

Different exports, one daily frame.

The raw exports were not equally clean: some arrived as structured CSV/JSON, while others needed parsing and privacy decisions. The public version keeps only the fields needed for analysis and uses the day as the shared unit.

YouTube: short-form and mixed video behavior
34,317 rows, 2022-04-17 to 2026-03-15
Google Takeout activity converted into a public activity table. Channel names are masked.

Spotify: music listening duration
178,202 rows, 2019-07-27 to 2026-03-14
Music-focused streaming history with exact timestamps and listening duration.

Netflix: long-form viewing
2,493 rows, 2020-02-25 to 2026-03-10
Viewing history kept at row level. Titles stay visible; local source paths are not exposed.

Prime Video: long-form viewing
719 rows, 2022-03-30 to 2026-03-19
Watch history parsed into movie and episode records where possible.

YouTube

Raw shape: Google Takeout activity export with watch/search-style events and timestamps.

Kept: Action type, target kind, masked channel references, timestamps, and daily activity fields.

Excluded: Real channel names and direct source identifiers were removed or masked.

Spotify

Raw shape: Structured streaming-history JSON with exact timestamps and listening duration.

Kept: Track, artist, album, timestamp, milliseconds played, minutes played, and hours played.

Excluded: Account identifiers, IP-like fields, and raw source-file identifiers were removed.

Netflix

Raw shape: Viewing-history CSV with title and date-level viewing records.

Kept: Original title string and viewing date at row level.

Excluded: No forced show/season/episode split, because some Netflix titles are too inconsistent to parse safely.

Prime Video

Raw shape: Watch-history export parsed into movie and episode records.

Kept: Title, record type, series/movie fields where available, and date-level watch records.

Excluded: Raw parse issues and local source details stay out of the public browser app.

Processing scripts

From export to public analysis.

Each platform follows the same logic: normalize the raw export, create shared `fine_*` fields, reduce the public schema, then aggregate later for EDA and testing.

1. Platform scripts normalize raw exports into processed row-level tables.
2. `fine_*` builders create shared date, platform, source, and record-id fields.
3. Public builders reduce the schema and mask source/account identifiers.
4. EDA notebooks aggregate public rows to daily and monthly variables.
5. The web app uses compact aggregate JSON and copied plot images, not full raw CSVs.
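Steps 2 to 4 can be sketched in plain Python as below. This is a minimal illustration, not the project's actual scripts: it assumes rows are already normalized into dicts with an ISO `timestamp` in local (Istanbul) time, and the field names (`fine_date`, `fine_platform`, `fine_source`, `fine_record_id`), the helper names, and the salted-hash masking are all hypothetical.

```python
import hashlib
from collections import Counter
from datetime import datetime


def add_fine_fields(row, platform, source, index):
    """Step 2: attach shared fields (hypothetical names) to one normalized row."""
    ts = datetime.fromisoformat(row["timestamp"])  # assumes local time, no trailing "Z"
    return {
        **row,
        "fine_date": ts.date().isoformat(),
        "fine_platform": platform,
        "fine_source": source,
        "fine_record_id": f"{platform}-{index:06d}",
    }


def mask(value, salt="replace-with-private-salt"):
    """Replace an identifier with a short salted hash: still groupable, no longer readable."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def to_public(row, keep, masked=()):
    """Step 3: keep only the listed fields, then mask the identifier fields."""
    public = {field: row[field] for field in keep}
    for field in masked:
        public[field] = mask(public[field])
    return public


def daily_counts(rows):
    """Step 4: count public rows per (date, platform)."""
    return Counter((row["fine_date"], row["fine_platform"]) for row in rows)


raw = [
    {"timestamp": "2024-05-01T20:15:00", "action": "Watched", "channel": "Example Channel"},
    {"timestamp": "2024-05-01T23:40:00", "action": "Searched for", "channel": "Example Channel"},
    {"timestamp": "2024-05-02T09:05:00", "action": "Watched", "channel": "Other Channel"},
]
fine = [add_fine_fields(r, "youtube", "takeout", i) for i, r in enumerate(raw)]
public = [
    to_public(r, keep=["fine_date", "fine_platform", "action", "channel"], masked=["channel"])
    for r in fine
]
print(daily_counts(public))
```

Hashing with a private salt keeps the same channel grouped under the same token while making the original name hard to recover from the public file.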
Processing pipeline (animated diagram): Raw exports (platform files with different structures) → Processing (normalize dates, mask identifiers) → Public tables (keep useful behavior fields only) → Daily EDA (aggregate by date and academic period) → Tests (run simple nonparametric checks).

Interactive EDA

Interactive exploratory analysis.

These charts are SVG-based and interactive: hover over bars, points, and cells to see values and highlight the element being inspected.

YouTube EDA

YouTube activity overview.

YouTube is mostly watch activity. The estimated watch-time line is a conservative proxy: it sums gaps of up to 60 minutes between consecutive watched-video timestamps.
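A minimal sketch of this gap-based proxy follows. It assumes one reading of "sums gaps of up to 60 minutes": a gap longer than 60 minutes is treated as a session break and contributes nothing, rather than being capped at 60. The function name is hypothetical.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=60)


def estimate_watch_hours(watch_times, max_gap=MAX_GAP):
    """Conservative watch-time proxy in hours.

    Sums the gaps between consecutive watched-video timestamps. Assumption: a gap
    longer than max_gap is treated as a break and contributes nothing (it is not
    capped at max_gap). The last video of each session adds no time at all.
    """
    ordered = sorted(watch_times)
    total = timedelta(0)
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap <= max_gap:
            total += gap
    return total.total_seconds() / 3600


session = [
    datetime(2024, 5, 1, 20, 0),
    datetime(2024, 5, 1, 20, 10),  # +10 min, counted
    datetime(2024, 5, 1, 20, 40),  # +30 min, counted
    datetime(2024, 5, 1, 22, 30),  # +110 min, treated as a new session
]
print(round(estimate_watch_hours(session), 3))  # 0.667
```

Because the final video of each session contributes no time, a proxy built this way tends to undercount rather than overcount.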

YouTube activity by action

Uses YouTube action counts to show whether the dataset is mostly watch activity, search activity, or other actions.

Chart: YouTube activity by action (record counts): Watched 31,967; Searched for 2,126; Viewed 156; Liked 34; Subscribed to 16; Used Shorts creation t... 16; Answered survey questi... 2.
Monthly YouTube watched and search counts

Uses monthly watched and search counts to show how YouTube activity changes over time.

Chart: Monthly YouTube watched and search counts (series: Watched, Search; 2023 to 2026).
Estimated continuous YouTube watch time

Uses gaps between consecutive watched-video timestamps to estimate rough monthly watch time.

Chart: Estimated continuous YouTube watch time (estimated hours per month, 2023 to 2026).
YouTube activity by hour

Uses YouTube timestamps converted to Istanbul time to show which hours have more YouTube activity.

Chart: YouTube activity by hour (activity count by hour of day, 0:00 to 23:00, Istanbul time).
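The Istanbul-time conversion behind the hourly and weekday charts can be sketched as follows, assuming the exports store UTC timestamps with a trailing `Z`. The helper names are hypothetical. A fixed UTC+3 offset is used on the assumption that Turkey has kept UTC+3 all year since 2016; `zoneinfo.ZoneInfo("Europe/Istanbul")` is the general alternative where time-zone data is installed.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: Turkey has kept UTC+3 all year since 2016, so a fixed offset is exact
# for these exports; zoneinfo.ZoneInfo("Europe/Istanbul") is the general alternative.
ISTANBUL = timezone(timedelta(hours=3), "Europe/Istanbul")
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def parse_utc(stamp):
    """Parse an ISO-8601 UTC timestamp such as '2024-05-05T21:15:00Z'."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def istanbul_hour_and_weekday_counts(utc_stamps):
    """Count events per Istanbul-time hour (0-23) and per Istanbul-time weekday."""
    hours, weekdays = Counter(), Counter()
    for stamp in utc_stamps:
        local = parse_utc(stamp).astimezone(ISTANBUL)
        hours[local.hour] += 1
        weekdays[WEEKDAYS[local.weekday()]] += 1
    return hours, weekdays


hours, weekdays = istanbul_hour_and_weekday_counts(
    ["2024-05-05T21:15:00Z", "2024-05-05T18:40:00Z"]
)
print(dict(hours), dict(weekdays))  # {0: 1, 21: 1} {'Monday': 1, 'Sunday': 1}
```

The first event shows why the conversion matters: 21:15 UTC on a Sunday is 00:15 on Monday in Istanbul, so it moves to a different hour and a different weekday.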
YouTube watched count by weekday

Uses daily watched counts grouped by weekday to inspect whether YouTube use changes by day of week.

Chart: YouTube watched count by weekday (boxplot; box = Q1-Q3, line = median, whiskers = min/max).

Spotify EDA

Spotify listening overview.

The Spotify EDA checks monthly listening, time of day, and artist concentration; these checks are descriptive and are not turned into a final hypothesis directly.

Monthly Spotify listening

Uses Spotify monthly listening hours and stream counts to show long-term listening patterns.

Chart: Monthly Spotify listening (monthly hours and stream counts, 2020 to 2026).
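Monthly hours and stream counts can be derived from the milliseconds-played field roughly like this. The sketch assumes each stream is a `(timestamp, ms_played)` pair with an ISO timestamp string; the input shape and function name are illustrative assumptions, not the project's schema.

```python
from collections import defaultdict

MS_PER_HOUR = 3_600_000


def monthly_listening(streams):
    """Aggregate (iso_timestamp, ms_played) pairs into {"YYYY-MM": (hours, stream_count)}."""
    hours = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, ms_played in streams:
        month = timestamp[:7]  # "YYYY-MM" prefix of an ISO timestamp
        hours[month] += ms_played / MS_PER_HOUR
        counts[month] += 1
    return {month: (hours[month], counts[month]) for month in sorted(hours)}


streams = [
    ("2024-01-03T08:00:00", 1_800_000),  # 30 minutes
    ("2024-01-20T22:10:00", 1_800_000),  # 30 minutes
    ("2024-02-01T12:00:00", 180_000),    # 3 minutes
]
print(monthly_listening(streams))
```

Keeping hours and stream counts side by side is what lets the chart separate "listened longer" from "played more tracks".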
Spotify listening by hour

Uses Spotify timestamps and hours played to show which hours have more listening activity.

Chart: Spotify listening by hour (listening hours by hour of day, 0:00 to 23:00).
Top Spotify artists by listening hours

Uses total Spotify hours per artist to show the most repeated artists.

Chart: Top Spotify artists by listening hours: Taylor Swift 104; Robbie Williams 71.8; Eminem 68.3; mor ve ötesi 67.1; Gwen Stefani 45.6; Murat Dalkılıç 38.7; Imagine Dragons 37.9; The Cranberries 36.2; Dua Lipa 35.8; Göksel 31.8.
Top Spotify tracks by listening minutes

Uses total listening minutes per track to show the most repeated individual songs.

Chart: Top Spotify tracks by listening minutes: This I Love - Guns N' ... 608; Animal Instinct - The ... 605; Dreams - The Cranberri... 572; Locked Away (feat. Ada... 547; Don't You (Forget Abou... 524.
Spotify listening hours by weekday

Uses daily Spotify hours grouped by weekday to inspect weekday differences.

Chart: Spotify listening hours by weekday (boxplot; box = Q1-Q3, line = median, whiskers = min/max).

Netflix + Prime Video EDA

Long-form streaming overview.

Netflix and Prime stay separate public datasets, but the EDA groups them as long-form streaming. Because most days have zero long-form activity, active-day charts are clearer than standard boxplots for this part.

Manual data check: in the common 1,424-day window, 1,160 days have zero Netflix + Prime records and only 264 days are active. Since at least 75% of each weekday group is zero, the standard boxplot has Q1 = median = Q3 = 0. The chart below excludes zero days so it shows which weekdays had actual long-form use. Top-title charts are also shown as EDA summaries; Netflix uses conservative title groups only for display, while the public file still keeps the original title text.
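The quartile collapse can be reproduced with a toy reconstruction that uses the counts from the manual check: 1,160 zero days and 264 active days. The value 8 assigned to every active day is a placeholder, not project data, and Python's `statistics.quantiles` stands in for whatever quantile method the plotting code actually uses.

```python
import statistics


def boxplot_quartiles(values):
    """Return (Q1, median, Q3) using the standard library's default quantile method."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return q1, median, q3


# Toy reconstruction: 1,160 zero days and 264 active days (8 is a placeholder value).
daily_counts = [0] * 1160 + [8] * 264
zero_share = daily_counts.count(0) / len(daily_counts)
print(f"zero share: {zero_share:.1%}")  # zero share: 81.5%
print(boxplot_quartiles(daily_counts))   # (0.0, 0.0, 0.0)

# Dropping zero days, as the active-day charts do, leaves only the active days.
active = [count for count in daily_counts if count > 0]
print(len(active))  # 264
```

Once more than 75% of the values are zero, Q3 falls inside the zero block, so the box shrinks to a line at 0 regardless of how large the active-day values are.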

Long-form total count by platform

Uses Netflix and Prime Video row counts to compare their total contribution.

Chart: Long-form total count by platform (Netflix vs Prime Video record counts).
Monthly long-form streaming

Uses monthly Netflix, Prime Video, and combined counts to show long-form viewing over time.

Chart: Monthly long-form streaming (Netflix, Prime Video, and Netflix + Prime monthly counts, 2023 to 2026).
Prime Video record type split

Uses Prime Video record type to show whether the records are mostly episodes or movies.

Chart: Prime Video record type split (episode vs movie counts).
Netflix title quality

Uses Netflix title quality checks to show usable titles versus missing or malformed title rows.

Chart: Netflix title quality (usable title vs missing or malformed title rows).
Top Netflix title groups

Uses grouped Netflix titles to show the most repeated Netflix series or title groups.

Chart: Top Netflix title groups (records): How I Met Your Mother 208; Friends 197; Brooklyn Nine-Nine 127; The Seven Deadly Sins 105; Suits 97.
Top Prime Video titles

Uses Prime Video series and movie titles to show the most repeated Prime Video titles.

Chart: Top Prime Video titles (records): Supernatural 218; Game of Thrones 78; Two and a Half Men 73; Zaman Çarkı 58; EŞREF RÜYA 29.
Long-form streaming by weekday

Uses active Netflix + Prime days only, excluding zero days, to show which weekdays had actual long-form use.

Chart: Netflix + Prime active days by weekday (zero days excluded; bars = active days, avg = records per active day): Monday 35 (avg 8.86); Tuesday 46 (avg 6.50); Wednesday 34 (avg 8.65); Thursday 32 (avg 6.03); Friday 38 (avg 8.13); Saturday 41 (avg 7.61); Sunday 38 (avg 9.47).

Combined EDA

Combined daily panel.

These views support the hypothesis stage without mixing incompatible raw units.

Spotify hours vs YouTube watched

Uses daily Spotify hours and YouTube watched count to inspect same-day co-usage.

Chart: YouTube watched by Spotify hours (scatter; x: daily Spotify hours, y: daily YouTube watched count).
Netflix + Prime count vs YouTube watched

Uses daily Netflix + Prime count and YouTube watched count to inspect whether long-form activity relates to YouTube activity.

Chart: YouTube watched by Netflix + Prime count (scatter; x: daily Netflix + Prime count, y: daily YouTube watched count).
Daily activity correlation heatmap

Uses daily platform variables to show which variables move together more strongly.

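Spearman correlation heatmap of daily variables (YT 21:30 and Spotify 21:30 are the after-21:30 activity measures):

| | YT watch | YouTube search | Spotify hrs | Spotify streams | N+Prime | Platforms | YT 21:30 | Spotify 21:30 |
|---|---|---|---|---|---|---|---|---|
| YT watch | 1.00 | 0.74 | 0.09 | -0.00 | 0.04 | 0.70 | 0.76 | 0.00 |
| YouTube search | 0.74 | 1.00 | 0.14 | 0.05 | 0.05 | 0.58 | 0.63 | 0.00 |
| Spotify hrs | 0.09 | 0.14 | 1.00 | 0.85 | -0.02 | 0.18 | 0.08 | 0.42 |
| Spotify streams | -0.00 | 0.05 | 0.85 | 1.00 | 0.01 | 0.12 | 0.02 | 0.43 |
| N+Prime | 0.04 | 0.05 | -0.02 | 0.01 | 1.00 | 0.56 | 0.03 | -0.10 |
| Platforms | 0.70 | 0.58 | 0.18 | 0.12 | 0.56 | 1.00 | 0.51 | 0.03 |
| YT 21:30 | 0.76 | 0.63 | 0.08 | 0.02 | 0.03 | 0.51 | 1.00 | 0.04 |
| Spotify 21:30 | 0.00 | 0.00 | 0.42 | 0.43 | -0.10 | 0.03 | 0.04 | 1.00 |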
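Each heatmap cell is a Spearman correlation: the Pearson correlation of the two variables' ranks, with tied values sharing their average rank. The dependency-free sketch below shows that logic; in practice pandas `DataFrame.corr(method="spearman")` produces the whole matrix in one call. The column names and values are toy placeholders, not project data.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # 0-based positions -> 1-based mean rank
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5  # undefined if either column is constant


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(columns):
    """Pairwise Spearman rho for a dict of equally long daily columns."""
    names = list(columns)
    return {(a, b): spearman(columns[a], columns[b]) for a in names for b in names}


daily = {  # toy daily columns
    "YT watch": [0, 12, 30, 4, 55],
    "Spotify hrs": [1.2, 0.0, 2.5, 1.2, 3.1],
    "N+Prime": [0, 0, 3, 0, 1],
}
matrix = spearman_matrix(daily)
print(round(matrix[("YT watch", "Spotify hrs")], 3))
```

Ranks make the measure robust to the heavy skew and zero-inflation seen in the daily distributions, which is a reasonable motivation for Spearman over Pearson here.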
YouTube activity by academic period

Uses YouTube watched count grouped by academic period to compare ordinary days, finals, and other periods.

Chart: YouTube watched by academic period (boxplot over Ordinary term, Final exam, Outside calendar, Summer work; box = Q1-Q3, line = median, whiskers = min/max).
Spotify activity by academic period

Uses Spotify daily hours grouped by academic period to compare listening across periods.

Chart: Spotify hours by academic period (boxplot over Ordinary term, Final exam, Outside calendar, Summer work; box = Q1-Q3, line = median, whiskers = min/max).
Long-form streaming by academic period

Uses active long-form days grouped by academic period, excluding zero days for readability.

Chart: Netflix + Prime active days by academic period (zero days excluded; bars = active days, avg = records per active day): Ordinary term 134 (avg 8.43); Final exam 20 (avg 6); Outside calendar 74 (avg 8.32); Summer work 36 (avg 5.86).
Platform diversity by academic period

Uses the number of active platforms per day to show whether platform variety changes by period.

Chart: Platform diversity by academic period (boxplot of active platforms per day, 0 to 4, over Ordinary term, Final exam, Outside calendar, Summer work; box = Q1-Q3, line = median, whiskers = min/max).
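The diversity measure is simply the number of platforms with any activity on a given day, so it ranges from 0 to 4. A minimal sketch, with hypothetical column names:

```python
PLATFORMS = ("youtube", "spotify", "netflix", "prime_video")  # hypothetical column names


def platform_diversity(day):
    """D = count(active platforms per day): platforms with any activity that day."""
    return sum(1 for platform in PLATFORMS if day.get(platform, 0) > 0)


days = [  # toy daily rows, not project data
    {"youtube": 14, "spotify": 1.6, "netflix": 0, "prime_video": 0},
    {"youtube": 0, "spotify": 0.0, "netflix": 3, "prime_video": 1},
    {"youtube": 0, "spotify": 0.0, "netflix": 0, "prime_video": 0},
]
print([platform_diversity(d) for d in days])  # [2, 2, 0]
```

Because D only records whether a platform was used, a heavy YouTube day and a light YouTube day contribute the same amount.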
YouTube after-21:30 by academic period

Uses YouTube activity after 21:30 to inspect late-evening behavior across academic periods.

Chart: YouTube after-21:30 by academic period (boxplot over Ordinary term, Final exam, Outside calendar, Summer work; box = Q1-Q3, line = median, whiskers = min/max).
Spotify after-21:30 by academic period

Uses Spotify listening after 21:30 to inspect late-evening listening across academic periods.

Chart: Spotify after-21:30 by academic period (boxplot over Ordinary term, Final exam, Outside calendar, Summer work; box = Q1-Q3, line = median, whiskers = min/max).
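The late-evening share used for H5 (share_after_2130 = after_2130 / daily_total) can be sketched per day as below. Several details are assumptions rather than documented choices: events exactly at 21:30 count as "after", each YouTube record has weight 1 while Spotify streams are weighted by hours played, and a day with no activity gets a share of 0.0.

```python
from datetime import time

CUTOFF = time(21, 30)


def late_evening_share(events):
    """share_after_2130 = after_2130 / daily_total for one day.

    events: (local clock time, amount) pairs, where amount is 1 per YouTube record
    or hours played per Spotify stream. Assumptions: events at exactly 21:30 count
    as "after", and a day with no activity gets share 0.0.
    """
    daily_total = sum(amount for _, amount in events)
    if daily_total == 0:
        return 0.0
    after_2130 = sum(amount for clock, amount in events if clock >= CUTOFF)
    return after_2130 / daily_total


youtube_day = [(time(20, 0), 1), (time(21, 30), 1), (time(23, 10), 1), (time(23, 50), 1)]
spotify_day = [(time(18, 0), 1.5), (time(22, 0), 0.5)]
print(late_evening_share(youtube_day), late_evening_share(spotify_day))  # 0.75 0.25
```

Using a share instead of a raw count separates "used the platform late" from "used the platform a lot", so a busy day does not automatically look like a late one.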

Combined EDA overview

Cross-platform checks.

These checks summarize coverage, active days, monthly movement, distributions, academic-period movement, and hourly activity before the formal hypothesis-testing section.

Dataset coverage timeline

Uses each platform's date range to show the common window where all datasets overlap.

Chart: Public dataset date coverage for YouTube, Spotify, Netflix, and Prime Video (2020 to 2026). Common testing range: 2022-04-17 to 2026-03-10.
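The common window is the overlap of all four date ranges: the latest start date to the earliest end date. The sketch below uses the ranges listed in the dataset overview and reproduces both the stated common range and the 1,424-day panel size. The function name is hypothetical.

```python
from datetime import date

# Date ranges as listed in the dataset overview.
RANGES = {
    "YouTube": (date(2022, 4, 17), date(2026, 3, 15)),
    "Spotify": (date(2019, 7, 27), date(2026, 3, 14)),
    "Netflix": (date(2020, 2, 25), date(2026, 3, 10)),
    "Prime Video": (date(2022, 3, 30), date(2026, 3, 19)),
}


def common_window(ranges):
    """Overlap of all ranges: latest start to earliest end, plus its inclusive day count."""
    start = max(first for first, _ in ranges.values())
    end = min(last for _, last in ranges.values())
    return start, end, (end - start).days + 1


start, end, n_days = common_window(RANGES)
print(start, end, n_days)  # 2022-04-17 2026-03-10 1424
```

YouTube sets the start of the window and Netflix sets the end, so those two exports bound every test.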
Active days by platform

Uses daily activity flags to show how many days each platform appears in the common window.

Chart: Active days by platform in the common window: YouTube 850; Spotify 1,365; Netflix 185; Prime Video 90.
Monthly cross-platform trends

Uses monthly YouTube watched count, Spotify hours, and Netflix + Prime count as separate trend panels.

Chart: Monthly cross-platform trends as small multiples (YouTube watched, monthly records; Spotify listening, monthly hours; Netflix + Prime, monthly views; 2023 to 2026). Each panel has its own y-axis.
Daily distribution overview

Uses daily variables to show skew, zero-heavy behavior, and outliers.

Chart: Daily variable distributions (panels: YouTube watched, Spotify hours, Netflix + Prime, platform diversity).
Relative activity by academic period

Uses period averages divided by overall averages to compare platforms without mixing raw units.

Relative activity by analysis period (period average divided by overall average; 1.00 = overall average):

| Period | YT watch | Spotify hrs | N+Prime | Platforms |
|---|---|---|---|---|
| Ordinary term | 1.38 | 1.01 | 1.00 | 1.07 |
| Final exam | 1.09 | 0.87 | 0.85 | 1.03 |
| Outside calendar | 0.59 | 0.94 | 1.21 | 0.93 |
| Summer work period | 0.21 | 1.12 | 0.71 | 0.85 |
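The relative index divides each period's mean by the overall mean, which puts hours, counts, and platform numbers on a shared unitless scale. A minimal sketch with toy values follows; the function name is hypothetical.

```python
from statistics import mean


def relative_activity(values, periods):
    """Mean of each period divided by the overall mean (1.0 = an average day)."""
    overall = mean(values)
    grouped = {}
    for value, period in zip(values, periods):
        grouped.setdefault(period, []).append(value)
    return {period: mean(vals) / overall for period, vals in grouped.items()}


# Toy daily values, not project data.
values = [10, 20, 30, 60]
periods = ["Final exam", "Final exam", "Ordinary term", "Ordinary term"]
print(relative_activity(values, periods))  # {'Final exam': 0.5, 'Ordinary term': 1.5}
```

As a consistency check with the reported results, the ordinary-term YouTube mean (30.764) divided by the overall YouTube mean over all 1,424 days (22.303) is about 1.38, and the final-exam mean (24.351) gives about 1.09, matching the table.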
Hourly activity in Istanbul time

Uses YouTube and Spotify timestamps to compare time-of-day behavior in Istanbul time.

Chart: Hourly YouTube and Spotify activity in Istanbul time (YouTube watched records and Spotify listening hours by hour of day, 0:00 to 23:00).

Hypothesis testing

Hypothesis tests and current results.

The formal tests use daily variables from the same public-data panel. Group comparisons (final-exam vs ordinary-term days, and Netflix + Prime active vs inactive days) use one-sided Mann-Whitney U tests; same-day Spotify/YouTube co-usage uses a one-sided Spearman correlation test.
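A one-sided Mann-Whitney U test ranks both groups together and asks whether one group's ranks are systematically lower. SciPy's `scipy.stats.mannwhitneyu(x, y, alternative="less")` runs this test directly; the dependency-free sketch below spells out the rank logic with the normal approximation (tie and continuity corrections). It is an illustration and is not guaranteed to reproduce the reported p-values exactly.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_less(x, y):
    """One-sided Mann-Whitney U test; alternative: values in x tend to be smaller than in y.

    Returns (U_x, p) using the normal approximation with tie and continuity corrections.
    Undefined (division by zero) if every pooled value is identical.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u_x - n1 * n2 / 2 + 0.5) / sigma  # +0.5: continuity correction toward H0
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P(Z <= z), lower tail
    return u_x, p


finals = [1, 2, 3]  # toy daily values, not project data
ordinary = [4, 5, 6]
print(mann_whitney_less(finals, ordinary))
```

A rank test fits these data because the daily variables are skewed and zero-heavy, so comparing typical ranks is safer than comparing means under a normality assumption.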

Hypotheses

H1. Entertainment usage during academic pressure
Null hypothesis (H0): Platform activity does not differ between final-exam days and ordinary-term days.
Alternative hypothesis (Ha): Final-exam days have lower platform activity: X_final < X_ordinary.
Method: One-sided Mann-Whitney U

H2. Platform diversity during academic pressure
Null hypothesis (H0): Platform diversity does not differ between final-exam days and ordinary-term days.
Alternative hypothesis (Ha): Platform diversity, D = count(active platforms per day), is lower during finals.
Method: One-sided Mann-Whitney U

H3. Netflix + Prime and YouTube activity
Null hypothesis (H0): YouTube watched count does not differ between Netflix + Prime active and inactive days.
Alternative hypothesis (Ha): YouTube watched count is lower on Netflix + Prime active days: YouTube | long-form active < YouTube | inactive.
Method: One-sided Mann-Whitney U

H4. Spotify and YouTube co-usage
Null hypothesis (H0): Spotify hours are not associated with YouTube watched count.
Alternative hypothesis (Ha): Spotify hours are positively associated with YouTube watched count: rho_s = corr(rank(Spotify), rank(YouTube)) > 0.
Method: One-sided Spearman correlation

H5. After-9:30 PM entertainment during finals
Null hypothesis (H0): Late-evening entertainment share does not differ between final-exam days and ordinary-term days.
Alternative hypothesis (Ha): Late-evening share is lower during finals, where share_after_2130 = after_2130 / daily_total.
Method: One-sided Mann-Whitney U

Complete result table

Means are reported as final-exam / ordinary-term days for H1, H2, and H5; Netflix + Prime active / inactive days for H3; and Spotify hours / YouTube watched count for H4.

| Hypothesis | Outcome | Test | Statistic | p-value | Means | Decision |
|---|---|---|---|---|---|---|
| H1 Entertainment usage during academic pressure | YouTube watched count | Mann-Whitney U | 34,501 | 0.0908 | 24.4 / 30.8 | Do not reject H0 |
| H1 Entertainment usage during academic pressure | Spotify listening hours | Mann-Whitney U | 34,130 | 0.0697 | 1.6 / 1.9 | Do not reject H0 |
| H1 Entertainment usage during academic pressure | Netflix viewing count | Mann-Whitney U | 38,928.5 | 0.8512 | 1 / 0.8994 | Do not reject H0 |
| H1 Entertainment usage during academic pressure | Prime Video viewing count | Mann-Whitney U | 37,300 | 0.3890 | 0.2165 / 0.5587 | Do not reject H0 |
| H2 Platform diversity during academic pressure | Distinct active entertainment platforms | Mann-Whitney U | 35,360 | 0.1305 | 1.8 / 1.9 | Do not reject H0 |
| H3 Long-form streaming and YouTube activity | YouTube watched count | Mann-Whitney U | 159,759.5 | 0.8731 | 26.4 / 21.4 | Do not reject H0 |
| H4 Spotify and YouTube co-usage | Spotify hours and YouTube watched count | Spearman correlation | 0.0934 | < 0.001 | 1.9 / 22.3 | Reject H0 |
| H5 After-9:30 PM entertainment during finals | YouTube after-21:30 activity share | Mann-Whitney U | 32,124 | 0.0059 | 0.1786 / 0.2732 | Reject H0 |
| H5 After-9:30 PM entertainment during finals | Spotify after-21:30 listening-hour share | Mann-Whitney U | 33,480 | 0.0361 | 0.1764 / 0.2168 | Reject H0 |

Visual result summary

H1, YouTube watched count: p = 0.0908, do not reject H0.
The current basic test does not provide strong evidence that YouTube activity is lower during finals.
Mann-Whitney U statistic 34,501; means (finals / ordinary) 24.351 / 30.764; medians 8.000 / 12.000; n 97 / 775.

H1, Spotify listening hours: p = 0.0697, do not reject H0.
The current basic test does not provide strong evidence that Spotify listening is lower during finals.
Mann-Whitney U statistic 34,130; means (finals / ordinary) 1.646 / 1.911; medians 1.419 / 1.642; n 97 / 775.

H1, Netflix viewing count: p = 0.8512, do not reject H0.
The current basic test does not provide strong evidence that Netflix viewing is lower during finals.
Mann-Whitney U statistic 38,928.5; means (finals / ordinary) 1.021 / 0.899; medians 0.000 / 0.000; n 97 / 775.

H1, Prime Video viewing count: p = 0.3890, do not reject H0.
The current basic test does not provide strong evidence that Prime Video viewing is lower during finals.
Mann-Whitney U statistic 37,300; means (finals / ordinary) 0.216 / 0.559; medians 0.000 / 0.000; n 97 / 775.

H2, Distinct active entertainment platforms: p = 0.1305, do not reject H0.
Platform diversity is slightly lower during finals, but the current basic test does not reject H0.
Mann-Whitney U statistic 35,360; means (finals / ordinary) 1.794 / 1.870; medians 2.000 / 2.000; n 97 / 775.

H3, YouTube watched count: p = 0.8731, do not reject H0.
The current basic test does not support lower YouTube activity on Netflix + Prime active days.
Mann-Whitney U statistic 159,759.5; means (active / inactive) 26.413 / 21.367; medians 4.000 / 3.000; n 264 / 1160.

H4, Spotify hours and YouTube watched count: p < 0.001, reject H0.
The association is statistically detectable in this basic test, but the effect is weak.
Spearman rho 0.0934; means (Spotify hours / YouTube watched) 1.889 / 22.303; medians 1.566 / 3.000; n 1424 / 1424.

H5, YouTube after-21:30 activity share: p = 0.0059, reject H0.
The YouTube late-evening share is lower during finals in the current basic test.
Mann-Whitney U statistic 32,124; means (finals / ordinary) 0.179 / 0.273; medians 0.000 / 0.000; n 97 / 775.

H5, Spotify after-21:30 listening-hour share: p = 0.0361, reject H0.
The Spotify late-evening listening share is lower during finals in the current basic test.
Mann-Whitney U statistic 33,480; means (finals / ordinary) 0.176 / 0.217; medians 0.014 / 0.037; n 97 / 775.

Appendix

Notebook EDA image gallery.

The main story above uses interactive SVG charts. The appendix keeps the original notebook PNGs for traceability and presentation reuse, with filters by EDA group.
