# LunarCrush — Reddit Subreddit-Level Equity Cut
**Prepared for Brevan Howard · June 12, 2026**

You asked this morning whether the subreddit can be exposed in the delivery. This dataset is the answer, live the same day: every post carries its subreddit, its own sentiment score, engagement, and a source link.

## Files

| File | Grain | What it is |
|---|---|---|
| `subreddit_ticker_daily.parquet/.tsv` | ticker × subreddit × day | Sampled post counts, engagement, average 1–5 sentiment, bullish (≥4) and bearish (≤2) post counts per community per name per day |
| `reddit_posts_with_subreddit.parquet/.tsv` | post | Each post with subreddit, created_utc, sentiment_1to5, interactions_total, title, link, creator |
| `reddit_daily_fullcounts.parquet/.tsv` | ticker × day | Full-count Reddit-only daily series (posts active/created, interactions, unique contributors, 3-way sentiment) for normalization and backtests |
| `subreddit_ticker_rollup.parquet` | ticker × subreddit | 90-day summary per community per name |
| `insights.json` | — | Precomputed highlights rendered on the explorer page |
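
As a rough illustration of how the post-level file relates to the ticker × subreddit × day grain, the sketch below rebuilds that aggregation from a handful of made-up rows. The columns `subreddit`, `created_utc`, `sentiment_1to5`, and `interactions_total` are taken from the table above. The `ticker` column, the ISO-8601 timestamp format, and all output column names are assumptions for illustration and may not match the delivered files.

```python
import pandas as pd


def aggregate_posts(posts: pd.DataFrame) -> pd.DataFrame:
    """Roll post-level rows up to ticker x subreddit x day.

    Assumes a `ticker` column exists and that `created_utc` is parseable by
    pandas (for epoch seconds, pass unit="s" to to_datetime instead).
    """
    df = posts.copy()
    df["day"] = pd.to_datetime(df["created_utc"], utc=True).dt.date
    # Thresholds as described in the Files table: bullish >= 4, bearish <= 2.
    df["is_bullish"] = df["sentiment_1to5"] >= 4
    df["is_bearish"] = df["sentiment_1to5"] <= 2
    return df.groupby(["ticker", "subreddit", "day"], as_index=False).agg(
        posts_sampled=("sentiment_1to5", "size"),
        interactions=("interactions_total", "sum"),
        avg_sentiment=("sentiment_1to5", "mean"),
        bullish_posts=("is_bullish", "sum"),
        bearish_posts=("is_bearish", "sum"),
    )


# Made-up sample rows, not real data from the delivery.
posts = pd.DataFrame(
    {
        "ticker": ["NVDA", "NVDA", "NVDA", "GME"],
        "subreddit": ["wallstreetbets", "wallstreetbets", "stocks", "wallstreetbets"],
        "created_utc": [
            "2026-06-01T10:00:00Z",
            "2026-06-01T15:00:00Z",
            "2026-06-01T12:00:00Z",
            "2026-06-02T09:00:00Z",
        ],
        "sentiment_1to5": [5, 2, 3, 4],
        "interactions_total": [100, 50, 10, 7],
    }
)
daily = aggregate_posts(posts)
print(daily)
```

With the real file, `posts` would presumably come from `pd.read_parquet("reddit_posts_with_subreddit.parquet")`; comparing the result against `subreddit_ticker_daily` is one way to check that the assumed column names and day bucketing match.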

## Universe & window
20 tickers (NVDA, TSLA, AAPL, GME, AMD + MSFT, META, AMZN, GOOGL, PLTR, SMCI, MSTR, COIN, HOOD, INTC, UNH, HIMS, NKE, DIS, BA), plus the brand keyword "nike" (ticker=NIKE in the files) to demonstrate keyword universes. The window is the trailing 90 days, in daily buckets.

## Methodology, honestly stated
- **Post panel = top-engagement sample.** Up to 100 highest-engagement posts per ticker per day. It captures where the most-engaged conversation happens; it is not a full census of posts.
- **Daily full-count series = complete.** Every post counted, no sampling. Use it as the denominator when normalizing the sampled panel.
- **Sentiment** is scored per post (1–5, 3 = neutral) by NLP models trained on financial language, then averaged within subreddit × ticker × day.
- **Collection scope:** posts where the tracked ticker/keyword is mentioned, either in the post itself or in its comments. Engagement metrics cover the full thread.
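
One way to use the full-count series as the denominator is sketched below: it computes each subreddit's share of the sampled posts for a ticker-day, and how much of that day's full post count the sample covers. The column names `day`, `posts_sampled`, and `posts_created` are hypothetical stand-ins, and the numbers are invented for illustration.

```python
import pandas as pd


def add_normalized_shares(sub_daily: pd.DataFrame, full_daily: pd.DataFrame) -> pd.DataFrame:
    """Attach full-count denominators to the sampled subreddit panel.

    Column names (`day`, `posts_sampled`, `posts_created`) are assumed,
    not confirmed against the delivered files.
    """
    out = sub_daily.merge(
        full_daily[["ticker", "day", "posts_created"]],
        on=["ticker", "day"],
        how="left",
        validate="many_to_one",  # one full-count row per ticker-day
    )
    sampled_total = out.groupby(["ticker", "day"])["posts_sampled"].transform("sum")
    # Share of the sampled panel attributable to each subreddit.
    out["subreddit_share_of_sample"] = out["posts_sampled"] / sampled_total
    # Fraction of the day's full post count captured by the sample;
    # left as NaN when the full count is zero or missing.
    denom = out["posts_created"].where(out["posts_created"] > 0)
    out["sample_coverage"] = sampled_total / denom
    return out


# Invented numbers for illustration only.
sub_daily = pd.DataFrame(
    {
        "ticker": ["NVDA", "NVDA"],
        "subreddit": ["wallstreetbets", "stocks"],
        "day": ["2026-06-01", "2026-06-01"],
        "posts_sampled": [60, 40],
    }
)
full_daily = pd.DataFrame(
    {"ticker": ["NVDA"], "day": ["2026-06-01"], "posts_created": [400]}
)
normalized = add_normalized_shares(sub_daily, full_daily)
print(normalized)
```

A low `sample_coverage` flags ticker-days where the 100-post cap binds and the subreddit shares describe only the top-engagement slice, so those days may deserve less weight in a backtest.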

## Production delivery
Delivered via REST API or S3/Parquet drops: hourly buckets, multi-year history (back to 2020), any ticker/keyword/custom-collection universe, posts + comments attribution, and all networks (X, TikTok, YouTube, Instagram, news) with the same per-network isolation. Subreddit attribution is being formalized into the versioned API spec; it is available in data deliveries today.

joe@lunarcrush.com · lunarcrush.com
