The Complete Research Handbook
For CS Students: Academic Research + Startup Product Research
From zero to publishing papers and building products people pay for.
Everything. A to Z. No fluff.
TABLE OF CONTENTS
- The Mental Foundation
- Types of Research
- Academic Research — Full Process
- 3.1 Finding Your Research Question
- 3.2 Literature Review
- 3.3 Where to Find Papers
- 3.4 Reading Papers Efficiently
- 3.5 Organizing Your Research
- 3.6 Note-Taking Systems
- 3.7 Forming a Hypothesis
- 3.8 Designing Experiments
- 3.9 Running Experiments
- 3.10 Analyzing Results
- 3.11 Writing the Paper
- 3.12 Peer Review + Submission
- Startup/Product Research — Full Process
- 4.1 Problem Discovery
- 4.2 Market Research
- 4.3 Customer Discovery
- 4.4 Competitive Analysis
- 4.5 Validation Before Building
- 4.6 Continuous Product Research
- 4.7 Metrics and Analytics
- Documentation Systems
- 5.1 Academic Documentation
- 5.2 Startup Documentation
- 5.3 Templates
- Tools Master List
- From Research to Startup
- Working With Advisors and Mentors
- Research Ethics
- The Weekly Rhythm
- Common Mistakes and How to Avoid Them
- Resources Library
- The Master Mental Model
1. THE MENTAL FOUNDATION
What Research Actually Is
Most people are taught research as: find papers → read papers → cite papers. That is library science. Not research.
Real research is:
> You have a question the world hasn’t fully answered. You go find out. Then you tell the world what you found.
Think of the world as a giant codebase. Most of it is already written and documented — that is existing knowledge. But there are bugs (unsolved problems) and missing features (unexplored areas).
Research is finding those bugs and either:
- Patching them: solving an existing problem → academic research
- Building a feature people will pay for: solving a market problem → startup research
Both start the same way: you notice something is broken or missing.
The Core Difference: Academic vs Startup Research
| Dimension | Academic Research | Startup Research |
|---|---|---|
| Output | Knowledge / Paper | Product / Revenue |
| Audience | Committee of experts | Market of customers |
| Success metric | Novel contribution | People pay for it |
| Timeline | Months to years | Weeks to months |
| Iteration speed | Slow | Fast |
| Failure handling | Negative result = publishable | Negative result = pivot |
| Primary question | “Is this true?” | “Will people pay for this?” |
The thinking process is almost identical. Master one, you get the other free.
The Researcher Mindset
Before anything else, internalize these:
1. Intellectual honesty above everything. Your job is to find truth, not to confirm what you already believe. If your experiment disproves your hypothesis, that is a result. A real one. Publish it.
2. Comfort with uncertainty. You will spend most of your time not knowing. That is normal. That is the job. The discomfort of not knowing is the fuel that drives research.
3. Skepticism of everything, including yourself. Every paper has flaws. Every methodology has assumptions. Including yours. The best researchers are the harshest critics of their own work.
4. First principles thinking. When stuck, strip away everything and ask: what do I actually know for certain? What am I assuming? What would I have to prove for this to be true?
5. The 10x question habit. For every conclusion: “Under what conditions would this be false?” For every assumption: “What if the opposite were true?” This is the habit that separates good researchers from great ones.
2. TYPES OF RESEARCH
The 3 Core Types
Type 1 — Exploratory Research
“I don’t know what I don’t know”
Used when: entering a completely new field, starting a new project, trying to understand a domain
Goal: Build a map of the territory. Understand what exists, what the open questions are, who the key players are.
Outputs: Literature review, concept maps, research question candidates
Type 2 — Validating Research
“I think I know something. Let me verify.”
Used when: you have a hypothesis, you want to test an existing claim, you’re checking your assumptions
Goal: Prove or disprove a specific, testable claim with data and evidence
Outputs: Experiment results, validated/invalidated hypothesis, paper
Type 3 — Building Research
“I know what needs to exist. Let me create it.”
Used when: problem is confirmed, solution direction is clear, time to construct and measure
Goal: Build the thing and measure whether it solves the problem
Outputs: System, prototype, product, technical contribution
You always start with Type 1, regardless of how confident you feel.
Research Methodologies in CS
| Methodology | Description | When to Use |
|---|---|---|
| Empirical | Run experiments, collect data, measure | Comparing systems, benchmarking, ML experiments |
| Theoretical | Mathematical proofs, formal analysis | Algorithm complexity, security proofs |
| System Building | Design, implement, evaluate a new system | New tools, frameworks, architectures |
| Survey/Review | Comprehensive analysis of existing work | Literature reviews, meta-analyses |
| Case Study | Deep analysis of specific instances | Real-world system behavior, deployment studies |
| User Study | Observe/interview people using systems | HCI, usability, accessibility |
| Simulation | Model and simulate systems | Networks, distributed systems, scenarios |
Most CS research combines at least 2 of these.
3. ACADEMIC RESEARCH — FULL PROCESS
3.1 Finding Your Research Question
This is where most students go wrong: they pick a topic, not a question.
| Wrong (topic) | Right (question) |
|---|---|
| “I want to research machine learning” | “Why do transformers underperform on time-series vs LSTMs, and can positional encoding be modified to fix this?” |
| “I want to study databases” | “What is the performance cost of ACID compliance in distributed transactions at scale, and can it be reduced without sacrificing consistency?” |
| “I want to work on security” | “Are current LLM-based code generation tools producing systematically vulnerable code in specific categories?” |
The Anatomy of a Good Research Question
[Observed gap or problem] + [specific context] + [proposed direction]
A good research question must be:
- Specific: not vague, clearly bounded
- Falsifiable: you must be able to be wrong
- Novel: not already answered (do your literature review first)
- Feasible: answerable within your resources and time
- Significant: the answer must matter to someone
Method 1 — The Survey Paper Method
Survey papers (also called review papers) summarize an entire subfield. They are treasure maps.
Steps:
1. Search “[your area] survey” or “[your area] review” on Google Scholar
2. Filter for recent papers (last 3-5 years)
3. Go directly to the “Future Work,” “Open Problems,” or “Limitations” section
4. Those sections are researchers explicitly saying: “someone please solve this”
5. Pick one that genuinely interests you and that you have the skills to attempt
Method 2 — The Contradiction Method
- Find two papers in the same area that reach different conclusions
- The gap between them IS a research question
- “Paper A claims X works under conditions Y. Paper B says it doesn’t. Under what specific conditions does each hold, and why?”
This is powerful because the disagreement already signals there is something real to discover.
Method 3 — The Real World Backwards Method
- Start with a real problem you or someone you know has experienced
- Ask: “Why does this problem exist?”
- Keep asking why (5 Whys technique) until you hit something no paper has answered
- That unanswered “why” is your research question
This method produces the most impactful research because it is grounded in reality. It also happens to be the same starting point for a startup.
Method 4 — The Replication + Extension Method (Best for Beginners)
- Find a well-cited paper in your area
- Replicate their experiment (this teaches you the methodology deeply)
- Identify one assumption they made or one thing they didn’t test
- Your extension of that = your contribution
This is underrated. Many great papers are rigorous extensions of prior work.
3.2 Literature Review
What a Literature Review Is Not
- It is not a list of paper summaries
- It is not a history lesson about the field
- It is not copy-pasting abstracts
What a Literature Review Actually Is
A literature review is a structured argument that:
1. Shows you understand the existing landscape
2. Identifies the specific gap your work fills
3. Explains why existing approaches don’t fully solve the problem
Think of it as a prosecution brief: you are building the case that your research needs to exist.
The Literature Review Process
Phase 1: Seed. Start with 3-5 highly relevant papers you already know or find immediately.
Phase 2: Expand. For each seed paper:
- Note every paper it cites that seems relevant
- Search Google Scholar “Cited by” to find everything that cited it
After this step you have 30-50 papers.
Phase 3: Filter. Apply the 3-Pass reading method (see 3.4) to filter down to what actually matters.
Phase 4: Cluster. Group papers by subtopic, approach, or contribution type. This becomes your review structure.
Phase 5: Synthesize. Write the review not as “Paper A did X. Paper B did Y.” but as “The dominant approach to this problem has been X [A, B, C], which works well for [conditions] but fails when [limitation]. An alternative direction is Y [D, E], which addresses [limitation] but introduces [new problem].”
Phase 6: Identify Gap. The gap (what none of these papers solve) is where your work lives. State it explicitly.
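The Expand phase is a breadth-first walk over the citation graph, following references backward and “Cited by” links forward. Below is a minimal sketch of that walk over a toy, hand-written graph. The paper IDs, the `REFERENCES` dictionary, and the `snowball` helper are all hypothetical; a real version would fill the dictionary from whatever export or search tool you use.

```python
from collections import deque

# Hypothetical toy citation data: paper -> papers it cites (backward references).
REFERENCES = {
    "seed-A": ["p1", "p2"],
    "seed-B": ["p2", "p3"],
    "p1": ["p4"],
    "p5": ["seed-A"],   # p5 cites seed-A, so it appears under seed-A's "Cited by"
    "p6": ["p1"],
}


def cited_by(paper, references):
    """Forward citations: every paper whose reference list contains `paper`."""
    return [source for source, refs in references.items() if paper in refs]


def snowball(seeds, references, max_hops=1):
    """Breadth-first expansion from the seeds along references and 'Cited by' links."""
    found = {seed: 0 for seed in seeds}   # paper -> hops from the nearest seed
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        hops = found[paper]
        if hops == max_hops:
            continue                      # do not expand past the hop limit
        neighbours = references.get(paper, []) + cited_by(paper, references)
        for other in neighbours:
            if other not in found:
                found[other] = hops + 1
                queue.append(other)
    return found


candidates = snowball(["seed-A", "seed-B"], REFERENCES, max_hops=1)
print(sorted(candidates))
```

Keeping the hop count next to each paper gives a rough priority order for the Filter phase: read one-hop papers before two-hop ones.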
Connected Papers
Go to connectedpapers.com, paste any paper’s title. It generates a visual graph of related papers, showing you the neighborhood of literature around a topic. Use this for discovery — it surfaces papers you’d never find by keyword search alone.
3.3 Where to Find Papers
Primary Academic Sources
Google Scholar scholar.google.com
├── Start here always. Broadest coverage.
├── Use "Cited by" to trace forward in time
└── Use date filter to get recent work
Semantic Scholar semanticscholar.org
├── AI-powered, shows citation graphs
├── Shows "highly influential citations"
└── Better for finding key papers in a cluster
ArXiv arxiv.org
├── Preprints — papers before peer review
├── CS/ML cutting edge is here first
├── Search by category: cs.LG, cs.AI, cs.DB, cs.CR, etc.
└── Papers often appear here months before the conference version
ACM Digital Library dl.acm.org
├── Core CS research
└── SIGMOD, STOC, CCS, CHI, SOSP, OSDI etc.
IEEE Xplore ieeexplore.ieee.org
├── Engineering + CS
└── ICSE, INFOCOM, S&P, CVPR etc.
Papers With Code paperswithcode.com
├── ML papers + working code implementations
├── Benchmarks and leaderboards
└── State of the art tracking per task
DBLP dblp.org
├── CS bibliography database
└── Great for finding all papers by a specific author
ResearchGate researchgate.net
└── Researchers often post their own PDFs here
Getting Paywalled Papers — Legally and Free
1. Unpaywall browser extension
Install it. It auto-finds free legal versions of papers.
Works 50-60% of the time.
2. ArXiv preprint version
Search the paper title on arxiv.org — most CS papers have a preprint.
3. Author's personal/lab website
Researchers post their own papers. Google "[author name] publications"
4. Email the author directly
"Dear Prof. X, I'm a CS student studying [topic].
Could you share a copy of your paper [title]?
I don't have institutional access."
Response rate: ~70%. They almost always say yes.
Bonus: sometimes turns into a research relationship.
5. Google Scholar PDF links
Click the [PDF] link on the right side of Google Scholar results.
Many link to free versions.
Key Conferences by CS Subfield
| Subfield | Top Conferences |
|---|---|
| Machine Learning | NeurIPS, ICML, ICLR |
| Computer Vision | CVPR, ICCV, ECCV |
| NLP | ACL, EMNLP, NAACL |
| Systems | OSDI, SOSP, ATC, EuroSys |
| Databases | SIGMOD, VLDB, ICDE |
| Security | S&P (Oakland), CCS, USENIX Security, NDSS |
| Networking | SIGCOMM, NSDI |
| HCI | CHI, UIST |
| Software Engineering | ICSE, FSE, ASE |
| Theory | STOC, FOCS, SODA |
A paper at these venues = peer-reviewed, high bar, trustworthy.
3.4 Reading Papers Efficiently
The 3-Pass Method
(Adapted from Prof. S. Keshav’s canonical framework)
Pass 1 — Bird’s Eye View (5–15 minutes)
Read only:
- Title and abstract
- Introduction: first paragraph + last paragraph only
- All section headings
- Conclusion
- Skim all figures and tables (read their captions)
After Pass 1, answer:
1. What problem does this solve?
2. What is their core approach?
3. Did it work? (main result)
4. Is this relevant to my work?
80% of papers end here. That is correct and efficient.
Pass 2 — Understanding (30 minutes – 1 hour)
Read everything except proofs and deep math. Understand the core idea. Mark every reference you don’t recognize — those are papers to possibly read next. Understand all figures.
After Pass 2, you should be able to explain the paper to someone else in 3 minutes.
Pass 3 — Deep Reading (3–5 hours, critical papers only)
Read everything. Understand every proof, every design decision, every experiment. Ask: could you reimplement this? Could you find a flaw? Could you extend this?
This level is reserved for the 5 most important papers to your work.
What to Look For in Different Sections
| Section | Key Questions to Ask |
|---|---|
| Abstract | What claim are they making? Is it specific? |
| Introduction | What is the precise problem? What are they NOT solving? |
| Related Work | What is the gap they position against? |
| Methodology | What assumptions are they making? What could fail? |
| Experiments | Are baselines fair? Are datasets appropriate? |
| Results | Are differences statistically significant? |
| Conclusion | What do they claim vs what did they actually prove? |
| Limitations | Are they honest? What did they hide here? |
Red Flags in Papers
- No statistical significance testing on results
- Comparison only against weak baselines
- Dataset that suspiciously fits their method
- “Future work” section that addresses obvious flaws
- Cherry-picked examples without aggregate metrics
- No ablation study (testing each component separately)
- Results that are “too clean”
3.5 Organizing Your Research
Folder Structure (Use This Exactly)
/research-project-name/
│
├── /papers/
│ ├── /foundational/ → papers everyone cites, must-reads
│ ├── /related-work/ → papers in your topic area
│ ├── /methods/ → papers about techniques you use
│ └── /datasets/ → papers about datasets you use
│
├── /notes/
│ ├── /paper-summaries/ → one file per paper
│ ├── /ideas/ → your original thoughts
│ ├── /meeting-notes/ → advisor meetings
│ └── /weekly-logs/ → weekly research diary
│
├── /experiments/
│ ├── /exp-001/ → one folder per experiment
│ │ ├── config.yaml → exact parameters used
│ │ ├── run.py → code to reproduce
│ │ ├── results/ → raw outputs
│ │ └── README.md → what this tested and what happened
│ └── /exp-002/
│
├── /writing/
│ ├── /drafts/ → version-controlled LaTeX
│ ├── /figures/ → all paper figures
│ └── references.bib → Zotero-managed bibliography
│
├── /data/
│ ├── /raw/ → original, never modified
│ ├── /processed/ → cleaned versions
│ └── data_sources.md → where everything came from
│
├── /code/
│ ├── requirements.txt → exact package versions
│ ├── README.md → how to run everything
│ └── /src/
│
└── README.md → project overview, status, how to navigate
Everything in Git. From day 1. No exceptions.
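Creating the skeleton by hand invites typos, so it can help to script it once. The sketch below builds the layout shown above with `pathlib`. The `scaffold` function name is an assumption made for illustration, and the files are created empty for you to fill in.

```python
from pathlib import Path

# Directory layout copied from the tree above.
DIRECTORIES = [
    "papers/foundational", "papers/related-work", "papers/methods", "papers/datasets",
    "notes/paper-summaries", "notes/ideas", "notes/meeting-notes", "notes/weekly-logs",
    "experiments/exp-001/results",
    "writing/drafts", "writing/figures",
    "data/raw", "data/processed",
    "code/src",
]

# Placeholder files from the tree above, created empty.
FILES = [
    "README.md",
    "experiments/exp-001/config.yaml",
    "experiments/exp-001/run.py",
    "experiments/exp-001/README.md",
    "writing/references.bib",
    "data/data_sources.md",
    "code/requirements.txt",
    "code/README.md",
]


def scaffold(root):
    """Create the project skeleton under `root` without overwriting existing files."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)   # leaves existing content untouched
    return root
```

Calling `scaffold("research-project-name")` and then running `git init` inside the new folder gives you the structure under version control from the first commit. Running it a second time is safe because nothing existing is overwritten.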
Zotero Setup (Your Research Database)
Zotero is free and the best tool for managing papers.
Setup steps:
1. Install Zotero desktop + browser extension
2. Create collections by project and subtopic
3. When you find a paper online, click the browser extension and it imports automatically
4. Add tags consistently: to-read, read-pass1, read-pass2, read-pass3, key-paper
5. Write your notes in the Zotero note attached to each paper
6. Use Zotero’s citation tool in Word or Overleaf; never manually format references
The single biggest time-saving habit in academic research.
3.6 Note-Taking Systems
The Zettelkasten Method
Developed by sociologist Niklas Luhmann, who published 70 books and 400 articles using this method. The system works because it mirrors how ideas actually connect — not hierarchically, but as a network.
Three types of notes:
1. Fleeting Notes Raw, immediate capture. Any format. No editing. Write these during reading, conversations, showers. Process them within 24 hours or they become clutter.
Example:
"Paper X claims attention is all you need for sequence modeling —
but what about explicit memory? Is positional encoding actually
enough for tasks that require remembering state 10,000 tokens ago?"
2. Literature Notes Your summary of one source, in your own words. Never copy-paste. Forcing yourself to restate in your own words tells you immediately if you understood it.
Template:
Source: [Author, Year, Title]
Main argument:
How they argue it:
Key evidence/data:
What I find interesting:
What I disagree with:
How it connects to my work:
3. Permanent Notes Your original ideas, fully formed, connected to other notes. These are the actual atoms of your research.
Example:
NOTE-042: Positional Encoding Fails for Long-Range Dependencies
The standard sinusoidal positional encoding in transformers (NOTE-031)
encodes absolute position, not relative importance. This explains the
degradation I observed in EXP-007 for sequences > 2048 tokens.
Potential fix: a relative positional bias, e.g. ALiBi's fixed (non-learned) linear bias (see NOTE-038 on ALiBi).
This might resolve the failure mode in [my project name].
→ Links: NOTE-031, NOTE-038, EXP-007
The rule: one idea per note. Link aggressively.
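“Link aggressively” pays off once you can ask which notes point at a given idea. The sketch below builds that backlink index from plain note text. It assumes notes and experiments are referenced by IDs of the form NOTE-042 or EXP-007, as in the example above. The note bodies and helper names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical note bodies keyed by ID; the IDs mirror the example above.
NOTES = {
    "NOTE-031": "Sinusoidal positional encoding encodes absolute position.",
    "NOTE-038": "ALiBi biases attention by distance. Compare NOTE-031.",
    "NOTE-042": "Long-range failure, see NOTE-031 and NOTE-038; observed in EXP-007.",
}

LINK_PATTERN = re.compile(r"\b(?:NOTE|EXP)-\d+\b")


def outgoing_links(note_id, text):
    """IDs mentioned in a note body, excluding the note itself, in first-seen order."""
    seen = []
    for target in LINK_PATTERN.findall(text):
        if target != note_id and target not in seen:
            seen.append(target)
    return seen


def backlinks(notes):
    """Invert the link graph: target ID -> sorted list of notes that mention it."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for target in outgoing_links(note_id, text):
            index[target].add(note_id)
    return {target: sorted(sources) for target, sources in index.items()}


index = backlinks(NOTES)
# Notes with no links in either direction are candidates for linking or deletion.
orphans = [n for n in NOTES if n not in index and not outgoing_links(n, NOTES[n])]
print(index)
print(orphans)
```

A note that ends up in `orphans` breaks the “link aggressively” rule: either connect it to something or ask whether it belongs in the system at all.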
Obsidian for Zettelkasten
Obsidian is local-first markdown notes with bidirectional linking. It is the best tool for Zettelkasten.
Key features to use:
- [[note name]] syntax creates links between notes
- Graph view shows the visual network of your ideas
- Dataview plugin lets you query your notes like a database
- Daily notes for fleeting notes
- Canvas for visual thinking and mapping
3.7 Forming a Hypothesis
A hypothesis is not a guess. It is a precise, testable prediction derived from your understanding of the existing literature.
Anatomy of a Good Hypothesis
"If [specific action/change], then [specific measurable outcome],
because [mechanistic reason based on prior knowledge]."
Example:
> “If we replace absolute positional encoding with ALiBi (Attention with Linear Biases) in standard transformer language models, then perplexity on sequences longer than 2048 tokens will decrease by at least 15%, because ALiBi encodes relative distance rather than absolute position, which is more informative for long-range dependencies.”
Notice:
- Specific action: replace absolute with ALiBi
- Measurable outcome: perplexity, by at least 15%
- Mechanistic reason: relative vs absolute position encoding
Null Hypothesis
Always state the null hypothesis too:
“ALiBi positional encoding provides no statistically significant improvement over standard positional encoding for sequences longer than 2048 tokens.”
Your experiment is designed to reject (or fail to reject) the null hypothesis. This framing keeps you honest.
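A measurable outcome such as “decrease by at least 15%” can be written down as a check before any experiment runs, which makes it harder to move the goalposts afterwards. The sketch below encodes only that threshold. The function names and perplexity values are hypothetical, and passing this check says nothing about statistical significance, which still needs repeated runs and a test.

```python
def relative_reduction(baseline, candidate):
    """Fractional drop from baseline to candidate (positive means candidate is lower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline


def hypothesis_supported(baseline_ppl, candidate_ppl, min_reduction=0.15):
    """True when perplexity fell by at least `min_reduction` (15% in the example)."""
    return relative_reduction(baseline_ppl, candidate_ppl) >= min_reduction


# Hypothetical perplexities, for illustration only.
print(hypothesis_supported(20.0, 16.0))   # a 20% reduction
print(hypothesis_supported(20.0, 18.0))   # a 10% reduction
```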
Assumption Mapping
Before running any experiment, list every assumption your hypothesis rests on:
ASSUMPTION 1: The model architecture is otherwise identical
ASSUMPTION 2: The dataset contains sufficient long sequences to measure this
ASSUMPTION 3: Perplexity is the right metric for this comparison
ASSUMPTION 4: The difference in the encoding is the actual causal factor
For each assumption, ask: what if this is wrong? How does that affect my conclusion?
3.8 Designing Experiments
The experiment design is where most CS papers live or die. A brilliant idea with a poor experiment produces an unreliable result.
The Components of a Well-Designed Experiment
1. Baseline. What are you comparing against? The baseline must be:
- Fair: same resources, same data, same conditions
- Strong: use the best existing method, not a weak one
- Relevant: actually what practitioners would use
The most common criticism in peer review: “The baseline is too weak.”
2. Dataset.
- Is it the right dataset for your claim?
- Is it publicly available? (reproducibility)
- Are there known biases in it that could skew results?
- Do you need train/validation/test splits? Are they contamination-free?
3. Metrics Choose metrics that actually measure what you claim to improve.
| Claim | Wrong Metric | Right Metric |
|---|---|---|
| “Faster” | Wall clock time on your laptop | Throughput on standardized hardware, multiple runs |
| “More accurate” | Raw accuracy when classes are imbalanced | F1 score or per-class precision/recall |
| “More efficient” | Lines of code | FLOPs, memory usage, inference latency |
| “Better user experience” | Your gut feeling | User study with proper statistical design |
4. Controls Hold everything constant except the variable you’re testing. This sounds obvious. It is constantly violated.
5. Ablation Studies Test each component of your system independently. If you added 3 things and got improvement, which of the 3 actually did it?
Remove each component one at a time and measure. This is what separates “it works” from “we understand why it works.”
6. Statistical Significance Never report a single run. Run multiple times. Report mean ± standard deviation. Use appropriate statistical tests (t-test, Wilcoxon, etc.). Results that don’t survive statistical testing are not results.
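To make “mean ± standard deviation” and “appropriate statistical test” concrete, here is a minimal sketch using only the standard library. It swaps in an exact sign-flip permutation test on paired runs in place of the t-test or Wilcoxon test named above, because that avoids any dependency. The accuracy values are made up, and the pairing assumes baseline and candidate were trained on the same seeds.

```python
from itertools import product
from statistics import mean, stdev


def summarize(scores):
    """Format runs as 'mean ± sample standard deviation'."""
    return f"{mean(scores):.3f} ± {stdev(scores):.3f}"


def paired_permutation_p(baseline, candidate):
    """Two-sided exact sign-flip permutation test on paired per-seed differences."""
    if len(baseline) != len(candidate):
        raise ValueError("runs must be paired: one baseline and one candidate per seed")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis each difference is equally likely to have either sign,
    # so enumerate every sign assignment and count those at least as extreme.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = mean([s * d for s, d in zip(signs, diffs)])
        if abs(flipped) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical accuracies from 6 seeds, same seeds for both systems.
baseline = [0.812, 0.805, 0.820, 0.809, 0.815, 0.811]
candidate = [0.831, 0.826, 0.829, 0.835, 0.824, 0.828]

print("baseline :", summarize(baseline))
print("candidate:", summarize(candidate))
print("p-value  :", paired_permutation_p(baseline, candidate))
```

With n paired runs, the smallest p-value this test can return is 2 / 2^n, so three runs can never reach p < 0.05 and six runs can only just do so. That is one more reason to budget for more seeds than the bare minimum.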
Experiment Design Template
EXPERIMENT ID: EXP-[number]
DATE:
HYPOTHESIS BEING TESTED: [which hypothesis, state it]
WHAT I CHANGED: [the independent variable]
WHAT I MEASURED: [the dependent variable(s)]
WHAT I HELD CONSTANT: [controls]
DATASET: [name, version, split sizes]
BASELINE: [what I'm comparing against]
METRICS: [exactly what numbers I will collect]
NUMBER OF RUNS: [minimum 3, ideally 5+]
EXPECTED RESULT: [what I predict before running]
3.9 Running Experiments
Reproducibility — Non-Negotiable
Every experiment must be reproducible. This means:
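# Set all random seeds at the top of every experiment
import random
import numpy as np
import torch
SEED = 42
random.seed(SEED)                 # Python's built-in RNG
np.random.seed(SEED)              # NumPy's global RNG
torch.manual_seed(SEED)           # PyTorch RNG
torch.cuda.manual_seed_all(SEED)  # every GPU; silently ignored when CUDA is unavailable
Seeding makes runs repeatable on the same setup, but some GPU operations are nondeterministic, so record hardware and library versions as well. Log every hyperparameter. Use config files (YAML or JSON), not hardcoded values. The experiment log should contain everything needed to reproduce the result from scratch.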
Experiment Tracking Tools
Weights & Biases (wandb): best for ML experiments
- Automatic logging of metrics, hyperparameters, system stats
- Visual dashboards
- Compare runs side by side
- Free for academics
MLflow: open source alternative
- Local or cloud
- Experiment tracking + model registry
Sacred + Omniboard: lightweight alternative
Minimum viable tracking (no tools): every experiment gets a folder with:
- config.yaml: all parameters
- run.sh: exact command to reproduce
- results/: all outputs
- notes.md: what you observed
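The tool-free option is small enough to script. The sketch below creates one experiment folder with the four items listed above. It writes the parameters as `config.json` instead of `config.yaml` so that it needs only the standard library. The `create_run_folder` name, the example parameters, and the example command are assumptions for illustration.

```python
import json
from pathlib import Path


def create_run_folder(root, exp_id, config, command):
    """Create <root>/<exp_id>/ holding the config, the exact command, and empty notes."""
    run_dir = Path(root) / exp_id
    run_dir.mkdir(parents=True, exist_ok=False)   # refuse to overwrite an earlier run
    (run_dir / "results").mkdir()
    # JSON stands in for YAML here so the sketch has no third-party dependency.
    (run_dir / "config.json").write_text(json.dumps(config, indent=2, sort_keys=True))
    (run_dir / "run.sh").write_text(f"#!/bin/sh\n{command}\n")
    (run_dir / "notes.md").write_text(f"# {exp_id}\n\nObservations:\n")
    return run_dir
```

A call might look like `create_run_folder("experiments", "exp-003", {"lr": 0.001, "seed": 42}, "python run.py --config config.json")`. Failing when the folder already exists is deliberate: an old result should never be silently replaced by a new run.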
During the Experiment — What to Log
- All hyperparameters (learning rate, batch size, epochs, etc.)
- Environment (Python version, package versions, hardware)
- Random seeds
- Training curves (loss, metrics at each epoch)
- Final metrics (mean + std across runs)
- Wall clock time
- Memory usage
- Any anomalies or unexpected behavior
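Several items on this list can be captured automatically on every run. The sketch below records the environment, wall clock time, and peak memory around any function call. The helper names are hypothetical. `tracemalloc` only sees memory allocated by Python itself, so GPU memory or memory held by native libraries would need a separate measurement.

```python
import json
import platform
import sys
import time
import tracemalloc


def environment_snapshot():
    """Interpreter and machine details worth storing next to every result."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


def run_logged(fn, *args, **kwargs):
    """Run fn and return (result, log) with wall clock time and peak traced memory."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - started
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    log = {
        "environment": environment_snapshot(),
        "wall_clock_seconds": round(elapsed, 4),
        "peak_traced_bytes": peak_bytes,
    }
    return result, log


# Stand-in workload; replace `sum` with your training or evaluation function.
result, log = run_logged(sum, range(100_000))
print(json.dumps(log, indent=2))
```

Writing the returned `log` dictionary into the experiment's results folder keeps the measurement next to the numbers it describes.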
3.10 Analyzing Results
Getting results is not the end. Understanding them is.
The Three Questions for Every Result
1. Does it match my hypothesis?
- If yes: great, but ask “why does it work?” and don’t just accept it
- If no: even better. Ask “why not?” This is where discoveries live
2. Are there patterns I didn’t expect? Look at failure cases, outliers, edge cases. These are often more interesting than the main result.
3. What alternative explanations exist? Could something other than my proposed mechanism explain this result? This is the hardest question. It’s also what reviewers will ask. Answer it before they do.
Error Analysis
For any classification/prediction task: don’t just look at aggregate accuracy. Look at where your system fails.
- What types of examples does it get wrong?
- Is there a systematic pattern to the failures?
- Does it fail on certain conditions, domains, or data distributions?
Error analysis often points directly to your next hypothesis.
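A small amount of bookkeeping is enough to answer the three questions above. The sketch below breaks the error rate down by data slice and counts which confusions occur. The records, slice labels, and function names are all made up for illustration.

```python
from collections import Counter

# Hypothetical evaluation records: (slice label, true class, predicted class).
RECORDS = [
    ("short", "pos", "pos"), ("short", "neg", "neg"),
    ("short", "pos", "pos"), ("short", "neg", "neg"),
    ("long", "pos", "neg"), ("long", "neg", "neg"),
    ("long", "pos", "neg"), ("long", "pos", "pos"),
]


def error_rate_by_slice(records):
    """Fraction of wrong predictions within each slice of the data."""
    totals, errors = Counter(), Counter()
    for slice_name, truth, predicted in records:
        totals[slice_name] += 1
        errors[slice_name] += int(truth != predicted)
    return {name: errors[name] / totals[name] for name in totals}


def confusions(records):
    """Count each (true, predicted) pair among the mistakes only."""
    return Counter((t, p) for _, t, p in records if t != p)


overall_accuracy = sum(t == p for _, t, p in RECORDS) / len(RECORDS)
print("overall accuracy:", overall_accuracy)
print("error rate by slice:", error_rate_by_slice(RECORDS))
print("confusions:", dict(confusions(RECORDS)))
```

In this toy data the aggregate accuracy of 0.75 hides that every mistake is a positive example in the “long” slice predicted as negative. That kind of pattern is exactly what turns into a next hypothesis.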
Ablation Analysis Checklist
□ Tested with each component removed individually
□ Tested with combinations of components removed
□ Identified which component contributes most to improvement
□ Identified which component is necessary vs nice-to-have
□ Tested on both easy and hard subsets of the data
□ Tested across at least 2 different datasets if possible
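The first four checklist items amount to a loop over component subsets. The sketch below shows that loop with a stand-in `evaluate` function that returns invented scores. The component names and gains are hypothetical, and the stand-in treats gains as purely additive, which real components rarely are. That is why the checklist also asks for combinations.

```python
from itertools import combinations

COMPONENTS = ("augmentation", "pretraining", "reranker")   # hypothetical names


def evaluate(enabled):
    """Stand-in for a real train-and-evaluate run; returns a made-up score.

    Replace this with code that runs the system with only the `enabled`
    components switched on.
    """
    gains = {"augmentation": 0.02, "pretraining": 0.08, "reranker": 0.01}
    return round(0.70 + sum(gains[name] for name in enabled), 4)


def leave_one_out(components, score_fn):
    """Score drop caused by removing each component from the full system."""
    full = score_fn(components)
    drops = {}
    for removed in components:
        kept = tuple(c for c in components if c != removed)
        drops[removed] = round(full - score_fn(kept), 4)
    return full, drops


def all_subsets(components, score_fn):
    """Score for every combination of components, from none to all."""
    return {
        subset: score_fn(subset)
        for size in range(len(components) + 1)
        for subset in combinations(components, size)
    }


full_score, drops = leave_one_out(COMPONENTS, evaluate)
most_important = max(drops, key=drops.get)
print(full_score, drops, most_important)
```

With k components the full subset table needs 2^k runs, each repeated over several seeds, so for larger systems leave-one-out plus a few targeted combinations is usually the affordable compromise.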
3.11 Writing the Paper
The Structure of a CS Research Paper
TITLE
└── Should contain: the problem + the approach or result
Bad: "A New Method for Text Classification"
Good: "Sparse Attention for Efficient Long-Document Text Classification"
ABSTRACT (150-250 words)
└── Write this LAST
Part 1: Problem and why it matters
Part 2: What you did
Part 3: Key result (include the number)
No citations. No jargon without definition.
INTRODUCTION (1-2 pages)
└── Para 1: The big picture problem
Para 2-3: Current approaches and their limitations
Para 4: What you do differently (your approach, high level)
Para 5: Key results (spoil them — this is not a mystery novel)
Final para: "Our contributions are: (1)... (2)... (3)..."
RELATED WORK (0.5-1 page)
└── Group related papers by approach, not chronologically
Don't trash other work. Use: "orthogonal to," "complementary to"
Make clear how yours differs from each group
METHODOLOGY / APPROACH (2-4 pages)
└── The most important section technically
Someone else must be able to reproduce your work from this alone
Include: problem formulation, architecture/algorithm, design decisions
Use diagrams generously — a figure is worth 200 words
EXPERIMENTS (2-3 pages)
└── Setup: datasets, baselines, metrics, implementation details
Main results: your primary comparison table/figure
Ablation studies: what each component contributes
Analysis: qualitative examples, error analysis, interesting cases
DISCUSSION (0.5-1 page)
└── Why do results look the way they do?
What surprised you?
What does this mean for the broader field?
Honest limitations
CONCLUSION (0.5 page)
└── What did you find? (restate main result)
What are the limitations?
What is the most important future work?
REFERENCES
└── Use Zotero. Never manually format.
Follow the venue's citation style exactly.
Writing Process — The Actual Steps
Step 1: Write the methodology first You know this best. Get it on paper (or LaTeX). It unlocks the rest.
Step 2: Write the experiments section You have the results. Describe the setup, present the numbers, explain what they show.
Step 3: Write the introduction Now that you know what you did and what you found, you can properly set it up.
Step 4: Write related work You’ve done the literature review. Structure it as an argument for why your work was needed.
Step 5: Write the discussion and conclusion Reflect on what it all means.
Step 6: Write the abstract The entire paper in 200 words. Write this last.
Step 7: Write the title Sounds trivial. Not trivial. The title determines who reads your paper.
LaTeX — Learn It
Nearly every CS paper is written in LaTeX. The basics take about a week to learn, and that week pays back in every paper for the rest of your career.
% Basic paper structure
\documentclass[10pt, conference]{IEEEtran}
\usepackage{amsmath, graphicx, booktabs, hyperref}
\title{Your Paper Title}
\author{Your Name}
\begin{document}
\maketitle
\begin{abstract}
Your abstract here.
\end{abstract}
\section{Introduction}
\section{Related Work}
\section{Methodology}
\section{Experiments}
\section{Conclusion}
\bibliographystyle{IEEEtran}
\bibliography{references}
\end{document}
Use Overleaf, an online LaTeX editor with real-time collaboration. Free for individuals.
Writing Principles
1. One paragraph = one idea. State the idea in sentence 1. Support it. No exceptions.
2. Active voice. “We propose a method that…” not “A method is proposed that…”
3. Precision over elegance. “Our method reduces latency by 23% on the X benchmark” not “Our method significantly speeds things up”
4. Every claim needs evidence. “X is slow [citation]” or “X is slow (Table 2)”. If you can’t back it, don’t claim it.
5. Write drunk, edit sober. First draft = get ideas out, don’t judge. Second draft = structure and clarity. Third draft = precision and polish. Never try to do all three at once.
3.12 Peer Review + Submission
Choosing a Venue
Tier 1 venues: NeurIPS, ICML, ICLR, CVPR, ACL, SIGMOD, SOSP, S&P (security), CHI
Tier 2 venues: solid conferences in each area, e.g. AAAI, IJCAI, CIKM
Journals: TPAMI, VLDB Journal, JACM, TDSC (slower, more thorough review)
Workshops: good for early work, getting feedback, networking
For your first paper: aim for the best workshop or solid Tier 2 conference. Get the experience. Then aim higher.
Submission Checklist
□ Paper fits within page limit
□ All figures are readable at print size
□ All claims are backed by evidence or citation
□ Ablation studies are included
□ Baselines are fair and strong
□ Statistical significance is reported
□ Code will be released (say this in the paper)
□ Anonymous version for double-blind review (remove author names, institution)
□ Supplementary material prepared if needed
□ Camera-ready deadline noted for when accepted
Reading Reviews
When you get reviews back:
1. Wait 24 hours before responding. Never respond emotionally.
2. Read all reviews together and look for patterns across reviewers.
3. Sort every concern into one of three groups: (a) they’re right and I need to fix it, (b) they misunderstood and I need to clarify, (c) they’re wrong and I need to argue why.
Reviewers are almost always right about what confused them, even if wrong about why.
The Rebuttal
Address every concern. Be concise. Be factual. New experiments > argument.
“We thank Reviewer 2 for this concern. We ran additional experiments on Dataset X with the configuration they suggested (Appendix B). The results show Y, which is consistent with our main claims.”
4. STARTUP/PRODUCT RESEARCH — FULL PROCESS
The Mindset Shift
In academic research, you convince a committee of experts. In startup research, you convince a market of customers.
The process is the same. The jury changes.
The #1 rule of startup research:
Talk to people before you write code. Every time. No exceptions.
Most failed startups built something nobody wanted. Not because they were bad engineers. Because they never verified the problem was real and widespread before building the solution.
4.1 Problem Discovery
The Three Sources of Real Problems
Source 1 — Your Own Life (Best Source) Problems you personally hit repeatedly. You are your own first customer — you understand the pain intimately, you know the context, you know what good looks like.
Exercises:
- List every task you do repeatedly that feels stupid or inefficient
- List every tool you use that constantly frustrates you
- List every time in the last month you thought “why doesn’t this exist?”
Source 2 — Jobs, Internships, Observations Inefficiencies you saw working somewhere. Industries still using Excel for everything. Processes that take 4 people that should take a script. Manual work that is clearly automatable.
The pattern to look for: high-value, recurring, manual process.
Source 3 — Research Paper Gaps You are a CS student. This is your unfair advantage over most founders.
Academic research often runs years ahead of commercial products. There are solutions in papers that nobody has productized. Find those.
Examples:
- Transformers existed in research for years before OpenAI commercialized them
- Semantic search existed in research long before it appeared in consumer products
- Privacy-preserving computation is researched heavily; commercial products are just emerging
The 5 Whys for Problem Discovery
Ask “why?” five times to reach the root cause:
SURFACE: "Our customer support takes 3 days to respond"
Why? → Too many tickets
Why? → Users can't find answers in documentation
Why? → Documentation is scattered and unstructured
Why? → No single source of truth, every team writes their own
Why? → No tooling that creates documentation from actual product behavior
ROOT PROBLEM: There's no automated way to generate accurate, current
documentation from product behavior → that's a startup idea
4.2 Market Research
TAM / SAM / SOM Framework
TAM (Total Addressable Market)
└── Everyone who could ever buy this
"All companies that have customer support"
Don't need to pursue this. Just need it to be large enough.
SAM (Serviceable Addressable Market)
└── Who you can realistically reach with your model
"B2B SaaS companies with 10-200 employees in English-speaking markets"
SOM (Serviceable Obtainable Market)
└── Who you'll actually get in year 1-2
"50 B2B SaaS startups you can reach through your network and cold outreach"
You do not need a billion-dollar TAM to build a profitable startup. You need a SOM that pays.
The math that matters:
SOM x Average Revenue Per Customer = Your Revenue Target
50 customers x $500/month = $25,000 MRR = $300,000 ARR
This is a real, fundable, profitable business.
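The revenue math above is easy to turn into a small calculator, which helps when you want to try different prices or customer counts. This is a minimal sketch; the function names `revenue_target` and `customers_needed` are illustrative, not from any library.

```python
import math


def revenue_target(customers: int, monthly_price: float) -> dict:
    """Monthly and annual recurring revenue for a given number of paying customers."""
    mrr = customers * monthly_price
    return {"mrr": mrr, "arr": mrr * 12}


def customers_needed(target_arr: float, monthly_price: float) -> int:
    """Smallest number of paying customers that reaches the ARR target."""
    return math.ceil(target_arr / (monthly_price * 12))


# The example from the text: 50 customers at $500/month.
print(revenue_target(50, 500))        # {'mrr': 25000, 'arr': 300000}
print(customers_needed(300_000, 500))  # 50
```

Running it in reverse (`customers_needed`) is often the more useful direction: pick the revenue you need, then check whether your SOM plausibly contains that many buyers.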
Market Research Process
Step 1: Establish that people pay for solutions to this problem today
If there is zero existing spending on this problem, be very suspicious. Either the pain isn’t bad enough, or you haven’t found the right buyer.
Step 2: Find the existing solutions and their flaws
The flaws of existing solutions are your opportunity. Not their existence; their flaws.
Step 3: Estimate market size
Bottom-up: [number of potential customers] × [price they’d pay] = market size
This is more credible than top-down (“the market is $10B”).
Step 4: Identify growth direction
Is this market growing, stable, or shrinking? You want growing: you ride the wave instead of fighting for share.
Market Research Tools
FINDING WHAT PEOPLE COMPLAIN ABOUT
Reddit (r/[industry], r/entrepreneur, r/smallbusiness)
→ search "[product name] sucks" or "[category] alternative"
Twitter/X
→ "[industry] + (hate OR frustrating OR finally)" searches
G2 / Capterra / Trustpilot
→ competitor reviews — read the 2-3 star reviews
these are people who paid, tried, and were disappointed
that gap = your product
Product Hunt comments
→ what do people ask for that doesn't exist?
Hacker News
→ search hn.algolia.com (the Hacker News search) for "pain" or "Ask HN: what SaaS"
MARKET SIZING + COMPETITIVE LANDSCAPE
Crunchbase → funding in this space (validates market exists)
SimilarWeb → competitor traffic estimates
LinkedIn → how many people have job title X (market size proxy)
Google Trends → is interest growing or dying?
SEMrush / Ahrefs → keyword search volume (how many search for the problem)
Statista → industry reports (some free)
GEOGRAPHIC + DEMOGRAPHIC SIGNALS
SparkToro → where does your audience spend time online
Meta Business Suite Insights (successor to Facebook Audience Insights) → demographic data
LinkedIn Sales Navigator → precise B2B targeting data
4.3 Customer Discovery
This is the most important step. Most founders skip it or do it badly.
The Mom Test
From Rob Fitzpatrick’s book The Mom Test — the most important startup book you will read.
The core insight: if you ask “would you use this?” everyone says yes (including your mom, who loves you). You need questions that people can’t lie to you about even if they want to.
Questions that get honest answers:
"Tell me about the last time you experienced [problem]."
→ Real, specific, past behavior. Not hypothetical.
"Walk me through how you handle [problem] today."
→ Reveals actual workflow, pain points, existing solutions
"How much time/money does this cost you right now?"
→ Quantifies pain. Low numbers = low priority = bad sign.
"What have you tried? Why didn't it work?"
→ Reveals solution landscape and failure modes
"Who else has this problem in your organization?"
→ Reveals buying process and scope
"What would you do if [tool they use] disappeared tomorrow?"
→ Reveals true dependency and switching costs
Questions that get lies (avoid these):
"Would you use an app that does X?" → "Yes" (to be polite)
"Do you think this is a good idea?" → "Yes" (to encourage you)
"Would you pay $X/month for this?" → "Probably" (hypothetical)
"How much would you pay for this?" → Random number with no commitment
How to Get Customer Interviews
Cold LinkedIn messages:
"Hi [Name], I'm a CS student researching [problem area].
I'm not selling anything — just trying to understand how
[job title] currently handles [problem].
Would you have 15 minutes for a quick call?
I'll share what I learn with you."
Response rate: expect roughly 15-30% when the message is genuine.
Warm network:
- Professors who know practitioners
- Family friends in relevant industries
- Previous internship contacts
- Alumni from your university
Online communities:
- Find the subreddit or Slack community for your target user
- Participate genuinely for 2 weeks before asking
- Post: “I’m researching [problem]. Would anyone share their experience with [problem]? DM me.”
Minimum viable discovery: 20 conversations before writing any code.
The Interview Structure
OPENING (2 min)
"Thanks for your time. I'm trying to understand how [role]
handles [problem area]. There's no product to sell — just learning.
Mind if I take notes?"
CONTEXT (5 min)
"Tell me about your role and your team."
"What tools do you use for [area]?"
CORE QUESTIONS (20 min)
"Tell me about the last time [problem] happened."
"Walk me through exactly what you do."
"What's the most painful part?"
"What have you tried? What failed?"
"What does it cost you — time, money, frustration?"
CLOSING (3 min)
"Is there anything I didn't ask that you think I should know?"
"Who else should I talk to about this?"
Reading the Signals
| Signal | What It Means |
|---|---|
| “I do this manually every Monday” | Real, recurring pain |
| “We pay $X/month for [bad solution]” | Proven willingness to pay |
| “I’ve tried 3 tools, none work” | Active problem, no good solution |
| “When can I use it?” | Strong pull signal |
| “I’ve actually built a workaround” | Very strong — they care enough to build |
| “That sounds interesting” | Polite disinterest |
| “I guess I’d use it” | Not a real problem for them |
| “Depends on the price” | Low pain, price sensitive |
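After a batch of interviews, it helps to tally how many showed a strong signal versus only polite interest. The sketch below assumes you tag each interview by hand after reading your notes; the tag names and the `STRONG` / `WEAK` groupings are hypothetical labels for the rows of the table above, not a standard taxonomy.

```python
from collections import Counter

# Hypothetical tags, one per row of the signal table.
STRONG = {
    "recurring_manual_work",   # "I do this manually every Monday"
    "pays_for_bad_solution",   # "We pay $X/month for [bad solution]"
    "tried_multiple_tools",    # "I've tried 3 tools, none work"
    "asked_for_access",        # "When can I use it?"
    "built_workaround",        # "I've actually built a workaround"
}
WEAK = {"sounds_interesting", "guess_id_use_it", "depends_on_price"}


def summarize_signals(tags_per_interview: list[set[str]]) -> dict:
    """Classify each interview by its strongest signal and report the strong share."""
    counts = Counter()
    for tags in tags_per_interview:
        if tags & STRONG:
            counts["strong"] += 1
        elif tags & WEAK:
            counts["weak_only"] += 1
        else:
            counts["no_signal"] += 1
    total = len(tags_per_interview)
    return {
        "strong": counts["strong"],
        "weak_only": counts["weak_only"],
        "no_signal": counts["no_signal"],
        "strong_share": counts["strong"] / total if total else 0.0,
    }


interviews = [
    {"built_workaround"},
    {"sounds_interesting"},
    {"depends_on_price", "pays_for_bad_solution"},
    set(),
]
print(summarize_signals(interviews))
```

The tally is only as good as the tagging, so tag from verbatim quotes rather than from memory.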
4.4 Competitive Analysis
Never say “we have no competition.” Every problem has a current solution — manual process, spreadsheet, internal tool, or adjacent product.
The Competitive Analysis Framework
For every competitor or substitute:
PRODUCT ANALYSIS
Name + URL:
What does it do? (1 sentence)
Who is the target customer?
What problem does it solve best?
What does it NOT do well?
What do users complain about most? (read the reviews)
BUSINESS ANALYSIS
Pricing:
Revenue estimates (rough proxy from SimilarWeb traffic, funding raised, and headcount growth):
Funding history (Crunchbase):
Team size (LinkedIn):
How long have they existed?
TECHNICAL ANALYSIS
What stack do they use? (BuiltWith, Wappalyzer)
Any obvious technical limitations?
Is their core IP easy to replicate?
YOUR DIFFERENTIATION
Why will customers switch from or choose you over them?
What is your specific unfair advantage?
The 2×2 Positioning Map
Draw two axes that matter in your market. Examples:
- Price vs Ease of Use
- Power/Features vs Speed to Value
- Enterprise vs SMB, Automated vs Manual
Place every competitor. Find the empty quadrant. That is your position.
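If you would rather compute the map than draw it, the idea can be sketched in a few lines. The 0-10 scores, the midpoint of 5, and the competitor names below are all made-up placeholders; scoring competitors is a judgment call you make from your own research.

```python
def quadrant(x: float, y: float, midpoint: float = 5.0) -> str:
    """Name the quadrant a point falls in, splitting each axis at the midpoint."""
    horizontal = "high_x" if x >= midpoint else "low_x"
    vertical = "high_y" if y >= midpoint else "low_y"
    return f"{horizontal}/{vertical}"


def empty_quadrants(competitors: dict[str, tuple[float, float]],
                    midpoint: float = 5.0) -> list[str]:
    """Return the quadrants that no competitor occupies."""
    all_quadrants = [f"{h}/{v}" for h in ("low_x", "high_x") for v in ("low_y", "high_y")]
    occupied = {quadrant(x, y, midpoint) for x, y in competitors.values()}
    return [q for q in all_quadrants if q not in occupied]


# Hypothetical scores: x = ease of use, y = price (both 0-10).
competitors = {
    "Tool A": (8, 9),       # easy to use, expensive
    "Tool B": (3, 8),       # hard to use, expensive
    "Spreadsheet": (2, 1),  # hard to use, cheap
}
print(empty_quadrants(competitors))  # ['high_x/low_y']
```

Here the open position is "easy to use and cheap". An empty quadrant is a hypothesis, not proof: it may be empty because nobody wants it.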
The G2/Capterra Review Mining Method
This is the most underused startup research technique:
- Find your top 3 competitors on G2 or Capterra
- Filter reviews: 2 and 3 stars only (these are paying customers who are disappointed)
- Copy every complaint into a spreadsheet
- Tag by theme: “too expensive,” “missing feature X,” “bad UX,” “poor support”
- Find the themes that repeat most often
The most common complaints = your product roadmap.
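Once the complaints are in a spreadsheet, the tagging and counting steps can be roughed out in code. This is a sketch using naive keyword matching; the `THEMES` keyword lists are hypothetical and should be rewritten after you have read the actual reviews, since keyword matching will miss sarcasm and unusual phrasing.

```python
from collections import Counter

# Hypothetical theme -> keyword lists; refine these from the reviews you collect.
THEMES = {
    "too expensive": ["expensive", "pricing", "overpriced"],
    "bad UX": ["confusing", "clunky", "hard to use"],
    "poor support": ["support", "no response"],
}


def tag_review(text: str) -> list[str]:
    """Return every theme whose keywords appear in the review text."""
    lowered = text.lower()
    return [theme for theme, keywords in THEMES.items()
            if any(keyword in lowered for keyword in keywords)]


def rank_themes(reviews: list[str]) -> list[tuple[str, int]]:
    """Count how many reviews mention each theme, most frequent first."""
    counts = Counter()
    for review in reviews:
        counts.update(tag_review(review))
    return counts.most_common()


reviews = [
    "Way too expensive for what it does",
    "Clunky interface and support never replied",
    "Pricing doubled and the UI is confusing",
]
for theme, count in rank_themes(reviews):
    print(f"{theme}: {count}")
```

Treat the counts as a first pass for deciding which reviews to reread closely, not as a substitute for reading them.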
4.5 Validation Before Building
The Validation Ladder (cheapest to most expensive)
LEVEL 1: Smoke Test (2-3 days)
Build a landing page. Describe the product. Add signup/waitlist.
Measure: what % of visitors sign up?
Benchmark: 5%+ is a good signal
LEVEL 2: Concierge MVP (1-2 weeks)
Manually do what the product would do for 5-10 customers.
No code — just you doing the work with spreadsheets, email, manual process.
Measure: will they pay for this manual version?
This tells you if the value is real before any engineering.
LEVEL 3: Wizard of Oz (2-4 weeks)
Front-end looks like a real product. Back-end is you doing things manually.
Customer sees an interface. You are behind the curtain.
Measure: engagement, return rate, qualitative feedback.
LEVEL 4: Pre-sell (any time)
Ask for money before it's built. "Pay $X now, get access in 8 weeks."
If they won't give money, they don't really want it.
Measure: actual payment = strongest possible signal.
LEVEL 5: MVP (4-8 weeks)
Simplest possible working version. One use case. No extras.
Measure: activation, retention, NPS, payment.
The cardinal rule: move up the ladder only when the current level gives you signal.
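For the Level 1 smoke test, a raw signup percentage can mislead when traffic is small: 12 signups from 200 visitors is 6%, but the true rate could plausibly sit below the 5% benchmark. The sketch below puts a Wilson score interval around the conversion rate. The helper names and the three verdict strings are illustrative choices, and the 95% level (z = 1.96) is an assumption you can change.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


def smoke_test_verdict(signups: int, visitors: int, benchmark: float = 0.05) -> str:
    """Compare the interval, not just the point estimate, against the benchmark."""
    low, high = wilson_interval(signups, visitors)
    if low >= benchmark:
        return "clear signal"
    if high < benchmark:
        return "below benchmark"
    return "inconclusive"


print(smoke_test_verdict(12, 200))    # inconclusive
print(smoke_test_verdict(150, 2000))  # clear signal
```

An "inconclusive" result usually means you need more traffic before moving up the ladder, not that the idea has failed.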
Landing Page That Converts
A good validation landing page has:
HEADLINE → What you do + who it's for (10 words max)
SUBHEADLINE → Specific benefit + the pain it solves
SOCIAL PROOF → Logos, quotes, numbers (even if early)
HOW IT WORKS → 3 simple steps
CTA BUTTON → "Join Waitlist" or "Get Early Access" or "Buy Now"
SECONDARY CTA → "See a demo" or "Watch video"
Tools: Carrd (simplest), Framer (most beautiful), Webflow (most flexible)
Pricing Experiments
Run pricing tests early. Most founders underprice.
The Van Westendorp pricing model: Ask these 4 questions in your customer interviews:
1. At what price would this be so cheap you'd question the quality?
2. At what price would this be a bargain — great value?
3. At what price would this start to get expensive but you'd still buy?
4. At what price would this be too expensive?
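In this simplified reading, the “acceptable range” is between Q2 and Q3, and the sweet spot is near Q3. The formal method plots the cumulative answers to all four questions and reads prices off where the curves cross. Many founders price much closer to Q1.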
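With a handful of interview answers, the simplified reading can be computed by taking the median answer to each question. This is a sketch of that shortcut only, not the full Van Westendorp analysis (which works with cumulative curves and their intersections); the dictionary keys and sample prices are made up for illustration.

```python
from statistics import median

# Hypothetical keys, one per question Q1-Q4.
QUESTIONS = ("too_cheap", "bargain", "getting_expensive", "too_expensive")


def median_price_points(responses: list[dict[str, float]]) -> dict[str, float]:
    """Median answer to each of the four pricing questions."""
    return {q: median(r[q] for r in responses) for q in QUESTIONS}


def acceptable_range(responses: list[dict[str, float]]) -> tuple[float, float]:
    """Simplified acceptable range: median 'bargain' up to median 'getting expensive'."""
    points = median_price_points(responses)
    return points["bargain"], points["getting_expensive"]


# Made-up monthly prices from three interviews.
responses = [
    {"too_cheap": 5, "bargain": 15, "getting_expensive": 40, "too_expensive": 80},
    {"too_cheap": 10, "bargain": 20, "getting_expensive": 50, "too_expensive": 100},
    {"too_cheap": 8, "bargain": 25, "getting_expensive": 45, "too_expensive": 90},
]
print(acceptable_range(responses))  # (20, 45)
```

Medians are used instead of means so that one interviewee quoting an extreme number does not drag the range.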
4.6 Continuous Product Research
After you’ve built and launched, research does not stop. It transforms.
The Continuous Discovery Framework
(From Teresa Torres's book Continuous Discovery Habits, which is worth reading)
WEEKLY RHYTHM
├── 1 customer interview per week (minimum, forever)
├── Review analytics dashboard
├── Check support tickets for patterns
└── Team sync: what did we learn this week?
MONTHLY RHYTHM
├── Opportunity mapping: what new needs are emerging?
├── Assumption audit: which of our beliefs might be wrong?
├── Competitive check: what changed in the landscape?
└── Metrics review: are we moving the right numbers?
The Opportunity Solution Tree
DESIRED OUTCOME (what metric are we trying to move?)
↓
OPPORTUNITIES (what user needs, pain points, desires exist?)
↓
SOLUTIONS (what could we build to address each opportunity?)
↓
EXPERIMENTS (what's the smallest test of each solution?)
This tree prevents the most common product mistake: jumping from “user has problem” directly to “let’s build a feature” without asking “is this the right solution?”
4.7 Metrics and Analytics
The AARRR Framework (Pirate Metrics)
| Stage | Metric | Question |
|---|---|---|
| Acquisition | Visitors, signups | Where do users come from? |
| Activation | % who complete onboarding, time-to-value | Do they get value in session 1? |
| Retention | DAU/WAU/MAU, churn rate | Do they come back? |
| Revenue | MRR, ARPU, LTV | Are they paying? Upgrading? |
| Referral | NPS, referral rate, viral coefficient | Do they tell others? |
Track all five from day 1. Most startups only track acquisition.
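Given a log of (user, event) pairs, stage-by-stage numbers for this table can be computed directly. The sketch below is a simplification: it counts distinct users per event without checking that they passed through earlier stages in order, and the event names are hypothetical stand-ins for whatever your product emits.

```python
def funnel(events: list[tuple[str, str]], stages: list[str]) -> list[tuple[str, int, float]]:
    """For each stage: distinct users who fired the event, and conversion from the prior stage."""
    users_by_event: dict[str, set[str]] = {}
    for user_id, event_name in events:
        users_by_event.setdefault(event_name, set()).add(user_id)

    rows = []
    previous_count = None
    for stage in stages:
        count = len(users_by_event.get(stage, set()))
        if previous_count is None:
            rate = 1.0              # first stage is the baseline
        elif previous_count == 0:
            rate = 0.0              # avoid dividing by zero
        else:
            rate = count / previous_count
        rows.append((stage, count, rate))
        previous_count = count
    return rows


# Hypothetical event log.
events = [
    ("u1", "signed_up"), ("u2", "signed_up"), ("u3", "signed_up"), ("u4", "signed_up"),
    ("u1", "completed_onboarding"), ("u2", "completed_onboarding"),
    ("u1", "upgraded"),
]
for stage, count, rate in funnel(events, ["signed_up", "completed_onboarding", "upgraded"]):
    print(f"{stage}: {count} users ({rate:.0%} of previous stage)")
```

The stage with the steepest drop is usually where the next week of research should go.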
The North Star Metric
One metric that best captures the value you deliver to customers. All other metrics serve this one.
Examples:
Airbnb → Nights booked
Spotify → Time spent listening
Slack → Messages sent
WhatsApp → Messages sent
Your startup → ???
Ask: “If this number goes up, are our customers definitely getting more value?” If yes — that’s your North Star.
Instrumentation — What to Track From Day 1
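from datetime import datetime, timezone

class Analytics:
    """Stand-in client that just prints events; swap in your analytics tool's SDK."""
    def track(self, user_id, event_name, properties):
        print(user_id, event_name, properties)

analytics = Analytics()

# Every user action that matters should fire an event
analytics.track('user_123', 'user_signed_up', {
    'plan': 'free',              # example properties; record whatever you will segment by
    'source': 'landing_page',
    'timestamp': datetime.now(timezone.utc).isoformat(),
})
# Key events to track:
# - User signed up
# - User completed onboarding
# - User performed core action (whatever your product does)
# - User hit paywall
# - User upgraded/paid
# - User invited someone
# - User churned (deleted account / cancelled)
Tools: PostHog (open source, self-host), Mixpanel, Amplitude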
5. DOCUMENTATION SYSTEMS
5.1 Academic Documentation
The Research Journal
Keep this daily. 10 minutes maximum. The habit beats the duration.
## Research Log — [DATE]
### What I did today
[1-3 sentences]
### Key insight or finding
[The most important thing I learned]
### Questions this raised
[What I'm now wondering]
### Experiment or action for tomorrow
[One concrete next step]
### Mood/Energy
[1-5: tracking this shows patterns in your best work days]
Paper Summary Card Template
## Paper Summary
**Title:** [Full title]
**Authors:** [Author list]
**Year:** [Year]
**Venue:** [Conference/Journal]
**Link:** [URL or DOI]
**Zotero Key:** [your key]
---
### The Problem
[What problem does this paper solve? 2-3 sentences]
### Why Existing Solutions Were Insufficient
[What did prior work fail to do?]
### Their Approach
[The core method, in your words. Not a copy-paste of the abstract.]
### Key Insight
[The one clever idea that makes this work]
### Main Results
[Key numbers: accuracy, speedup, improvement over baseline]
### Limitations
[What doesn't this handle? What did they admit was missing?]
### My Critical Assessment
[Do I believe the results? Are the experiments convincing? What would I challenge?]
### Relevance to My Work
[How does this affect my research? Does it support or contradict my hypothesis?]
### Follow-up Papers to Read
[Papers from references I should track down]
Experiment Log Template
## Experiment Log
**Experiment ID:** EXP-[number]
**Date:** [date]
**Status:** [planned / running / completed / abandoned]
---
### Hypothesis
[If I do X, then Y will happen because Z]
### Setup
- Dataset: [name, version, statistics]
- Model/System: [exact configuration]
- Baseline: [what I'm comparing against]
- Metrics: [what I'm measuring]
- Hardware: [GPU/CPU, RAM, etc.]
- Random Seed: [number]
- Key Hyperparameters: [list]
### Command to Reproduce
```bash
python run_experiment.py --config configs/exp-[number].yaml
```
### Results
| Metric | Baseline | Our Method | Delta |
|---|---|---|---|
| [Metric 1] | [value] | [value] | [%] |
### Analysis
[Why did the results look this way? What patterns did I notice?]
### Conclusion
[Does this support or reject the hypothesis?]
### Next Experiment
[What does this result suggest I should test next?]
Meeting Notes Template (Advisor Meetings)
## Advisor Meeting — [DATE]
**Attendees:** [names]
**Duration:** [time]
---
### What I Presented
[What I showed the advisor]
### Key Feedback
[What they said — be specific, use quotes where possible]
### Things to Do Before Next Meeting
- [ ] [Action 1]
- [ ] [Action 2]
- [ ] [Action 3]
### Questions I Forgot to Ask (for next time)
[...]
### My Reflection
[What was most useful? What surprised me?]
5.2 Startup Documentation
The Opportunity Document (Living Document)
Update this every week. This is your compass.
# Opportunity Document — [Product Name]
**Last Updated:** [date]
**Version:** [number]
---
## Problem Statement
**The problem:**
[1-2 sentences, specific, no jargon]
**Who has it:**
[Specific person: job title, company type, size, context]
**How often it occurs:**
[Daily / weekly / monthly — frequency matters]
**How bad it is:**
[Time cost, money cost, emotional cost]
---
## Evidence We've Collected
**Interviews conducted:** [number]
**Verbatim quotes from customers:**
> "[Exact quote from interview — use their words, not yours]" — [Role, Company Size]
> "[Another exact quote]" — [Role, Company Size]
**Data points:**
- [Market data point]
- [Behavioral observation]
---
## Current Solutions and Their Failures
| Solution | Who Uses It | What They Like | What They Hate |
|---|---|---|---|
| [Competitor A] | [customer type] | [strengths] | [weaknesses] |
| [Competitor B] | [customer type] | [strengths] | [weaknesses] |
| Manual process | [customer type] | [control] | [time, error] |
---
## Our Hypothesis
**What we will build:**
[1 sentence]
**Why it will be better:**
[Specific mechanism — not "easier to use" but "eliminates the X step that takes 2 hours"]
---
## Assumptions (Ranked by Risk)
| # | Assumption | Risk Level | How to Test |
|---|---|---|---|
| 1 | [Most uncertain belief] | High | [Experiment] |
| 2 | [Next uncertain belief] | Medium | [Experiment] |
---
## Experiments Done
| Date | Hypothesis Tested | Method | Result | Decision |
|---|---|---|---|---|
| [date] | [what we tested] | [how] | [what happened] | [what we did next] |
---
## Current Metrics
| Metric | Current | Target | Trend |
|---|---|---|---|
| Waitlist signups | | | |
| Interview requests | | | |
| Landing page CVR | | | |
Customer Interview Log
## Interview Log
**Interview #:** [number]
**Date:** [date]
**Duration:** [minutes]
**Interviewer:** [name]
**Participant:**
- Role: [job title]
- Company: [type and size, not name — keep anonymous]
- How we found them: [LinkedIn / referral / community]
---
### The Story They Told
[Narrative summary of what they shared — their situation, the problem, how they handle it]
### Verbatim Quotes (Most Important — Copy Exactly)
> "[Exact words]"
> "[Exact words]"
### Current Solution
[What do they use today to handle this problem?]
### Pain Level: [1-10]
**Why that score:**
[Their reasoning in their words]
### Willingness to Pay Signals
[What they said about money, budgets, what they currently spend]
### Surprises
[What did I not expect? What challenged my assumptions?]
### Follow-up Actions
- [ ] [Something I should research or test based on this]
- [ ] [Someone they suggested I talk to]
Decision Log
## Decision Log — [Product Name]
| # | Date | Decision | Options Considered | Why This Choice | Outcome |
|---|---|---|---|---|---|
| 001 | [date] | [what was decided] | [alternatives] | [reasoning] | [fill in later] |
5.3 Templates Quick Reference
Weekly Summary (Both Academic and Startup)
## Weekly Summary — Week of [DATE]
### Main question I tried to answer this week
[The central thing you were investigating]
### What I learned
[3-5 bullet points of actual new knowledge]
### What didn't work
[Be honest — failed experiments, dead ends, wrong assumptions]
### Biggest surprise
[The thing that most challenged your existing beliefs]
### What I'm still uncertain about
[Open questions going into next week]
### Next week's main question
[The one thing you will try to answer]
### Metric update (startup) / Progress update (academic)
[Numbers or milestone status]
6. TOOLS MASTER LIST
Academic Research Stack
Discovery
Google Scholar scholar.google.com
Semantic Scholar semanticscholar.org
arXiv arxiv.org
Papers With Code paperswithcode.com
Connected Papers connectedpapers.com
DBLP dblp.org
ACM Digital Library dl.acm.org
IEEE Xplore ieeexplore.ieee.org
AI-Assisted Research Discovery
Perplexity.ai Research assistant that cites sources
Consensus.app AI search across academic papers
Elicit.org AI that extracts structured data from papers
ResearchRabbit Visual paper discovery and tracking
Litmaps Citation mapping and tracking
Paper Management
Zotero Free, open-source, best citation manager
Install desktop app + browser extension
Collections, tags, notes, PDF annotation
Auto-import from browser
Integrates with Word and Overleaf
Reading and Annotation
Zotero PDF reader Built into Zotero, highlights + notes sync
Highlights (Mac) Excellent PDF reader with export to Markdown
Skim (Mac) Lightweight PDF reader
Adobe Acrobat Reader Annotation basics
Note-Taking and Knowledge Management
Obsidian Local markdown notes, bidirectional links
Zettelkasten method
Graph view, Dataview plugin
100% offline, your data stays yours
Notion More structured, better for teams
Roam Research Similar to Obsidian, web-based
Logseq Open-source Roam alternative
Writing
Overleaf Online LaTeX editor
Real-time collaboration
Free for individuals
Integrates with Zotero
All major conference templates available
VS Code + LaTeX Workshop Local LaTeX writing
Google Docs Early drafts, collaboration
Hemingway App Clarity and readability check
Grammarly Grammar and style
Experiment Tracking
Weights & Biases Best for ML — metrics, hyperparams, model artifacts
MLflow Open-source alternative
Sacred Lightweight Python experiment tracking
DVC Data version control — like Git for datasets
Jupyter Notebooks Document-as-you-go research
Diagrams and Figures
draw.io Free, browser-based, all diagram types
Excalidraw Hand-drawn style, quick sketches
TikZ (LaTeX) Publication-quality figures in papers
Matplotlib / Seaborn Python data visualization
Plotly Interactive visualizations
Inkscape Vector graphics editor (free Illustrator)
Version Control
Git + GitHub Everything — code AND LaTeX papers
GitHub Actions Automate paper builds (compile LaTeX on push)
DVC Large files and datasets
Collaboration and Communication
Overleaf Co-author LaTeX papers in real-time
Slack / Discord Lab communication
Zoom / Google Meet Remote meetings
Calendly Schedule advisor meetings without email ping-pong
Startup Research Stack
Problem Discovery
Reddit reddit.com/r/[your industry]
Search: "[competitor] sucks" or "[category] alternative"
Twitter/X Complaint mining, trend spotting
Product Hunt New products + comment sections
Hacker News news.ycombinator.com — search hn.algolia.com
G2 g2.com — B2B software reviews
Capterra capterra.com — SMB software reviews
Trustpilot Consumer product reviews
Market Research
Crunchbase crunchbase.com — funding, competitors, investors
SimilarWeb similarweb.com — website traffic
Google Trends trends.google.com — search interest over time
Google Keyword Planner Search volume, competition
SEMrush / Ahrefs Keyword research, competitor SEO
SparkToro Where your audience lives online
Statista Industry statistics (some free)
LinkedIn Market sizing by job title / industry
Customer Interviews
Calendly Scheduling (free tier)
Otter.ai AI transcription of calls
Fireflies.ai Auto-transcribe and summarize meetings
Zoom / Google Meet Video calls
Notion / Obsidian Interview logging and synthesis
Validation and Landing Pages
Carrd Simplest landing page (free)
Framer Beautiful landing pages, no code
Webflow Most flexible, no code
Typedream Fast, clean landing pages
Stripe Pre-sell before you build — take real payments
Typeform Surveys and waitlist forms
Tally.so Free Typeform alternative
Analytics
PostHog Open-source, self-host, full analytics + heatmaps
Mixpanel Best-in-class product analytics
Amplitude Enterprise-grade product analytics
Google Analytics Free, for acquisition tracking
Hotjar Heatmaps, session recordings, surveys
FullStory Session replay for debugging UX
Competitive Intelligence
G2 / Capterra Review mining for competitor weaknesses
SimilarWeb Traffic and engagement comparison
BuiltWith What tech stack competitors use
Wappalyzer Browser extension — see tech stack of any site
Crunchbase Funding history and team growth
LinkedIn Team size, hiring signals (what they hire = roadmap)
Startup Knowledge
Y Combinator Library ycombinator.com/library — best startup resources
Paul Graham Essays paulgraham.com
Lenny's Newsletter lennynewsletter.com — product and growth
First Round Review review.firstround.com
a16z blog a16z.com/blog
Stratechery stratechery.com — tech strategy
7. FROM RESEARCH TO STARTUP
The Path
The biggest arbitrage opportunity available to a CS student:
ACADEMIC INSIGHT → COMMERCIAL PRODUCT
(5-10 years ahead) (funded, built, launched)
Most researchers don’t build. Most founders don’t have deep technical insight. You can have both.
The Pipeline
Step 1: ACADEMIC PROBLEM
"Existing RAG systems hallucinate on long documents"
Step 2: RESEARCH SOLUTION
"Novel chunking + re-ranking reduces hallucination by 40%"
Step 3: MARKET QUESTION
"Who is losing money today because of AI hallucination?"
Step 4: CUSTOMER DISCOVERY
"Legal teams, medical documentation, financial compliance"
Step 5: PRODUCT HYPOTHESIS
"AI document analysis for legal teams, 40% more accurate than [competitor]"
Step 6: VALIDATION
Landing page + 10 interviews + 1 paying design partner
Step 7: BUILD MVP
Smallest version that demonstrates the 40% improvement for legal use case
Step 8: COMPANY
Your Unfair Advantage as a CS Student
Most founders copy ideas from other companies. They compete on execution.
You can compete on insight — something you know deeply that others don’t. That is a moat that cannot be copied fast.
Examples:
- Deep knowledge of a specific ML architecture
- Understanding of a specific systems problem (database internals, networking, compilers)
- Research background in a domain + ability to build
The Research Moat
When your product is built on a genuine research insight:
1. It is hard to copy without understanding the underlying research
2. You understand it more deeply than any competitor who tries to copy
3. The insight usually generalizes; it can become a platform, not just a feature
4. It attracts other researchers as employees and collaborators
5. It attracts academic credibility, which becomes marketing
8. WORKING WITH ADVISORS AND MENTORS
The Advisor Relationship
Your research advisor is not your boss and not your peer. They are a senior collaborator who has navigated the system you are entering.
What to Expect From a Good Advisor
- Regular meetings (weekly or biweekly)
- Feedback on written work within 2 weeks
- Introduction to relevant people in the field
- Honest assessment of your work’s quality
- Help positioning your work for publication
- Career guidance
What They Expect From You
- Consistent progress, even if slow
- Proactive communication: they hear about problems from you early, not from someone else later
- Preparation for every meeting: arrive with a concrete update and specific questions
- Ownership of your research — you drive it, not them
- Written documentation of your work — they cannot read your mind
The Meeting Preparation Template
Before every advisor meeting:
1. Written update: what I did since last meeting (1 paragraph)
2. Results: any new data or findings (figures/tables)
3. Blockers: what is preventing progress
4. Decision needed: what do I need their input on specifically
5. Plan: what I intend to do before next meeting
Send this by email 24 hours before the meeting. It makes the meeting far more productive because your advisor arrives already knowing where you are stuck.
When Things Go Wrong
If the advisor relationship is not working:
- Communicate directly first; most issues come from misaligned expectations
- Document everything: emails, meeting notes, decisions
- Understand the formal process at your institution before escalating
- Talk to other students in the lab; you are probably not alone
- Changing advisors is possible and sometimes necessary, and it is better done early than late
Finding Mentors (Outside Academia and Outside Your Company)
The best mentors are people 5-10 years ahead of where you want to be, who still remember what it was like to be where you are.
Finding Them
LinkedIn → comment thoughtfully on posts, then reach out
Twitter/X → engage with content before DMing
Conferences → show up, be curious, don't pitch — just have conversations
Alumni network → your university's alumni want to help
Office hours → founders and partners at YC, Andreessen Horowitz (a16z), etc. sometimes run public office hours
Cold email → it works more than you think, if it's genuine and specific
The Cold Outreach That Works
Subject: Quick question from a CS student working on [topic]
Hi [Name],
I'm a CS student at [university] working on [specific problem].
I read your [paper / blog post / tweet] about [specific thing]
and it directly answered a question I'd been stuck on for weeks.
I have one specific question: [the actual question, one sentence]
I know you're busy. A 10-word reply would genuinely help.
Thanks,
[Your name]
Short. Specific. One ask. Demonstrates you’ve done the work.
9. RESEARCH ETHICS
Why This Matters More Than You Think
Unethical research doesn’t just harm your career. It harms the field, harms people who build on your work, and harms the public, who rely on published research being true.
Core Principles
1. Reproducibility
Every result must be reproducible by an independent researcher following your paper. This means:
- Publishing your code (GitHub with paper link)
- Sharing your data (or explaining why you can’t)
- Reporting exact hyperparameters and random seeds
- Not hiding failed experiments that change the interpretation
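One lightweight way to make "exact hyperparameters and random seeds" routine is to build a small record at the start of every run and save it next to the results. The sketch below is one possible shape, not a standard: `run_record` and its fields are hypothetical, and it seeds only Python's built-in `random` module. Frameworks such as NumPy or PyTorch keep their own random state and need their own seeding calls.

```python
import hashlib
import json
import random


def run_record(config: dict, seed: int) -> dict:
    """Seed Python's RNG and return a record describing exactly what was run."""
    random.seed(seed)  # note: does not seed NumPy, PyTorch, or other libraries
    config_json = json.dumps(config, sort_keys=True)
    return {
        "seed": seed,
        "config": config,
        # A short hash makes it easy to spot when two runs used different settings.
        "config_hash": hashlib.sha256(config_json.encode("utf-8")).hexdigest()[:12],
    }


config = {"learning_rate": 0.001, "batch_size": 32, "epochs": 10}  # made-up values
record = run_record(config, seed=42)
first_draw = random.random()

run_record(config, seed=42)           # re-seeding with the same seed...
assert random.random() == first_draw  # ...reproduces the same random draw

print(json.dumps(record, indent=2))
```

Writing this record to a file alongside each result means the "Setup" section of your experiment log can be filled in by copying, not by remembering.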
2. Honest Reporting
Report your results honestly, including:
- Results that don’t support your hypothesis (negative results)
- Conditions where your method fails
- Limitations of your approach
- Statistical uncertainty (confidence intervals, error bars)
Selective reporting — only sharing your best results while hiding others — is a form of scientific fraud.
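Reporting statistical uncertainty can be as simple as running the same experiment with several seeds and giving a mean with an interval instead of the single best number. The sketch below uses a percentile bootstrap, which is one reasonable choice among several; the accuracy values are made up, and with only a handful of runs the interval is rough.

```python
import random
from statistics import mean


def bootstrap_ci(values: list[float], n_resamples: int = 2000,
                 confidence: float = 0.95, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the interval itself is reproducible
    resampled_means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower_index = int((1 - confidence) / 2 * n_resamples)
    upper_index = int((1 + confidence) / 2 * n_resamples) - 1
    return resampled_means[lower_index], resampled_means[upper_index]


# Made-up test accuracies from five runs with different seeds.
accuracies = [0.812, 0.805, 0.821, 0.798, 0.816]
low, high = bootstrap_ci(accuracies)
print(f"accuracy = {mean(accuracies):.3f} (95% CI {low:.3f} to {high:.3f}, n={len(accuracies)})")
```

Report all runs, including the weak ones: dropping the lowest seed before computing the interval is exactly the selective reporting described here.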
3. Attribution and Citation
- Cite every source you build on
- If you use code from someone else, credit it
- Don’t present prior ideas as your own
- If you work with others, authorship should reflect contribution
4. Data Ethics
- If you collect human data: IRB approval, informed consent, privacy
- Don’t train models on data you don’t have rights to use
- Be aware of biases in datasets and report them
- Consider the potential harms of your research output
5. AI-Assisted Writing
Norms for using LLMs in research are still evolving. Current principles:
- You are responsible for every word in your paper
- AI-generated text you didn’t verify is a risk to your credibility
- Disclose AI tool use per your venue’s policy (check before submitting)
- Using AI to improve clarity is generally fine; using it to generate results is not
6. Conflict of Interest
Disclose funding sources. Disclose industry relationships. If a funder has interest in your results, say so. Readers need this information to interpret your work.
10. THE WEEKLY RHYTHM
Sustainable research requires rhythm. Bursts don’t produce good work consistently.
Weekly Research Rhythm (Academic)
MONDAY
└── Weekly planning
Review weekly summary from last week
Set ONE main question to answer this week
Set 3 concrete tasks (no more)
Check deadlines: paper submissions, advisor meetings
TUESDAY - THURSDAY
└── Deep work blocks
2-3 hour focused blocks, phone away, notifications off
Alternate: one day experiments, one day writing/reading
Log progress in research journal every evening (10 min)
FRIDAY
└── Review and documentation
Update experiment logs
Write weekly summary
Respond to emails, admin tasks
Brief slack reading
WEEKEND
└── Light only
Read papers without note-taking pressure
Watch talks (YouTube: conference recordings)
Let the brain consolidate
No guilt for not working
Weekly Research Rhythm (Startup)
MONDAY
└── Metrics review + planning
Review last week's numbers: did they move?
Set one growth hypothesis to test this week
Customer interview scheduled?
TUESDAY
└── Build or research day
Deep work on whatever needs building or testing
WEDNESDAY
└── Customer day
Interview or user test session
Respond to user feedback and support tickets
Review analytics
THURSDAY
└── Build day
Implement learnings from Wednesday
FRIDAY
└── Ship + document
Ship whatever is ready
Document what was learned
Update opportunity document
Team sync on week learnings
The Daily Deep Work Protocol
For maximum research productivity:
Block 1 (9am-12pm): Deep work — hardest task first, no meetings
Block 2 (12-1pm): Break, food, walk
Block 3 (1pm-3pm): Reading and lighter work
Block 4 (3pm-5pm): Correspondence, admin, planning
Block 5 (5pm-6pm): Journal entry, tomorrow's plan
The 9-12 block is non-negotiable. Protect it. Deep research progress depends on long, uninterrupted sessions.
11. COMMON MISTAKES AND HOW TO AVOID THEM
Academic Research Mistakes
Mistake 1: Starting Without a Clear Question
What happens: You read for months, get lost, produce nothing.
Fix: Write your research question on a card and tape it to your monitor. If you can’t state it in one sentence, it isn’t ready yet.
Mistake 2: Reading Instead of Doing
What happens: Endless literature review, no experiments.
Fix: After 2-3 weeks of reading, force yourself to run an experiment, even a bad one. Doing reveals what you need to read. Reading doesn’t tell you what to do.
Mistake 3: Not Version-Controlling Your Work
What happens: You overwrite working code, lose a LaTeX version that was better, or can’t reproduce an experiment from 2 months ago.
Fix: Use Git from day 1. Commit every time something works.
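One way to make frequent commits pay off for reproducibility is to stamp every experiment result with the commit that produced it. The sketch below is a minimal illustration, not a prescribed workflow: the function names `current_commit` and `log_run`, the `runs.jsonl` file name, and the record fields are all hypothetical. It assumes `git` is available on the PATH and falls back to "unknown" when it is not.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def current_commit() -> str:
    """Return the current Git commit hash, or "unknown" if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (subprocess.CalledProcessError, OSError):
        # Not inside a Git repository, or git is not installed.
        return "unknown"


def log_run(config: dict, metrics: dict, log_path: str = "runs.jsonl") -> dict:
    """Append one experiment record (hypothetical format) to a JSON Lines file."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "commit": current_commit(),
        "config": config,
        "metrics": metrics,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Calling `log_run({"lr": 0.001, "seed": 42}, {"accuracy": 0.91})` at the end of a run (the values here are made up) leaves a line in `runs.jsonl` that records which commit to check out if you need to rerun that experiment later.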
Mistake 4: Weak Baselines
What happens: Your method looks good, but only because you compared it to something obviously worse.
Fix: Before running experiments, ask: “What would a skeptical reviewer say about my baseline?” Then make it stronger before they do.
Mistake 5: Not Writing Throughout
What happens: You have 6 months of work and no writing. The paper deadline approaches. Panic.
Fix: Write something every week, even 2 paragraphs. The methodology section should be nearly complete before the experiments are done. Writing forces clarity in your thinking.
Mistake 6: Perfectionism
What happens: The paper is never good enough, never submitted, and never gets feedback.
Fix: A submitted imperfect paper that gets reviewed and rejected teaches you more than a perfect paper that never leaves your hard drive. Submit.
Mistake 7: Working in Isolation
What happens: You head in the wrong direction for 3 months, and an advisor meeting only reveals it in week 12.
Fix: Share work early and often: partial results, rough drafts, half-baked ideas. Feedback while things are still malleable is far more valuable than feedback once they’re done.
Startup Research Mistakes
Mistake 1: Building Before Validating
What happens: 6 months of engineering, 0 customers, pivot or die.
Fix: 20 customer interviews before any code. Non-negotiable.
Mistake 2: The Problem Is Real But Not Urgent
What happens: People say “yeah, that’s a problem” but don’t pay.
Fix: Test urgency: “If I could give this to you today, would you use it?” If they don’t drop everything to say yes, it’s not urgent enough.
Mistake 3: Talking to Friendly People
What happens: You only interview people who know you and want to be supportive, so you get false positives.
Fix: Find strangers in your target market. They have no reason to lie.
Mistake 4: Building for Yourself When Your Market Is Someone Else
What happens: You build a great product for a CS student (yourself), but no other CS students pay for B2B tools.
Fix: Be your own customer only if you are truly representative of the market. Otherwise, your intuition is actively misleading.
Mistake 5: Ignoring Churn
What happens: Acquisition is good, but 80% of users leave after week 1 and you focus only on getting more users.
Fix: Treat retention as the #1 metric. Acquire 10 users, make them love it, then scale. Not the other way around.
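The week-1 figure above is just a ratio: of the users who signed up, what fraction came back roughly a week later. A minimal sketch, assuming a hypothetical data shape in which each user has a signup date and a list of active dates. The function name `week1_retention` and the 7-to-13-day window are illustrative choices, not a standard definition, and the example data is made up.

```python
from datetime import date, timedelta


def week1_retention(signups: dict[str, date], activity: dict[str, list[date]]) -> float:
    """Fraction of signed-up users who were active 7 to 13 days after signup."""
    if not signups:
        return 0.0
    retained = 0
    for user, signup_day in signups.items():
        window_start = signup_day + timedelta(days=7)
        window_end = signup_day + timedelta(days=13)
        # A user counts as retained if any active day falls inside the window.
        if any(window_start <= d <= window_end for d in activity.get(user, [])):
            retained += 1
    return retained / len(signups)


# Made-up example: five signups on the same day, one user returns in week 1.
signups = {u: date(2026, 1, 5) for u in ["a", "b", "c", "d", "e"]}
activity = {
    "a": [date(2026, 1, 5), date(2026, 1, 13)],  # comes back 8 days after signup
    "b": [date(2026, 1, 6)],                      # active only in the first week
    "c": [],                                      # never active again
}
print(f"Week-1 retention: {week1_retention(signups, activity):.0%}")
```

Tracking this number per signup cohort, rather than one global average, makes it easier to see whether a product change actually improved retention.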
Mistake 6: Pricing Too Low
What happens: Lots of users, no revenue, unsustainable.
Fix: Charge more than you think you should. The discomfort you feel is not evidence that the price is wrong.
Mistake 7: Market Research Without Customer Discovery
What happens: The market “looks big” on paper, but real customers have a different problem than you assumed.
Fix: Data tells you what. Conversations tell you why. You need both.
12. RESOURCES LIBRARY
Essential Books
Academic Research
"How to Write and Publish a Scientific Paper" — Robert Day
└── The mechanics of academic writing. Read this first.
"The Craft of Research" — Booth, Colomb, Williams
└── How to think about and structure research arguments.
"Writing Science" — Joshua Schimel
└── Narrative structure in scientific writing. Underrated.
"A PhD Is Not Enough" — Peter Feibelman
└── Career strategy for researchers. Read before your PhD, not after.
"How to Read a Paper" — S. Keshav
└── The three-pass method. Free PDF, only a few pages. Read it this week.
Startup and Product Research
"The Mom Test" — Rob Fitzpatrick
└── How to get honest information from customers. 2-hour read.
Most important book on customer discovery ever written.
"The Lean Startup" — Eric Ries
└── Build-measure-learn loop. The scientific method for startups.
"Continuous Discovery Habits" — Teresa Torres
└── How to do product research as an ongoing practice.
"Zero to One" — Peter Thiel
└── How to think about building something genuinely new.
"The Hard Thing About Hard Things" — Ben Horowitz
└── What building actually feels like (versus the theory).
"Obviously Awesome" — April Dunford
└── Product positioning. How to make what you built land correctly.
Mental Models and Thinking
"Thinking, Fast and Slow" — Daniel Kahneman
└── How intuitive and deliberate thinking differ, and where each goes wrong. Essential for critical thinking.
"The Art of Problem Solving" (series)
└── Mathematical thinking. Foundational for CS research.
"Poor Charlie's Almanack" — Charlie Munger
└── Mental models across disciplines. Expensive book, worth it.
Essential Papers and Articles (Free)
"How to Read a Paper" — S. Keshav (2007)
→ scholar.google.com (search the title) — only a few pages, mandatory
"You and Your Research" — Richard Hamming (1986)
→ YouTube talk or transcript — how great researchers think
"Reflections on Trusting Trust" — Ken Thompson (1984)
→ ACM — Turing Award lecture, systems security, thinking about foundations
"Attention Is All You Need" — Vaswani et al. (2017)
→ arXiv:1706.03762 — introduced the Transformer architecture behind most modern language models
"MapReduce" — Dean & Ghemawat (2004)
→ OSDI 2004 — how to think about large-scale systems
"Do Things That Don't Scale" — Paul Graham
→ paulgraham.com — the most important essay for early startup founders
Courses (Free)
CS Research Methods
├── MIT OpenCourseWare — various CS courses
└── Stanford CS courses on YouTube
Machine Learning
├── fast.ai — practical ML, project-first (free)
├── Andrew Ng's courses (Coursera, can audit free)
└── Andrej Karpathy's Neural Networks: Zero to Hero (YouTube)
Startup
├── Y Combinator Startup School — startupschool.org (free)
├── Stanford CS183 (Peter Thiel) — notes available free
└── Lenny's Podcast — best product thinking conversations
Communities
Academic
├── r/MachineLearning, r/compsci, r/AskComputerScience
├── Papers With Code community
├── Hugging Face forums
└── Your university's research groups (join early)
Startup
├── Hacker News — news.ycombinator.com
├── Indie Hackers — indiehackers.com (bootstrapped startups)
├── r/startups, r/entrepreneur
├── Y Combinator Alumni network (if you do YC)
└── Twitter/X tech community
13. THE MASTER MENTAL MODEL
Everything in this document compresses into one loop:
OBSERVE
└── Notice something broken, missing, or unexplained in the world
"Why does X fail?" / "Why doesn't Y exist?" / "What if Z worked differently?"
↓
QUESTION
└── Form a precise, falsifiable question about it
Not a topic. A question. With a possible wrong answer.
↓
SURVEY
└── Understand everything that already exists
What has been tried? What worked? What failed? What was assumed?
↓
HYPOTHESIZE
└── Propose your answer BEFORE testing
"If I do X, then Y will happen because Z"
Commit to a prediction. This is what keeps you honest.
↓
EXPERIMENT
└── Smallest possible test of your hypothesis
Academic: controlled experiment with baselines and metrics
Startup: interview, landing page, presell, concierge
↓
MEASURE
└── Collect honest data
Numbers that could prove you wrong, not just right.
↓
ANALYZE
└── Understand WHY, not just WHAT
Why did it work? Why did it fail in these cases?
What alternative explanations exist?
↓
COMMUNICATE
└── Tell the world what you found
Academic: paper / talk / blog post
Startup: product / pitch / case study
↓
ITERATE
└── The output of communicating is new observations
New data. New questions. New hypotheses.
The loop restarts, faster and with better questions.
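The HYPOTHESIZE, MEASURE, and ANALYZE steps can be made concrete by writing the prediction down as data before the experiment runs. The sketch below is one hypothetical way to do that: the `Hypothesis` class, its fields, and the example numbers are all illustrative assumptions, not part of any library or of a fixed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A prediction committed to before the experiment (frozen, so fields cannot be reassigned)."""
    action: str        # "If I do X"
    metric: str        # what will be measured
    threshold: float   # "then Y will happen": the metric must reach this value
    rationale: str     # "because Z"

    def evaluate(self, observed: float) -> str:
        """Compare the observed metric with the pre-committed threshold."""
        verdict = "supported" if observed >= self.threshold else "refuted"
        return f"{self.metric}: predicted >= {self.threshold}, observed {observed} -> {verdict}"


# Made-up example, written down BEFORE running the test.
h = Hypothesis(
    action="add a one-click signup button",
    metric="signup conversion rate",
    threshold=0.05,
    rationale="the current form has too many fields",
)
print(h.evaluate(0.03))
```

Because the threshold is fixed in advance, a disappointing result cannot quietly be reinterpreted as a success afterwards, which is the point of committing to a prediction.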
The Deeper Truth
Academic research and startup product research are not two different disciplines wearing the same name.
They are one discipline operating at different frequencies.
The academic researcher iterates over months. The founder iterates over weeks. The great technologist does both simultaneously — publishing the insight and building the product.
Academic cycle: Problem → Research → Paper → Field advances
Startup cycle: Problem → Research → Product → Market advances
Combined: Problem → Research → Paper + Product → World advances
You are a CS student. You have been handed the rare gift of time and structure to develop the research mindset before the market pressure sets in.
Use this time to develop the habit of asking precise questions, of testing rather than assuming, of being honest about what you don’t know.
That habit — not the specific tools, not the frameworks, not the templates — is what makes a great researcher and a great founder.
The tools change. The templates get updated. The habit of rigorous curiosity compounds forever.
Document Version: 1.0
Last Updated: 2026
Total Length: ~12,000 words
Sections: 13 major sections, 50+ subsections
This is a living document; update it as you learn.