Turn Claude into your data analyst. These prompts help you explore datasets, build analyses, generate insights, and present findings clearly.
I have a dataset with these columns: [COLUMNS WITH TYPES]. There are [N] rows. Give me an exploration plan: what summary statistics to compute, what distributions to check, what correlations to look for, and what data quality issues to watch for. Then write the [Python/R/SQL] code.
Why this works: An exploration plan prevents aimless data poking and ensures systematic coverage.
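The exploration code Claude returns might look something like this pandas sketch (the columns and values here are hypothetical stand-ins for your own data):

```python
import pandas as pd

# Hypothetical dataset: replace with your own columns and types
df = pd.DataFrame({
    "age": [25, 31, None, 47, 52, 31],
    "spend": [10.5, 200.0, 15.2, 15.2, 9999.0, 42.0],
    "segment": ["a", "b", "a", "a", "b", "b"],
})

# Summary statistics for numeric columns
print(df.describe())

# Data quality checks: missing values and exact-duplicate rows
print(df.isna().sum())
print("duplicate rows:", df.duplicated().sum())

# Correlations between numeric columns
print(df.select_dtypes("number").corr())
```

Each of the plan's elements (summaries, quality checks, correlations) maps to one short block, which makes the output easy to review piece by piece.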
This dataset has these quality issues: [ISSUES - missing values, duplicates, inconsistent formats, outliers]. Write a data cleaning pipeline in [LANGUAGE] that handles each issue. For each step, explain the decision (e.g., why impute vs. drop) and any assumptions.
Why this works: Requiring an explanation for each step documents your cleaning decisions, which is essential for reproducibility.
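A minimal sketch of what such a pipeline might look like in pandas, with the decision for each step stated as a comment (the columns and the "email is the key" assumption are hypothetical):

```python
import pandas as pd

# Hypothetical raw data exhibiting the issues above
raw = pd.DataFrame({
    "email": ["A@X.COM", "a@x.com", "b@y.com", None],
    "amount": ["10", "10", "oops", "30"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Inconsistent formats: normalize case so duplicates become detectable
    out["email"] = out["email"].str.lower()
    # Bad numeric strings: coerce to NaN rather than failing the whole pipeline
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    # Duplicates: drop exact duplicates only after normalization
    out = out.drop_duplicates()
    # Missing values: drop rows without an email (assumption: email is the key)
    return out.dropna(subset=["email"])

cleaned = clean(raw)
```

Ordering matters here: normalizing before deduplicating catches `A@X.COM` / `a@x.com` as the same record, which a naive dedupe would miss.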
Write a SQL query for [DATABASE TYPE] that answers: [QUESTION]. Schema: [TABLES AND COLUMNS]. Requirements: handle NULLs, use appropriate joins, optimize for performance on a table with [N] rows, and include comments explaining the logic.
Why this works: Specifying table size helps Claude choose the right optimization strategy.
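As an illustration of the NULL-handling and join requirements, here is a toy sqlite3 example (the schema and values are made up): a `LEFT JOIN` keeps orders whose customer is missing, and `COALESCE` gives NULL regions a label instead of silently dropping them.

```python
import sqlite3

# Hypothetical schema: an order may reference a deleted customer
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EU'), (2, NULL);
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 2, 25.0), (12, 99, 10.0);
""")

rows = conn.execute("""
    SELECT COALESCE(c.region, 'unknown') AS region,  -- NULL regions get a label
           SUM(o.amount) AS revenue                  -- aggregate per region
    FROM orders o
    LEFT JOIN customers c ON c.id = o.customer_id    -- keep orphaned orders
    GROUP BY region
    ORDER BY revenue DESC
""").fetchall()
```

With an `INNER JOIN` the orphaned order would vanish from the revenue total, which is exactly the kind of silent error the requirements are meant to catch.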
I ran an A/B test. Group A: [N1] users, [METRIC1]. Group B: [N2] users, [METRIC2]. Determine: 1) is this statistically significant at p < 0.05, 2) what's the effect size, 3) do I have enough power, and 4) should I make a decision or keep running? Show the math.
Why this works: Including power analysis and the decision recommendation prevents premature test calls.
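The math Claude shows would typically be a two-proportion z-test; a sketch with made-up conversion counts:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical results: users and conversions per group
n1, x1 = 10_000, 520   # group A
n2, x2 = 10_000, 580   # group B

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                      # pooled rate under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se
p_value = 2 * norm.sf(abs(z))                       # two-sided p-value
effect = p2 - p1                                    # absolute lift

print(f"z={z:.2f}, p={p_value:.4f}, lift={effect:.4f}")
```

With these numbers the p-value lands just above 0.05: a textbook case where the right call is "keep running" rather than "ship it", which is why the prompt asks for a decision recommendation, not just a p-value.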
I want to visualize [DATA DESCRIPTION] to show [INSIGHT/STORY]. Suggest: the best chart type and why, axis labels and title, color scheme considerations, and any annotations that would strengthen the message. Then write the code in [matplotlib/d3/plotly].
Why this works: Starting with the insight ensures the visualization serves communication, not just display.
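A minimal matplotlib sketch of the pattern Claude would produce, with invented revenue figures; note how the title states the insight rather than describing the axes:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical monthly revenue chosen to illustrate one clear story
labels = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [120, 135, 150, 148, 170, 190]
x = list(range(len(labels)))

fig, ax = plt.subplots()
ax.plot(x, revenue, marker="o")
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_title("Revenue grew 58% in H1")   # the title is the insight
ax.set_xlabel("Month")
ax.set_ylabel("Revenue ($k)")
# Annotation ties the visible change to its cause (hypothetical event)
ax.annotate("pricing change", xy=(4, 170), xytext=(2, 180),
            arrowprops=dict(arrowstyle="->"))
fig.savefig("revenue.png")
```

A line chart fits here because the story is change over time; a bar chart would emphasize individual months instead.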
Summarize this research paper/abstract: [TEXT]. Include: the research question, methodology, key findings, limitations the authors acknowledge, limitations they don't acknowledge, and practical implications. Write for someone who needs to decide whether to read the full paper.
Why this works: Including unacknowledged limitations shows critical analysis, not just summarization.
I have time series data for [METRIC] over [PERIOD]. Data: [DESCRIPTION OR SAMPLE]. Build a forecasting model in [LANGUAGE]. Include: trend analysis, seasonality detection, model selection rationale, confidence intervals, and a clear statement of what would invalidate the forecast.
Why this works: Including invalidation conditions prevents over-reliance on any single forecast.
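As a sketch of the pieces the prompt asks for, here is a deliberately simple decomposition on synthetic weekly data (real work would use a proper library; the series, period, and band here are assumptions for illustration):

```python
import numpy as np

# Synthetic weekly metric: linear trend + yearly seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(104)  # two years of weekly observations
series = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 2, 104)

# Trend: ordinary least squares on the time index
slope, intercept = np.polyfit(t, series, 1)

# Seasonality: average detrended value per week-of-year
detrended = series - (slope * t + intercept)
seasonal = detrended.reshape(2, 52).mean(axis=0)

# Forecast the next year with a crude +/- 2*sigma residual band
future_t = np.arange(104, 156)
residuals = detrended - np.tile(seasonal, 2)
forecast = slope * future_t + intercept + seasonal
band = 2 * residuals.std()
```

The invalidation statement falls out naturally from this structure: the forecast assumes the trend stays linear and the seasonal shape repeats, so a pricing change or a new channel mix would break both.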
Design a dashboard for [AUDIENCE] to monitor [DOMAIN]. Include: 5-7 key metrics with definitions and calculations, layout recommendation, filter/drill-down capabilities, alert thresholds, and data refresh frequency. The audience's main decision: [DECISION].
Why this works: Anchoring on the decision ensures every metric on the dashboard is actionable.
Set up a cohort analysis for [PRODUCT/SERVICE]. Cohort definition: [HOW TO GROUP USERS]. Metric to track: [METRIC]. Write the [SQL/Python] code, explain the methodology, and describe what healthy vs. concerning patterns look like in the output.
Why this works: Defining healthy vs. concerning patterns helps you interpret results without being a cohort analysis expert.
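The core of the generated code is usually a retention matrix; a pandas sketch on an invented event log (cohort = month of first activity):

```python
import pandas as pd

# Hypothetical event log: one row per user activity
events = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u2", "u3"],
    "month": pd.to_datetime(
        ["2024-01", "2024-02", "2024-01", "2024-02", "2024-03", "2024-02"]
    ),
})

# Cohort = month of first activity; period = months since joining
first = events.groupby("user")["month"].transform("min")
events["cohort"] = first.dt.to_period("M")
events["period"] = (
    events["month"].dt.to_period("M") - events["cohort"]
).apply(lambda offset: offset.n)

# Rows: cohorts; columns: months since joining; cells: active users
retention = events.pivot_table(
    index="cohort", columns="period", values="user", aggfunc="nunique"
)
```

Reading the matrix: a healthy product shows each row flattening out after an initial drop (a retained core), while rows that decay toward zero signal churn that growth is masking.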
Design a survey to understand [RESEARCH QUESTION]. Include: 12-15 questions mixing quantitative and qualitative, response options for scaled questions, skip logic rules, estimated completion time (target: under 5 minutes), and a plan for analyzing the results.
Why this works: The 5-minute target and analysis plan ensure the survey is both completable and useful.
I want to understand what drives [DEPENDENT VARIABLE]. Potential factors: [INDEPENDENT VARIABLES]. Data: [DESCRIPTION]. Walk me through: which regression type to use, how to check assumptions, how to interpret coefficients, and what the results mean in plain English. Write the code in [LANGUAGE].
Why this works: Plain English interpretation is crucial — regression coefficients are meaningless without context.
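A bare-bones sketch of the fit itself, using ordinary least squares on synthetic data (the variables `spend` and `season` are hypothetical stand-ins for your factors):

```python
import numpy as np

# Synthetic data: signups driven by ad spend plus a seasonality index
rng = np.random.default_rng(1)
spend = rng.uniform(0, 100, 200)
season = rng.uniform(0, 1, 200)
signups = 50 + 2.0 * spend + 30 * season + rng.normal(0, 5, 200)

# OLS via least squares; the column of ones estimates the intercept
X = np.column_stack([np.ones_like(spend), spend, season])
coef, *_ = np.linalg.lstsq(X, signups, rcond=None)
intercept, b_spend, b_season = coef

# Plain-English read: each extra unit of spend is associated with about
# b_spend additional signups, holding the seasonality index fixed
```

That last comment is the part the prompt insists on: "b_spend = 2.0" means nothing to a stakeholder until it is translated into units they care about, with the "holding everything else fixed" caveat attached.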
Design a data pipeline that: ingests [DATA SOURCES], transforms [TRANSFORMATIONS], and loads into [DESTINATION]. Requirements: [FREQUENCY], [VOLUME], [LATENCY]. Include: tool recommendations, error handling strategy, monitoring/alerting, and estimated costs.
Why this works: Including cost estimates prevents designing a pipeline that's technically excellent but economically unviable.
Set up anomaly detection for [METRIC/SYSTEM]. Normal behavior: [DESCRIPTION]. Write code that: detects anomalies in real-time, distinguishes between noise and real anomalies, minimizes false positives, and generates actionable alerts. Use [LANGUAGE/TOOL].
Why this works: Minimizing false positives is critical — too many false alerts and the system gets ignored.
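One common way the generated code balances sensitivity against false positives is a robust rolling z-score: comparing each point to the median (not the mean) of a trailing window, so an earlier spike in the window doesn't mask the next one. A sketch on an invented stream:

```python
import numpy as np

# Hypothetical metric stream: steady around 100, with one obvious spike
values = np.array([100.0, 101, 99, 102, 98, 100, 101, 99, 180, 100, 101, 99])

def rolling_anomalies(x: np.ndarray, window: int = 5, threshold: float = 4.0):
    """Flag points more than `threshold` robust z-scores from the trailing median."""
    flags = []
    for i in range(window, len(x)):
        past = x[i - window : i]
        med = np.median(past)
        # MAD is robust: one outlier in the window barely moves it
        mad = np.median(np.abs(past - med)) or 1e-9
        z = 0.6745 * (x[i] - med) / mad  # 0.6745 rescales MAD to ~std units
        if abs(z) > threshold:
            flags.append(i)
    return flags

anomalies = rolling_anomalies(values)
```

The `threshold` parameter is the false-positive dial: raising it trades missed small anomalies for fewer noisy alerts, which is usually the right trade for a system humans have to trust.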
Analyze this market data: [DATA]. Identify: market size and growth rate, key segments, competitive landscape, unmet needs, and emerging trends. Present findings as an executive summary (1 page) and a detailed analysis (3-5 pages). Audience: [STAKEHOLDER].
Why this works: Dual format (executive summary + detail) serves both quick decisions and deep dives.
Write a report presenting these findings: [FINDINGS/DATA]. Audience: [WHO]. Structure: executive summary, methodology, key findings (3-5 with supporting data), implications, and recommended actions. Use data to support every claim. Flag where the data is limited.
Why this works: Flagging data limitations builds credibility and prevents over-confident recommendations.