Focus Areas
- DataFrame creation and manipulation
- Series operations and transformations
- Indexing and selecting data
- Grouping and aggregating data
- Merging, joining, and concatenating DataFrames
- Handling missing data effectively
- Applying functions across DataFrames
- Data input/output with various formats
- Time series analysis capabilities
- Conditional selection and filtering
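
Several of these areas can be sketched together in one short example. The data, column names, and the choice to fill missing amounts with 0.0 are illustrative assumptions, not a recommended default.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer_id": [10, 11, 10, 12],
        "amount": [25.0, None, 40.0, 15.0],
    }
)
customers = pd.DataFrame(
    {"customer_id": [10, 11, 12], "region": ["north", "south", "north"]}
)

# Handle missing data: fill the missing amount with 0.0 (one possible strategy).
# fillna returns a new DataFrame, so `orders` itself is left untouched.
orders_filled = orders.fillna({"amount": 0.0})

# Merge on the shared key; validate="many_to_one" raises if customer_id
# is not unique in `customers`.
merged = orders_filled.merge(
    customers, on="customer_id", how="left", validate="many_to_one"
)

# Conditional selection with a boolean mask.
large_orders = merged.loc[merged["amount"] > 20.0]

# Group and aggregate: total amount per region.
region_totals = merged.groupby("region")["amount"].sum()
```

Whether to fill, drop, or interpolate missing values depends on the dataset; the fill here only keeps the example small.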
Approach
- Utilize vectorized operations for efficiency
- Keep data types consistent and optimized
- Use method chaining for readability
- Leverage `apply()` and `map()` for custom transformations
- Maintain DataFrame index integrity
- Optimize memory usage with data type adjustments
- Employ `query()` for complex filtering
- Document code with concise comments
- Use `pandas` built-in plotting for quick visual insights
- Always use version-controlled scripts for replicability
Quality Checklist
- Ensure no operations alter original data unintentionally
- Validate DataFrames' shapes after operations
- Check for the presence of missing values post-transformation
- Confirm data types after manipulations
- Confirm efficient use of memory and processing resources
- Verify correct index alignment after merges and joins
- Use consistent naming conventions for clarity
- Test data input/output processes thoroughly
- Ensure accurate grouping and aggregation results
- Verify performance with sample datasets
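
Some of these checks can be automated. The helper below, `validate_frame`, is a hypothetical name and a minimal sketch rather than an established pandas API; it reports row-count, missing-value, and dtype problems after a merge.

```python
import pandas as pd


def validate_frame(df, expected_rows, expected_dtypes):
    """Hypothetical helper: return a list of human-readable problems found in df."""
    problems = []
    # Shape check: a merge can silently add or drop rows.
    if len(df) != expected_rows:
        problems.append(f"row count {len(df)} != {expected_rows}")
    # Missing-value check per column.
    na_counts = df.isna().sum()
    for column, count in na_counts[na_counts > 0].items():
        problems.append(f"{column}: {count} missing value(s)")
    # Dtype check for the columns the caller cares about.
    for column, dtype in expected_dtypes.items():
        actual = str(df[column].dtype)
        if actual != dtype:
            problems.append(f"{column}: dtype {actual} != {dtype}")
    return problems


left = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 0.7, 0.9]})
right = pd.DataFrame({"id": [1, 2], "label": ["a", "b"]})
snapshot = left.copy()  # kept to confirm the original is not altered

merged = left.merge(right, on="id", how="left")

problems = validate_frame(
    merged,
    expected_rows=len(left),
    expected_dtypes={"id": "int64", "score": "float64"},
)
original_unchanged = left.equals(snapshot)
```

Here the left join leaves `label` empty for `id` 3, so the helper reports one missing value while the row count and dtypes pass.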
Output
- Clean, well-structured DataFrames ready for analysis
- Efficient data manipulation scripts
- Comprehensive summary statistics
- Clear and interpretable data visualizations
- Accurate time series forecasts and analysis
- Flexible data processing pipelines
- Documented notebooks and scripts for reproducibility
- Performant data transformation functions
- Effective missing data strategies implemented
- Insightful exploratory data analysis results
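
One possible shape for such a pipeline is a set of small functions composed with `DataFrame.pipe`, followed by summary statistics. The function names `drop_incomplete` and `add_total` and the sample data are assumptions made for this sketch.

```python
import pandas as pd


def drop_incomplete(df):
    """Hypothetical step: remove rows that contain any missing value."""
    return df.dropna()


def add_total(df, price_col, qty_col):
    """Hypothetical step: add a 'total' column as price times quantity."""
    return df.assign(total=df[price_col] * df[qty_col])


# Illustrative raw input with one incomplete row.
raw = pd.DataFrame({"price": [2.0, None, 3.0], "qty": [1, 4, 2]})

# pipe() passes the DataFrame through each step in order.
clean = raw.pipe(drop_incomplete).pipe(add_total, price_col="price", qty_col="qty")

# Summary statistics (count, mean, std, min, quartiles, max) for the new column.
summary = clean["total"].describe()
```

Keeping each step as a plain function that takes and returns a DataFrame makes the steps easy to reorder, reuse, and test in isolation.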