The first step in the project is finding a dataset you would like to explore. This project will require you to take one or more datasets and analyze them using the tools you learn in class. You will be expected to graph your data and analyze it using inferential statistics (e.g., t-tests, ANOVAs, correlations). You are not expected to do all of these analyses, but you will need to demonstrate that you understand why the analyses you pick are appropriate and how to interpret their results.
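As an illustration of one of the inferential tests named above, the sketch below computes the F statistic of a one-way ANOVA, which compares the means of three or more groups (for example, pH measured in three lakes). It is a minimal pure-Python sketch: the function name `one_way_anova_f` is hypothetical, and it returns only F, not a p-value. In practice, a statistics package such as SciPy's `scipy.stats.f_oneway` reports both.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups of numbers.

    Hypothetical helper for illustration; it does not compute a p-value.
    """
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each observation around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    # F = (between-group mean square) / (within-group mean square)
    return (ss_between / df_between) / (ss_within / df_within)
```

For example, `one_way_anova_f([7.0, 7.2, 7.1], [6.5, 6.6, 6.4], [7.8, 7.9, 8.0])`, using made-up pH readings, returns an F of about 148. A large F means the group means differ much more than the spread within each group would suggest.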
FIRST STEP: Find some data. I want your data to be on a topic that interests you. I have put some sources of data in the folder below. Finding data that works for you may take a while; this is not an easy task. The dataset needs at least 30 cases, with 4 variables measured in 3 groups. For example: obtain data from 3 lakes for pH, DO, N, and P over 30 sampling events. Because health data and environmental data are structured differently, some variation on this format is acceptable. For example: lead poisoning rates over 30 years in 5 different age groups, based on location, socioeconomic level, etc.
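Before committing to a dataset, it can help to check it against the minimums above (30 cases, 4 variables, 3 groups). The sketch below assumes the data has been loaded as a list of dictionaries, one per case, and interprets "at least 30 cases" as at least 30 cases per group, as in the lake example. Both that layout and the helper name `check_project_data` are hypothetical, and the thresholds are parameters so they can be adjusted.

```python
from collections import Counter


def check_project_data(rows, group_field, variables,
                       min_cases=30, min_variables=4, min_groups=3):
    """Return a list of problems; an empty list means the minimums are met.

    rows: list of dicts, one per case (assumed layout).
    group_field: key identifying the group, e.g. the lake name.
    variables: keys of the measured variables, e.g. pH, DO, N, P.
    Assumption: "30 cases" is checked per group, not in total.
    """
    problems = []
    if len(variables) < min_variables:
        problems.append(f"only {len(variables)} variables (need {min_variables})")

    # Count how many cases fall in each group
    counts = Counter(row[group_field] for row in rows)
    if len(counts) < min_groups:
        problems.append(f"only {len(counts)} groups (need {min_groups})")
    for group, n in sorted(counts.items()):
        if n < min_cases:
            problems.append(f"group {group!r} has {n} cases (need {min_cases})")

    # Flag cases where any measured variable is missing or None
    missing = sum(1 for row in rows if any(row.get(v) is None for v in variables))
    if missing:
        problems.append(f"{missing} cases have missing values")
    return problems
```

Called on 3 lakes with 30 complete sampling events each, it returns an empty list; with one lake dropped, it returns `["only 2 groups (need 3)"]`.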
Other examples:
Compare air quality across 3 cities over 30 years for 4 different pollutants.
Analyze data from three health studies comparing different cold medicines for recovery time and symptoms.
Analyze demographic change in cities/states/nations across different variables (age, race, economic level, education level).
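For multi-variable examples such as the air-quality comparison, a correlation can show whether two variables move together, for instance two pollutants measured in the same city over the same years. The sketch below computes Pearson's r from scratch; the function name `pearson_r` is hypothetical, and r describes only linear association, not cause and effect.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration; it does not compute a p-value.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance term (unscaled) and the spread of each variable
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        # A constant variable has no spread, so r is undefined
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sx * sy)
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship. If you also need a p-value, SciPy's `scipy.stats.pearsonr` reports one alongside r.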
At the end of the project, you will deliver a paper with the following sections:
Introduction – Background of the data and why this data needs to be analyzed.
Methods – how you found your data, why you picked this data, analyses chosen.
Results – analysis, graphs, etc.
Discussion/conclusion – why your analysis may have given you the results it did, and what further analysis is needed.
References – I expect you to search the literature and write a comprehensive introduction and conclusion supported by peer-reviewed scientific sources.
Link to the data: https://healthdata.gov/dataset/covid-19-estimated-patient-impact-and-hospital-capacity-state