Data analysis is the process of inspecting and summarizing data with the intent to extract useful information, draw inferences, and reach conclusions. It is usually carried out with statistical or numerical software and can draw on a range of techniques, including statistical methods.
Note that "data analysis" takes on different aspects, and possibly different names, in different fields.
The first activities relate to the diagram above and to the embedding of data analysis into decision-making processes.
(Decision Making) Explain why data analysis is relevant for evidence-based decision making.
(Use Case) Look at the diagram above and consider your field of expertise. Populate the different steps of the workflow with raw data that you have access to.
Spatial Decision Support Systems (SDSS) bridge the domains of data analysis and decision support while dealing with spatial data. Look at the domain of transportation. Create a workflow for the analysis of spatial data on transported goods together with the trucks, ships, trains, planes, ... that carry them, and identify sustainable ways of transporting goods and services to a customer.
Address the data analysis for the use of truck and train capacities, so that driving without cargo or with minimal use of the capacity is reduced. Identify indicators in the data analysis that address specific Sustainable Development Goals (SDGs). Identify which SDGs are addressed and how the definition of the SDGs determines the methodologies used for the data analysis. Describe the data analysis foundations that are required to measure the impact of interventions for a sustainable system of transport and delivery.
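As a starting point for the capacity task above, the following minimal sketch computes two possible indicators: a distance-weighted load factor and the share of distance driven empty. The record layout, the vehicle names, and all numbers are assumptions made up for illustration. Real transport data would need its own definitions of capacity and load.

```python
# Hypothetical trip records: (vehicle_id, capacity_tonnes, load_tonnes, distance_km).
# All values are invented for illustration only.
trips = [
    ("truck-1", 24.0, 24.0, 100.0),
    ("truck-1", 24.0, 0.0, 100.0),    # empty return trip
    ("truck-2", 24.0, 12.0, 200.0),
    ("train-1", 1000.0, 750.0, 400.0),
]


def load_factor(trips):
    """Used tonne-km divided by offered capacity tonne-km."""
    used = sum(load * dist for _, _, load, dist in trips)
    offered = sum(cap * dist for _, cap, _, dist in trips)
    return used / offered


def empty_run_share(trips):
    """Share of the total distance that was driven without cargo."""
    empty = sum(dist for _, _, load, dist in trips if load == 0)
    total = sum(dist for _, _, _, dist in trips)
    return empty / total


trucks = [t for t in trips if t[0].startswith("truck")]
print(f"truck load factor:   {load_factor(trucks):.2f}")
print(f"truck empty share:   {empty_run_share(trucks):.2f}")
print(f"overall load factor: {load_factor(trips):.2f}")
```

Weighting by distance is one design choice among several. An unweighted average per trip would treat a short empty run and a long empty run as equally costly, which may or may not match the sustainability indicator you want to report.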
(Machine Learning) Data can be processed with machine learning. Compare the methodologies of classical statistical analysis and machine learning as ways of performing data analysis. What are the similarities and differences between those approaches, and what are their benefits and drawbacks?
Consider a specific learning environment in your domain. How would a teacher select appropriate learning tasks tailored to the student, so that the exercise is challenging enough but not too complex? What are the indicators (required information) that the teacher needs to make the exercise, or the support the teacher provides, appropriate to the specific requirements and constraints of the student/learner?
Now we transfer that to data analysis (in this case learner analytics). Identify data that can be collected in a digital learning environment and that could be used to support the teacher in providing tailored teaching and learning material to the student.
Choose, from your current knowledge about data analysis, an appropriate methodology to analyze the collected data. Start from very basic methodologies of
The following Wiki2Reveal presentations can be used by lecturers as Open Educational Resources to support their course work in addition to standard statistical and numerical approaches to process and analyze data.
(1) Identify an application scenario to which you want to apply your data analysis. Write a small summary of your project (e.g. for a Bachelor's, Master's, or PhD thesis).
(2) Describe the experimental design in which the data will be collected.
(3) Provide one scenario,
(3.1) in which you have a fixed time for data collection and after data collection the data analysis starts and
(3.2) in which you get a constant input stream of data that has to be processed continuously with an appropriate methodology for dynamic reporting and dynamic data analysis in a real-time scenario.
In a Bachelor's, Master's, or PhD thesis you will mainly have scenario (3.1). In this case it is just an exercise to extrapolate from (3.1) to a scenario (3.2) that handles a constant input stream of data for a dynamic analysis.
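To make the step from (3.1) to (3.2) concrete, the sketch below computes mean and sample variance twice: once over a complete data set, and once incrementally, value by value, as a stream would deliver them. The incremental version uses Welford's online algorithm, which is one common choice and not something prescribed by this course. The class name `RunningStats` and the measurements are assumptions for illustration.

```python
import statistics


class RunningStats:
    """Online mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Incorporate one new observation without storing earlier ones."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; undefined (NaN) for fewer than two values."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up measurements

# Scenario (3.1): collection is finished, analysis runs once on all data.
batch_mean = statistics.mean(data)
batch_var = statistics.variance(data)

# Scenario (3.2): values arrive one at a time; the report is updated each time.
stream = RunningStats()
for x in data:
    stream.update(x)
    print(f"n={stream.n}  mean={stream.mean:.3f}  variance={stream.variance:.3f}")

print(f"batch: mean={batch_mean:.3f}  variance={batch_var:.3f}")
```

After the last value both approaches agree up to floating-point rounding. The streaming version keeps only three numbers in memory, which is what makes continuous reporting on an unbounded stream feasible.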
(4) Swarm Intelligence: compare with the data analysis workflow in the diagram mentioned above. In a swarm, data does not arrive in a central swarm container and is not analyzed centrally. Individuals in a swarm perceive different information/data, and the swarm responds to the perceived data as a group. Identify analogies and differences in data analysis on a qualitative level.
Chapter 2 - Data Clean Up - Processing of Raw Data
Missing data and incomplete data sets/records, and how to impute missing values in a way that does not affect the mean and standard deviation of the data set.
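A small experiment can show why preserving both statistics at once is not trivial. The sketch below uses mean imputation, chosen here only as the simplest illustration and not as a recommended method. The observed values and the number of missing entries are made up.

```python
import statistics

observed = [4.0, 8.0, 6.0, 10.0, 2.0]  # made-up observed values
n_missing = 3                          # assumed number of missing entries

# Mean imputation: every missing entry is replaced by the observed mean.
mean_before = statistics.mean(observed)
imputed = observed + [mean_before] * n_missing

sd_before = statistics.stdev(observed)
mean_after = statistics.mean(imputed)
sd_after = statistics.stdev(imputed)

print(f"mean before: {mean_before:.3f}   mean after: {mean_after:.3f}")
print(f"sd before:   {sd_before:.3f}   sd after:   {sd_after:.3f}")
```

The mean is unchanged, but the sample standard deviation shrinks, because the imputed values add no spread while increasing the sample size. An imputation method that is meant to preserve the standard deviation as well therefore has to add variation to the imputed values.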
Look at data coming from a stock exchange for a specific share. Explain the benefit of preprocessing the data with a moving average. What is noise in the data, and how does a moving average contribute to reducing it?
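The following minimal sketch can be used to explore this question. It applies a simple trailing moving average to a synthetic series, a linear trend plus Gaussian noise, that stands in for share prices. The series, the window length of 10, and the helper `roughness` are assumptions for illustration. No real stock data is used.

```python
import random
import statistics


def moving_average(values, window):
    """Simple trailing moving average with len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


def roughness(series):
    """Standard deviation of successive differences, a crude noise measure."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return statistics.stdev(diffs)


random.seed(0)  # fixed seed so the synthetic series is reproducible
trend = [100.0 + 0.1 * t for t in range(200)]
prices = [p + random.gauss(0.0, 2.0) for p in trend]  # trend plus noise

smoothed = moving_average(prices, window=10)

print(f"roughness of raw series:      {roughness(prices):.3f}")
print(f"roughness of smoothed series: {roughness(smoothed):.3f}")
```

A larger window suppresses more of the short-term fluctuation but also reacts more slowly to real changes in the trend, which is the trade-off to discuss in the answer.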
Missing data can be a challenge for the researcher, because incomplete data sets might not be usable in the data analysis. Explain the circumstances in which imputation of missing data could help to incorporate more data in the data analysis, and what the requirements and constraints for data imputation are.
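How much data is lost without imputation can be checked with a short count. The sketch below compares the number of complete records with the number of available values per variable. The records and field names are hypothetical.

```python
# Hypothetical survey records; None marks a missing value. All values are made up.
records = [
    {"age": 34, "income": 52000, "score": 7.1},
    {"age": 41, "income": None, "score": 6.4},
    {"age": None, "income": 48000, "score": 8.0},
    {"age": 29, "income": 39000, "score": None},
    {"age": 55, "income": 61000, "score": 5.9},
]


def is_complete(record):
    """True if no field of the record is missing."""
    return all(value is not None for value in record.values())


# Complete-case analysis keeps only records without any missing field.
complete_cases = [r for r in records if is_complete(r)]

# Number of non-missing values for each variable on its own.
available = {
    key: sum(1 for r in records if r[key] is not None)
    for key in records[0]
}

print(f"records collected:      {len(records)}")
print(f"complete records:       {len(complete_cases)}")
print(f"available per variable: {available}")
```

In this made-up example each variable is missing only once, yet more than half of the records would be dropped by a complete-case analysis. Whether imputing those gaps is justified depends on why the values are missing, which is the constraint the question asks about.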