Some may know that I recently got a second job as an adjunct professor in the University of Washington system. While the first course I will be teaching is on Operations and Project Management, part of the supply chain discipline in the School of Business, I have been encouraged to emphasize data analytics and visualization. Last month, I was invited to give a guest lecture to a class on visualization. Developing those slides inspired me to consider building a curriculum around analytics and visualization. To organize my thoughts, I decided to begin here on WordPress.
It seems to me a class should be organized around the steps of using data to answer questions. The following list could serve academic or industrial purposes. I don't think it is original, but it is logical and fits my own approach at work.
1. Have a question without an answer
2. Find a data source that may lead to an answer
3. Look for incompleteness, inconsistencies, and possible errors
4. Perform initial analysis and construct initial visuals
5. Write a story describing the analysis and visuals, and include a conclusion
6. Has the initial question been answered? Are there new questions?
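As a sketch, the steps above could be strung together as a single loop. This is purely illustrative; the function name and the stand-in logic are hypothetical, not part of any real tool.

```python
# A minimal sketch of the six-step workflow above.
# All names here are hypothetical placeholders for illustration.

def answer_question(question, records):
    """Walk the six steps for one question; return (story, new_questions)."""
    # Step 1: start from a question without an answer.
    assert question, "Step 1: you need a question before you start"

    # Step 2: a data source that may lead to an answer (here, a list of rows).
    data = list(records)

    # Step 3: look for incompleteness, inconsistencies, and possible errors.
    # A missing row (None) stands in for any data-quality issue.
    issues = [i for i, row in enumerate(data) if row is None]

    # Step 4: initial analysis and initial visuals (a summary stands in here).
    clean = [row for row in data if row is not None]
    summary = {"rows": len(clean), "dropped": len(issues)}

    # Step 5: write a story describing the analysis and a conclusion.
    story = f"{question}: analyzed {summary['rows']} rows, dropped {summary['dropped']}."

    # Step 6: has the question been answered? Are there new questions?
    new_questions = ["Why were some rows missing?"] if issues else []
    return story, new_questions
```

The point of the sketch is that step 6 feeds back into step 1: any new questions it produces restart the loop.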
I have encountered instances where only steps 2, 4, and 5 are performed. The result has been dissatisfaction for both the requestor and the analyst, and almost all of that dissatisfaction stems from skipping step 1. If you don't know what you are trying to solve, you don't know when you have finished, and step 6 is where the finish is determined. Sometimes an answer raises new questions, since the conclusion may not be what was expected or may not fully answer the initial question.
I have also seen instances where step 3 is skipped. This can be fatal to any analysis, since it can lead to incorrect conclusions or, worse, no conclusion when there should have been one. There generally isn't an easy way to verify the accuracy of a data set, but incompleteness can be easy to check. My background is in time series analysis, which may be the easiest type of data set to check for completeness. My best advice is to spend time with the data and use statistics along with graphs to see whether the data looks reasonable. Even a basic approach can reveal oddities that lead to questions about the events that influenced the data.
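For a time series, the completeness check can be this simple: generate the timestamps you expect between the first and last observation and report any that are absent. A minimal sketch (the dates are made up for illustration):

```python
from datetime import date, timedelta

def find_gaps(dates, step=timedelta(days=1)):
    """Return the dates missing from an expected evenly spaced series."""
    have = set(dates)
    missing = []
    current = min(dates)
    while current <= max(dates):
        if current not in have:
            missing.append(current)
        current += step
    return missing

# Example: daily data with one day missing.
observed = [date(2014, 1, 1), date(2014, 1, 2), date(2014, 1, 4)]
# find_gaps(observed) → [date(2014, 1, 3)]
```

Each gap this turns up is exactly the kind of oddity worth investigating: was the data lost, or did something real happen that day?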
One of my goals will also be to examine the graphs I have constructed and maintained on the data site. I have some opinions on how to build charts well. As for software, I have used Microsoft Excel for twenty years and still consider it the best general-purpose software for data analysis. For this blog, though, I have turned to Tableau Public for visualization. In a recent entry, I discussed how that software has made it simpler to keep this blog's visuals updated.
Okay, that is a simple introduction. There is no timetable for what I will put in this category, but it should keep my imagination active for a while.