Written by Arthur Wu, iXperience '16 Data Science TA, UVA
We’re either in a data nirvana or a data cataclysm, depending on who you ask. Ask a data scientist, and you will probably hear the former. Employing data scientists has exploded in popularity among businesses, and for good reason. Data science, as a field, did not evolve overnight. Critical developments across software engineering, the internet, data analysis, statistics, online education, and computing have led to the widespread democratization of data science. Here, I mean the democratization of both data science education and data science applications.
On the educational front, hundreds of online courses, ranging from free programs offered by Stanford and MIT all the way to courses from education startups like DataCamp, present a huge range of options to the budding data scientist. Platforms like Stack Exchange and GitHub allow developers to share resources and tutorials at a huge volume. And this information really matters. Ever wanted to analyse the effect of Uber on the transport industry in New York? Datasets shared on GitHub let you do that.
From a computing perspective, the rise of big data wrangling technologies and distributed computing has enabled even the most powerful and time-intensive algorithms to run on regular computers (you’ll just need a lot of them). Anyone can now access and use some of these insanely powerful algorithms with just a few lines of code. In November 2015, Google released TensorFlow, a publicly available library that Google engineers “really do use” to deploy rich artificial intelligence in the company’s products and services. Developers in the R community have likewise put together publicly available libraries that let R users wield cutting-edge algorithms. As an example, iXperience Data Science students can run neural nets for prediction challenges. This means that learners are now building models loosely inspired by how the human brain processes information when making a decision!
So here are the fundamental trends: data science is becoming far easier to learn and far more powerful as a field than before. All of this has happened within a decade, if not less.
However, data science is still pretty hard to learn. It’s not just statistics. As Josh Wills once said, a data scientist is a “person who is better at statistics than any software engineer and better at software engineering than any statistician”. Beyond that, data scientists must deeply understand their specific problem domain, multidisciplinary business objectives, and storytelling.
No big deal, right?