Also check Orama's Quora and Orama's GitHub
I shall not claim to know so much, but only that I learn new things every day

Friday, 22 April 2022

Data Analytics: Why I chose Python instead of R

Introduction

When you start talking about data analytics (or, more broadly, data science), you will almost inevitably find yourself talking about Python and R. There are benefits to trying both languages. I have done so, and that experience helped me choose which of the two to use for data analytics.

After a careful evaluation some years ago, I chose to use Python for data analytics. I had earlier used R for the same purpose, so comparing the two was easy. I want to share the main factors that influenced my choice of Python.

First, it is important to note that both Python and R are open source, have diverse communities and extensive libraries and tools, and are not too hard to learn and use. But they also have some important differences.

According to Wikipedia, R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Therefore, statistics and graphics are the core of R.

Python, on the other hand, is a general-purpose language that can be used for almost any programming task. For data analytics and data science, it relies on libraries such as NumPy (numerical analysis), pandas (data wrangling/analysis), SciPy (scientific computing), scikit-learn (machine learning), and Seaborn (visualization).
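
To give a feel for how two of these libraries work together, here is a minimal sketch, assuming pandas and NumPy are installed. The `sales` table and its figures are invented purely for illustration; they are not real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
sales = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South"],
        "revenue": [100.0, 140.0, 90.0, 110.0],
    }
)

# pandas: group the rows by region and average the revenue in each group.
mean_by_region = sales.groupby("region")["revenue"].mean()

# NumPy: a numerical operation on the underlying array of values
# (population standard deviation of all revenue figures).
overall_std = float(np.std(sales["revenue"].to_numpy()))

print(mean_by_region)
print(round(overall_std, 2))
```

The wrangling step (grouping) is done by pandas, while the raw numerical work is done by NumPy on the same data.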


Factors that influenced my choice of Python for Data Analytics


First off, a caveat: this is not another battle in the Python-versus-R war. It is simply an analysis based on my own experience. The fact that Python works better for me does not mean that R will not work better for someone else.

1. Having a programming background, I found Python more attractive because it feels more like a general-purpose programming language than R does. It would be natural to argue that R is mostly preferred by non-programmers, while Python suits programmers better.

On the flip side, it is very easy to install and start using R, even for the non-tech-savvy. You install it, issue one or two commands, and you have your output. Not so much with Python, especially for someone who doesn’t have much programming experience. Installing it and setting up the environment might be a turn-off for the non-programmer.

2. Integration and deployment are seamless with Python. I can easily integrate a data analytics API or feature into an existing or new web application, deploy it, and even display the output within that web application. For R, the Shiny package offers some of this integration: it lets R applications be used directly and interactively on the web.

3. In my experience, code re-use is more feasible in Python, where analysis code is naturally organised into functions, modules, and packages. I did not find this to be so much the case with R.

4. With Python, you hit many birds with one stone. Python is a general-purpose language, while R is a statistical language. By learning and using Python, you widen the scope of applications that you can implement, notably web applications and automation. This makes Python knowledge more rewarding than R knowledge, because Python has a far wider range of use cases.
 
5. Python can read directly from a wide variety of data sources: Excel, CSV, text files, databases, web pages, etc. In my experience, R is mostly used with Excel, CSV, text files, and the data formats of other statistical packages such as SPSS. However, R can also access databases directly, for example through ODBC.
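
As a rough illustration of points 2 and 5 above, here is a minimal sketch that uses only Python's standard library. It reads the same kind of data from CSV text and from a database, summarizes it, and serves the result as JSON from a small WSGI web application. The CSV content, the `sales` table, and the names `units_from_csv`, `units_from_database`, `summarize`, and `app` are all made up for this example; a real project would more likely use pandas and a web framework.

```python
import csv
import io
import json
import sqlite3
import statistics
from wsgiref.util import setup_testing_defaults

# Hypothetical CSV content; in practice this would be a file on disk.
CSV_TEXT = "product,units\npens,10\nbooks,5\n"


def units_from_csv(text):
    """Parse CSV text and return the 'units' column as integers."""
    reader = csv.DictReader(io.StringIO(text))
    return [int(row["units"]) for row in reader]


def units_from_database(connection):
    """Read the same column from a SQL table."""
    cursor = connection.execute("SELECT units FROM sales ORDER BY units")
    return [row[0] for row in cursor.fetchall()]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": statistics.mean(values),
    }


# Throwaway in-memory SQLite database standing in for a real one.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (product TEXT, units INTEGER)")
connection.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("pens", 40), ("books", 25)]
)

# Combine both sources into one list of numbers.
all_units = units_from_csv(CSV_TEXT) + units_from_database(connection)


def app(environ, start_response):
    """Minimal WSGI application that serves the summary as JSON."""
    body = json.dumps(summarize(all_units)).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Call the app with a fake request instead of starting a server.
    environ = {}
    setup_testing_defaults(environ)
    response = app(environ, lambda status, headers: None)
    print(response[0].decode("utf-8"))
```

To try it in a browser, `app` could be passed to `wsgiref.simple_server.make_server`. The point of the sketch is only that the data loading, the analysis, and the web output all live in the same language.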


Conclusion


Having said all this, choosing between the two languages for data analytics remains primarily a personal decision. The environment in which you work also matters: for example, if your teammates are already using R or Python, you may simply have to follow suit.
