A book to learn R and Python in parallel for Data Science

(github.com)

286 points | by zelda_1 1835 days ago

14 comments

  • mh12345 1835 days ago
    R has a nice web development framework called Shiny. While it is not comparable to say Django or Flask, Shiny does make it incredibly easy to share data analysis. If one wants to share statistical analysis or create a data oriented dashboard, then there is definitely a reason to consider R and Shiny. Note that Python has Dash, which is comparable to Shiny, but it is less mature as far as I know.

    While previously Shiny was primarily deployed through RStudio's solutions, there are now open source initiatives such as ShinyProxy, introducing Kubernetes as an option for deploying Shiny applications. The latest iterations of Shiny related libraries are facilitating automated testing and deployment. These developments allow companies to use Shiny in production, but it has to be said that the R ecosystem is not as developed as Python's from a traditional software development perspective.

    • eoinmurray92 1835 days ago
      Dash by Plotly is also amazing; it's like Shiny but for Python! We were able to whip together an app that lets you drag and drop xyyy data and get a scatter plot instantly. You can try it here (first load takes 1-2s):

      https://dash-app-dx9g2r0la6-8000.cloud.kyso.io

      It was also really easy to make, maybe 250 lines of Python in total.

      (guide to making this app is here: https://kyso.io/KyleOS/creating-an-interactive-application-u...)

    • yboris 1835 days ago
      I learned some R just so I could try out Shiny earlier -- Shiny is pretty awesome!

      By Dash for Python, you mean the one from Plotly? https://plot.ly/products/dash/

      Thank you for sharing ShinyProxy!

      • mh12345 1835 days ago
        Indeed, the one from Plotly! I gave Dash a quick shot about a year ago, it worked quite well to generate interactive reports.

        ShinyProxy is amazing. It is pretty easy to set up, but does require quite some specialized knowledge compared to the RStudio solutions.

      • mettamage 1835 days ago
        This is one of the most fun comments I have read in a while: learn a bit of a language to check out the web dev framework behind it.

        Awesome!

    • Annatar 1835 days ago
      I maintain R and Shiny where I work. With Shiny, one can now build any web application imaginable, even a completely generic one.

      R has such a huge library of software now, that it has gone far and wide outside of statistics and analytics - any kind of application can be built in R now.

      In fact I see no point in using Python for mathematics or number crunching any more as R has it all and performance critical parts can be rewritten in modern Fortran for very high speed and made available inside of R transparently.

      • psychometry 1835 days ago
        Sorry to be so blunt, but if you think R is comparable with Python on the web app side, you haven't built any moderately complex web apps. Shiny is fine for single-page reactive apps, but it's not a generic framework in the way that Django or Flask is.
        • Annatar 1835 days ago
          I hate web frameworks, absolutely despise those horrible, over-complicated, bloated monstrosities. If writing for "Flask" or "Django" is considered an advantage, then I'm deeply glad I didn't fall for that garbage.
          • psychometry 1835 days ago
            I suppose everyone using a web framework is just a rube, and that you alone have a monopoly on wisdom? Well, I take back my apology. Ignorance and arrogance together are not a good combination, but you possess both in spades.

            As you gain experience building applications, at some point you'll learn that you were wrong. Or you won't, in which case I feel sorry for whoever inherits your reinvented-wheel codebase.

            • Annatar 1834 days ago
              I won't, because I've been building applications for over 30 years; I know what's best, as I've had ample time to find out what works and what is sheer idiocy.

              The various software I've written is small, fast, light, with minimal dependencies and it's easy to install because I deliver it as OS packages for the operating systems I support; my users tell me they are happy. The memory requirements are minuscule and the software lightning fast. Its size is measured in kilobytes, not megabytes or gigabytes, which means I must be doing something right. The manual pages often exceed the software in size and are brimming with examples. I pay extra attention to being backwards compatible when I implement changes and enhancements. Regressions are non-existent.

              So I know I'm right and that using a bloated "framework" would have been one of the stupidest things I could have ever done.

              As for re-inventing wheels, I use what comes with the OS and leverage what's already there; I've purposely not implemented any algorithm re-implementations of my own, although I easily could have. I'm neither dumb nor stupid to go re-inventing wheels, in fact that's one of the reasons why I hate webshits' frameworks. They don't call them webshit for no reason.

    • samt430 1835 days ago
      Shiny is fantastic! (Especially paired up with RStudio Connect)
  • billfruit 1835 days ago
    I sometimes wonder whether there is any reason to learn R at all, since the Python ecosystem has absorbed most of its advanced statistical functionality, and the Python environment is much more general, with capabilities to fetch, decode and encode data, work with binary data, databases, web frameworks for presenting, etc.
    • minimaxir 1835 days ago
      I use both Python and R. tidyverse/ggplot2 alone are enough reason to use R, and are substantially faster for tasks that utilize those packages than the equivalent in Python (in my opinion).

      Although I haven't had as much reason to use base R. For more ML-related tasks I do go back to Python.

      • roenxi 1835 days ago
        Hear, hear. Tidyverse also provides a centralised 'this is how you do X' nexus, which really helps discoverability. World class stuff, on tap.

        For example, I know the recommended pipe in R is magrittr's %>%. I have no idea what the respectable pipe library in Python is, or even if there is one.
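
        For comparison, a minimal sketch of what a pipe could look like in plain Python, assuming no third-party library at all. The helper `pipe` and the three step functions are hypothetical names made up for illustration, not an established package; the helper simply threads a value through functions left to right, roughly as %>% does.

```python
from functools import reduce


def pipe(value, *steps):
    """Pass `value` through each function in `steps`, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


def drop_missing(rows):
    # Keep only rows whose "score" is present.
    return [row for row in rows if row["score"] is not None]


def sort_by_score(rows):
    # Highest score first.
    return sorted(rows, key=lambda row: row["score"], reverse=True)


def names(rows):
    # Reduce each row to its name.
    return [row["name"] for row in rows]


rows = [
    {"name": "a", "score": 2},
    {"name": "b", "score": None},
    {"name": "c", "score": 5},
]

# Reads in order, like rows %>% drop_missing() %>% sort_by_score() %>% names()
top_names = pipe(rows, drop_missing, sort_by_score, names)
print(top_names)  # ['c', 'a']
```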

        I wouldn't even know where to start finding all the tidyverse equivalents in Python. It isn't as organised and obvious as the R statistics community.

        On the other hand Base R is the worst. Disgusting language.

        • spectramax 1835 days ago
          Julia has a |> operator and it works amazingly well with Queryverse.jl which is a clone of Tidyverse!
      • jwilbs 1835 days ago
        This. I’ve contributed code to popular libraries in both languages, and while I (overall) have a preference for python (mostly due to it being general purpose), I find R code unparalleled when it comes to raw data manipulation/analysis.

        The overall api of tidyverse packages is such a joy, and recent improvements in purrr/tidyr allow me to construct nested data analysis workflows I couldn’t even dream of in python.

        • ppod 1835 days ago
          One random example I found recently is a tidyverse package called forcats that has lots of nice functions for categorical data. For example, it has a single function that merges all categories with a frequency of less than a certain threshold in the table into a new category like "other" or whatever. This is a task I often need to do, but as far as I can see it's a bit of a hack in python or pandas. It's just lots of little things like this, especially wrangling data tables.

          https://forcats.tidyverse.org/reference/fct_lump.html

          There's also the data.table package for this kind of data work, which is maybe less used but seems to have better performance.
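
          To make the task concrete, here is a plain-Python sketch of merging rare categories into "other". It is only an illustration of the idea described above, not a port of fct_lump; the function name `lump_rare` and its parameters are assumptions.

```python
from collections import Counter


def lump_rare(values, min_count, other="other"):
    """Replace every category seen fewer than `min_count` times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


colors = ["red", "red", "red", "blue", "blue", "green", "teal"]
lumped = lump_rare(colors, min_count=2)
print(lumped)
# ['red', 'red', 'red', 'blue', 'blue', 'other', 'other']
```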

        • marmaduke 1835 days ago
          Would you have an example of that?
      • logjammin 1835 days ago
        Seconded on all points. I do branch out to SQL for stuff, too, and I find that R and Python play nicely with it, too. But as long as ggplot exists and Python doesn't have it, R will never really leave my side.
      • IanCal 1835 days ago
        I'm finding a fairly nice combo is using rmarkdown, python and reticulate to do the things that are easier in python there and the outputs in R. Debugging isn't where I'd like it to be yet but there might be a way of improving that - I haven't explored yet.
    • anthony_doan 1835 days ago
      > since python eco system has absorbed most of its advanced statistical functionality

      This isn't true at all...

      Also all advanced statistical books are either SAS or R. If it's R then there is always a package that the author created.

      Just look at Chapman & Hall/CRC or Springer publisher and look at their books.

      Go here: https://www.jstatsoft.org/index

      Count the number of R packages in those papers versus Python.

      I don't even need a source. I'm a statistician and I'm going to get a paper there and publish an R package for my master's thesis.

    • mikorym 1835 days ago
      I use both python and R almost every day.

      Although I like R and often use R to quickly order tabulated data, there are a few things to take into account that in recent times are building a strong case for me not to use R habitually.

      Development in R is frustrating. If you don't need to do dev, then on this point you are home free. Testing things that you deploy in R is not simple.

      Scripting in R can be frustrating. I have a script that traverses Excel files and using tryCatch() is just so much more complicated with it being a function. In Python the try-catch functionality is part of the design syntax.
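
      As a rough sketch of the Python side of that comparison: a loop over files where try/except is ordinary statement syntax. The function `process_all` and the stand-in loader `fake_load` are hypothetical; a real script would plug in an actual Excel reader where `fake_load` is used.

```python
import tempfile
from pathlib import Path


def process_all(folder, load, pattern="*.xlsx"):
    """Apply `load` to every matching file, collecting results and failures."""
    results, failures = {}, {}
    for path in sorted(Path(folder).rglob(pattern)):
        try:
            results[path.name] = load(path)
        except Exception as exc:  # keep going, but record what went wrong
            failures[path.name] = repr(exc)
    return results, failures


def fake_load(path):
    # Stand-in for a real Excel reader: fails on empty files.
    if path.stat().st_size == 0:
        raise ValueError("empty workbook")
    return path.stat().st_size


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "good.xlsx").write_bytes(b"data")
    (Path(tmp) / "bad.xlsx").write_bytes(b"")
    results, failures = process_all(tmp, fake_load)

print(results)   # {'good.xlsx': 4}
print(failures)  # {'bad.xlsx': "ValueError('empty workbook')"}
```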

      There are scenarios where R is better. If you are in actuarial science, research or academics then often you'll find R libraries that just work.

      R treats tabular data with grace. Everything in R is an array.

      The takeaway for me is that I should use R less and Python more. I personally can't deal with something like tryCatch() being overcomplicated, but for people who don't do dev anyway and maybe need to analyse DNA sequences for a living, R can be rewarding. For me: the ggplot2 library is great; stay away from Shiny and dev in R.

      • mh12345 1835 days ago
        Interesting, why do you advise people to stay away from Shiny?
        • mikorym 1835 days ago
          It tries to do html, but it is limited. So I'd rather use Javascript to manipulate the frontend directly.

          It tries to do functional programming, but the documentation is not satisfying. The responses and behaviour are perplexing.

          I spent around 5-10 hours trying to get a Shiny GUI to work and eventually came to the conclusion that 1) if you want a big project, do all the frontend stuff in something else, like JS, and 2) if you want a small project, try something established (I am not advocating, it's just an example) like Power BI.

          • mh12345 1835 days ago
            Regarding the limited frontend capabilities, I had a similar opinion at one point, but with Shiny's HTML templating (https://shiny.rstudio.com/articles/templates.html) functionality one can circumvent the limited HTML that Shiny has out of the box. Besides that, there is also the possibility to communicate with R using JavaScript (https://shiny.rstudio.com/articles/communicating-with-js.htm...). These two functionalities combined allow for a frontend that is much more flexible, when compared to traditional Shiny applications. Of course, there might definitely be better solutions out there that fit your use case and Shiny's real use is primarily in sharing data analysis.
    • jhbadger 1835 days ago
      Not really in my experience. Really, the only place where I'd say Python has gotten more support so far than R is in deep learning. If you want any just-published statistical method, the associated implementation will almost inevitably be in R. But that's today -- I'm old enough to remember when the standard language in "The Journal of Statistical Software" was XLISP-STAT (much of the 1990s).
    • Canadauni 1835 days ago
      I use mixed effects models pretty extensively. While there is an implementation in statsmodels the implementation in lme4 is more user friendly and has a more mature ecosystem of post-hoc tests.
    • kuzehanka 1835 days ago
      I don't think there's any reason to learn R for anyone who is already proficient at programming. Despite being proficient with R, the only times I used it in the last two years were for ggplot. And even for data vis, I'm increasingly using Python and JS.

      There's a bunch of comments below which can be summed up with 'use R because <package name> doesn't have a direct python equivalent' but they're all missing the point that the Python data science ecosystem is evolving at a much faster pace than R and will completely supersede it in a few years.

      R, like SAS, is a tool for non-programmers. And there it shall remain. The only demographic where R makes sense long term are pure mathematicians/statisticians who are not proficient in programming. But that demographic is rapidly declining in size.

      • anthony_doan 1835 days ago
        > There's a bunch of comments below which can be summed up with 'use R because <package name> doesn't have a direct python equivalent' but they're all missing the point that the Python data science ecosystem is evolving at a much faster pace than R and will completely supersede it in a few years.

        The point is R is a very good language for statistics because of the packages, not data science. Data science can do its own thing; it's okay. It's also okay for data science to use statistical models from statistics too.

        > R, like SAS, is a tool for non-programmers.

        I respect and love data science and machine learning but this behavior of generalization is terrible. There are many wonderful programmers who contribute to R and use R, as I am sure there are many wonderful statisticians who use Python. They're just tools.

        > And there it shall remain. The only demographic where R makes sense long term are pure mathematicians/statisticians who are not proficient in programming. But that demographic is rapidly declining in size.

        What is up with these generalizations? R is not going anywhere in the statistics community. It's doing fine. Also, from my experience in academia, most math people use Matlab and, if anything, R.

        It's okay to have both R and Python doing their thing.

        There is no need to conflate data science and statistics or have this weird tribalism.

        • kuzehanka 1835 days ago
          Everything you said sums up with 'R is a very good language for statistic because of the packages' which is pretty much in agreement with the GP comment.

          R has nothing going for it except a rapidly dwindling number of packages that don't yet have a direct python equivalent. It doesn't make sense to invest time into R if one already knows python unless one is specifically focusing on academia pure stats type stuff.

          Even then, the incoming generation of undergrads are increasingly proficient with programming and are shying away from R the same way that they shied away from Matlab after scipy matched it for 95% of their tasks.

          • anthony_doan 1834 days ago
            > R has nothing going for it except a rapidly dwindling number of packages that don't yet have a direct python equivalent.

            This is not a true statement.

            Here are the data that goes against this statement.

            1. https://www.r-bloggers.com/on-the-growth-of-cran-packages/
            2. https://blog.revolutionanalytics.com/2017/01/cran-10000.html
            3. https://www.r-bloggers.com/rs-remarkable-growth/

            From 2015 to 2016: from ~6,200 to more than 8,000 packages in April 2016.

            From 2016 to 2017: CRAN now has 10,000 R packages.

            > Even then, the incoming generation of undergrads are increasingly proficient with programming and are shying away from R the same way that they shied away from Matlab after scipy matched it for 95% of their tasks.

            This is a generalization.

            So far you've made opinionated negative generalization with no data.

            Python is great because it learned from Matlab and took many great ideas and inspirations from it. But I'm not going to make sweeping negative statements about Matlab or pretend to know how it's going when I don't have enough data or experience with it.

          • Annatar 1835 days ago
            "It doesn't make sense to invest time into R if one already knows python unless one is specifically focusing on academia pure stats type stuff."

            Ha! I knew it! So it is familiarity with Python then!

            R does have something else going for it: phenomenal documentation and consistency. Replicating R's thousands of available libraries will be a gargantuan effort. It is cheaper and more efficient to master R.

      • snackematician 1835 days ago
        Tidyverse is not just some "<package name>" -- it's an entire workflow, centered around functional programming and tidy data (https://vita.had.co.nz/papers/tidy-data.pdf), and nothing in Python comes close. R has many warts, but its lisp roots and metaprogramming strengths have allowed the tidyverse devs, and other excellent programmers working with R, to dramatically improve the language, and spawn a whole new style of statistical programming.
        • kuzehanka 1835 days ago
          Can you elaborate on what tidyverse offers you that the python ecosystem doesn't? 'Nothing comes close' is a couple degrees too strong a statement from my experience with R, but maybe you know something I don't.
          • snackematician 1834 days ago
            Tidyverse offers a programming style based around piping dataframes through a chain of endomorphisms ("verbs"). Closest things that come to mind are SQL and d3. Pandas feels clumsy by comparison.
            • kuzehanka 1834 days ago
              Uhhh but pandas is literally a chained architecture? Have you actually used it?
              • snackematician 1834 days ago
                I have used pandas extensively, it was my main statistics environment for a couple years before I switched back to R for tidyverse. At the time chaining was not well supported or idiomatic; multi-indexing was all the rage.

                I still occasionally use pandas with seaborn when it's not worth it to switch out to R. I don't think it can match the tidyverse+ggplot combo for quickly exploring and making beautiful plots. But this discussion has inspired me to do some googling and it seems like some people are using tidyverse-like workflows in pandas (https://stmorse.github.io/journal/tidyverse-style-pandas.htm...). Doesn't seem quite as smooth but I'll definitely be trying it out next time I'm working in pandas.

                • piccolbo 1832 days ago
                  I've used both and have two additional comments.

                  Some of the dplyr elegance comes from the flexible evaluation mechanism in R, whereby mutate(data, col1+col2) works because the second arg is evaluated in an enriched environment. Python eschews this kind of macro-like extensions because, my guess, tampering with evaluation makes a lot of other things complicated (for instance, forget replacing args with their value, that doesn't work anymore). I think the author of dplyr himself in later work has promoted the use of the ~ operator to explicitly block eval of an argument and at least make these departures from regular eval explicit. That means dplyr is ahead for interactive use, but for programming you have to switch to a separate API (the underscore "verbs") and that makes the transition from interactive work to coding a bit steeper. It's all trade-offs, and I am not saying that I know better than either the pandas or dplyr authors.
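
                  To illustrate what "evaluated in an enriched environment" means, here is a toy Python sketch. Python evaluates arguments before the call, so the closest plain-Python analogue has to take the expression as a string and evaluate it with the columns visible as names. The `mutate` function below is a hypothetical illustration written for this comment, not dplyr's or pandas' implementation.

```python
def mutate(table, **new_columns):
    """Return a copy of `table` with columns computed from expression strings.

    `table` maps column names to equal-length lists. Each expression is
    evaluated once per row, with that row's values visible as plain names.
    """
    n_rows = len(next(iter(table.values())))
    result = dict(table)
    for name, expression in new_columns.items():
        code = compile(expression, "<mutate>", "eval")
        values = []
        for i in range(n_rows):
            # The "enriched environment": this row's columns as local names.
            row_env = {col: vals[i] for col, vals in result.items()}
            # Builtins are disabled so only column names resolve.
            values.append(eval(code, {"__builtins__": {}}, row_env))
        result[name] = values
    return result


data = {"col1": [1, 2, 3], "col2": [10, 20, 30]}
out = mutate(data, total="col1 + col2")
print(out)
# {'col1': [1, 2, 3], 'col2': [10, 20, 30], 'total': [11, 22, 33]}
```

                  The string argument is the trade-off mentioned above made visible: the expression is explicit data rather than something the function silently captures.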

                  As to ggplot, if you believe the future of statistical graphics is in-browser and interactive, you should take a look at altair for python (I myself created a small extension to it called altair_recipes). It's based on vega, like ggplot's anointed (but not quite ready) successor ggvis, and uses the grammar of graphics (or one interpretation thereof) like ggplot, with extensions to interaction. Simpler than D3 by most accounts.

      • Annatar 1835 days ago
        I cannot understand why I would use Python over R. R is designed from the ground up for massive amounts of data processing at speed and with ease. Even if Python continues accreting computational functionality, it will never be as fast or as efficient as R. Improving Python for something R is designed to do seems to me to be a huge waste of time: familiarity should not be the driving force behind replicating R's functionality. That's just so wrong.
        • kuzehanka 1835 days ago
          > R is designed from the ground up for massive amounts of data processing at speed

          What? The R ecosystem doesn't provide meaningful out of core capabilities, never mind the ability to handle anything approaching 'massive amounts of data'.

          -- Would sure love to know why an agenda-less factual comment is getting downvoted.

          • javierluraschi 1835 days ago
            In my experience, R is really fast since it was designed to store data in columnar format, which we now all know is best for data analysis. So, in most cases, scaling up computation is quite easy. To scale out, you can use Apache Spark with R; the interface I’ve worked on, sparklyr, is quite easy to use and allows you to scale out computation. Just to give you an example of what’s possible, I was playing around yesterday with a ray tracing prototype someone is building and scaled it out in Spark, see https://twitter.com/javierluraschi/status/112055769372135424... It’s a misconception that R is slow or can’t scale.
            • kuzehanka 1835 days ago
              You can plug any compute kernel you want into spark, that's not a pro or con of R.

              Column stores are standard in any analytics pipeline today. They make up Python's Pandas, R's dplyr, and Java's DataFrame. How or why does R stand out for 'massive amounts of data'?

              R does not have meaningful out of core compute offerings that compare with something like Dask.

              R does not at all have cluster compute offerings that compare to Dask Distributed.

              If you want to know what real performance looks like, check out Python's cudf which will shortly fully match the Pandas api. That raytracing example you linked would run at interactive rates with cudf, I really don't see any basis for perf arguments in R's favour, and 'massive data' arguments are laughable here.

              Whatever advantages R has, perf or scalability are definitely not amongst them.

              • Annatar 1835 days ago
                You are arguing for Python and speed in the same breath? If you want portable speed, you better "warm up a chair" and master Fortran.

                Bonus: modern Fortran is a joy to develop in, far more fun than Python. And you get to compile to machine code, either for a processor or a GPU.

              • tylermw 1835 days ago
                > That raytracing example you linked would run at interactive rates with cudf, I really don't see any basis for perf arguments in R's favour, and 'massive data' arguments are laughable here.

                I don't see how the "GPU DataFrames" provided in cuDF would enhance a raytracer in any way.

    • j7ake 1835 days ago
      Is there anything comparable to tidyverse and ggplot2 in python? If so I will switch immediately.
      • groceryheist 1835 days ago
        To answer your question:

        ggplot2 : plotnine is quite a good ggplot2 clone based on matplotlib. I feel like ggplot2 is a bit better and more complete, but if you want to do something that isn't supported it's harder for me to hack than matplotlib.

        Tidyverse: To me, ggplot2 is the only essential part of the tidyverse. Lubridate is also good. Most others seem like semantics and syntax sugar. I prefer data.table, which is similar to Pandas. DT is super fast but imho Pandas has a more intuitive and consistent API (and if you want a speed up for large N then dask might work).

        I use both R and Python on a regular basis. I choose Python for lower-level stuff, automation, parallelism / concurrency, and R for bespoke statistics. I use both for everyday statistics and plotting, but I feel that R has light advantages. I feel like if you're comfortable switching languages there are good reasons to use both. It's also important for me because I work with different teams that have different practices and preferences.

    • arendtio 1835 days ago
      I don't know if it is still a thing, but if you are working with SAP HANA (in-memory database) there is a good chance you would like to learn R as they integrated it into their database.
      • hdkrgr 1834 days ago
        Related: tidyverse's dbplyr lets you write tidyverse code querying almost any remote database - leveraging the DB's computation while writing code (almost) as you would for a local data frame. In my old job I got to a point where I would barely ever need to write SQL anymore because of this. https://cran.r-project.org/web/packages/dbplyr/vignettes/dbp...
      • Annatar 1835 days ago
        Vertica did as well.
    • Bootvis 1835 days ago
      All those general things you can do in R as well. Maybe not as well developed and widely used as the Python counterparts but definitely there.
  • samt430 1835 days ago
    Apart from the odd library I have rarely found much benefit to using both languages for DS as you end up expressing the same paradigms just in different syntax. And I think for good reason too - the basis of the tools used to do data science isn't in the languages themselves but in the packages built for the task, which is why there's often an R equivalent of a Python package and vice versa. So in effect almost no one 'uses' R/Python for DS as much as Rube-Goldberg highly-optimised compiled libraries together using different syntax, i.e. dplyr/pandas/scipy/ggplot etc. are the real stars of the show.

    Rather than R vs Python I hope one of two things happens. Either both languages get replaced by a 'better' ML language, e.g. Swift / Julia, giving us users a 'turtles all the way down' experience and removing the reliance on compiled packages. Or, second option, they get relegated even further into being nothing but glue between some common data formats specific to the type of work found in DS, allowing you the user basically a choice between the syntactic sugar of one glue language versus the other. Something like Apache Arrow springs to mind but I'm not sure where they are at the moment.

  • cwyers 1835 days ago
    Nobody would write R code the way this book is teaching. For that matter, nobody looking to do linear regression for data science in Python is doing their own matrix math, either.
    • conjectures 1835 days ago
      Linear regression should be regarded as the statistical equivalent of stripping down a rifle, reassembling it and checking its function. If you develop any statistical software, you're going to end up doing it at some point.
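
      For the one-predictor case, the "strip it down and reassemble it" exercise is short enough to sketch in plain Python. This is the closed-form least-squares fit for y = intercept + slope * x, which is what solving the normal equations reduces to with a single predictor; the function name `fit_line` is a made-up example and this is not the book's code.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x
intercept, slope = fit_line(xs, ys)
print(intercept, slope)  # 1.0 2.0
```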
    • minimaxir 1835 days ago
      A lot of MOOCs teach matrix algebra that way, which I admit I'm not fond of.
  • dajohnson89 1835 days ago
    This is an interesting concept. It makes perfect sense to learn both simultaneously. On the other hand, it must be confusing at times. Imagine learning two languages at the same time, from the same book. It's an experiment I haven't tried, but I'm curious about the outcome.
    • tyingq 1835 days ago
      Also interesting to learn two languages, in parallel, where neither is particularly good at parallelism :)
      • rpier001 1835 days ago
        Are you trolling right now? R's idioms for parallelism are pretty darn straightforward and easy to use. Effective too.
      • hjk05 1835 days ago
        What makes you think R or Python are bad at parallelism? My experience is that both are very decent.
        • tyingq 1835 days ago
          Both have packages that can manage subprocesses. Both have inherently single threaded interpreters.
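
          A minimal sketch of the subprocess approach on the Python side, using only the standard library: each snippet runs in its own interpreter process, so the work is not limited to one interpreter's thread. The helper name `run_in_parallel` is an assumption for illustration; the standard `multiprocessing` and `concurrent.futures` modules wrap the same idea at a higher level.

```python
import subprocess
import sys


def run_in_parallel(snippets):
    """Run each Python snippet in its own interpreter process; return stdout."""
    # Start every process first so they run concurrently...
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", code],
            stdout=subprocess.PIPE,
            text=True,
        )
        for code in snippets
    ]
    # ...then wait for each one and read its output, in the original order.
    return [proc.communicate()[0].strip() for proc in procs]


outputs = run_in_parallel([
    "print(sum(range(1_000_000)))",
    "print(2 ** 10)",
])
print(outputs)  # ['499999500000', '1024']
```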
          • rpier001 1832 days ago
            What technology in what language are you comparing them against?
      • nurettin 1835 days ago
        Humans are very good at doing things successively and calling it parallel.
    • h4t 1835 days ago
      I taught myself python by converting ruby programs using ruby syntax, coding styles, libraries etc. I just used the pick-axe book as a reference and converted code examples from any interesting and/or useful/relevant python source I could get my hands on at the time. (circa 2008?) So this actually makes sense to me learning 2 languages at once as long as you understand what you are doing with them to begin with. I would not recommend it to a beginner though. Interesting though. See it through and complete it. I'll check back.
    • wisty 1835 days ago
      I suspect a lot of readers will have some proficiency in either R or Python and want to learn the other.
  • photon_lines 1835 days ago
    Nice work!!!

    If anyone is interested, I also made a 'Learn R by Example' project which attempts to teach R through code comments: https://github.com/photonlines/Learn-R-by-Example

  • Lanrei 1835 days ago
    Shouldn't '<-' be used instead of '=' for variable assignments, as they aren't the same thing in R?
    • _Wintermute 1835 days ago
      It's a large source of bike-shedding in the R community but out of the 5 assignment operators in R, those two are largely the same.

      There's a good explanation here: https://stackoverflow.com/questions/1741820/what-are-the-dif...

    • tylermw 1835 days ago
      They are the same thing, minus the corner case of assignment within a function call:

      e.g.

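        # Divide x by y.
        divide = function(x, y) {
          return(x/y)
        }

        divide(y = 2, x = 1)    # arguments matched by name: 1/2
        divide(y <- 1, x <- 2)  # assigns y and x in the caller, then passes 1 and 2 by position: 1/2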
      
      
      These two calls give the same result, as the second results in assignment and then passing the argument by position. Other than this case, they are exactly interchangeable.
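
      Since the book teaches both languages side by side, a rough Python parallel may help, assuming Python 3.8 or later: keyword arguments play the role of `=` inside a call, and the assignment expression `:=` plays the role of `<-`. This is an analogy sketched for illustration, not a claim that the two languages behave identically.

```python
def divide(x, y):
    return x / y


# Keyword arguments: names are matched, so order does not matter.
by_name = divide(y=2, x=1)

# Assignment expressions: `y` and `x` are assigned in the calling scope,
# and the resulting values 1 and 2 are passed by position (x=1, y=2).
by_position = divide((y := 1), (x := 2))

print(by_name, by_position)  # 0.5 0.5
print(x, y)                  # 2 1
```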
  • purple-again 1835 days ago
    I read a good portion of the first chapter and skimmed the rest. I am very much enjoying this book and hope that you continue to write more chapters.
    • zelda_1 1835 days ago
      thanks! I'm planning to add a few more chapters.
  • cttet 1835 days ago
    I learnt Matlab/R/Python/Javascript all together. It was a mess for me to grok all the similar-but-different syntax, but it made me realize more about the real essence of what is really important for the domain rather than language details.
  • starpilot 1835 days ago
    Python and Julia might make more sense today.
    • demirev 1835 days ago
      Is Julia actually used that much? I've been hearing people herald it as the next big thing for the last five years or so, but it doesn't seem like it has taken off. I personally don't know anybody who uses it professionally (I know plenty of people who use R professionally). The most recent SO survey also indicates that it is rather unpopular.
      • ddragon 1835 days ago
        Julia just got to 1.0 last year, and it does have areas where it's already among the best options in scientific computing, such as differential equations solving and mathematical optimization. Regardless of not being the most popular (against the behemoths that have many times its age and support), you shouldn't have trouble doing most stuff with it from machine learning to statistics. And it's a pretty fun and fairly unique language to learn and use.
  • jamisteven 1835 days ago
    I feel like every book I've ever read, on any programming language, makes me immediately want to pound my head into my desk. Nothing against the author, it's just so clear that, as it pertains to programming, being good at programming and being good at teaching it never come hand in hand. Same goes for real life: some of the best data scientists I work with can't for the life of them explain concepts, and then the ones who are great at explaining them can rarely execute with the same eloquence.
    • stevewodil 1835 days ago
      This is true for a lot of things! For example, the famous artists that perform the music on stage (nowadays, at least) likely didn't write the song they are singing.

      Teaching and executing are two separate skills. Fun little anecdote, in high school I had this AWFUL science teacher. He would literally just have us watch Crash Course videos to get the concepts. Turns out he was a relatively distinguished scientist himself..

  • Y_Y 1835 days ago
    How come there's no source in the git repo? You shouldn't just throw up the PDF and call it a day, github isn't just a trendy file host.
    • zelda_1 1835 days ago
      Good point. Just added the code. Thanks!
      • Y_Y 1835 days ago
        Thanks for adding the source for all the code snippets. I'd also be interested in the LaTeX (if that's what you used) source for the book itself if you feel like adding that.
  • master_yoda_1 1835 days ago
    But why? If you would never join NASA then why train to be an astronaut. Do something useful in life.
  • awestley 1835 days ago
    I picture the author as an early-to-mid 20-year old that is great on the data sci side but weak as a developer. Considers "having to learn" python or R as a hurdle to their accessing their innate mathematical prowess.
    • purple-again 1835 days ago
      Literally the second sentence in the book he states he was an undergrad in 2006.