Show HN: Bamboolib – A GUI for Pandas (Python Data Science)

(bamboolib.com)

119 points | by __tobals__ 1623 days ago

15 comments

westurner 1623 days ago
This looks excellent. The ability to generate the Python code for the pandas dataframe transformations looks to be more useful than OpenRefine, TBH.
How much work would it be to use Dask (and Dask-ML) as a backend?
I see the OneHotEncoder button. Have you considered integration with Yellowbrick? They've probably already implemented a few of your near-future and someday roadmap items involving hyperparameter selection, model selection, and visualization. https://www.scikit-yb.org/en/latest/
This video shows more of the advanced bamboolib features: https://youtu.be/I0a58h1OCcg
The live histogram rebinning looks useful. Recently I read about a 'shadowgram' / ~KDE approach with very many possible bin widths translucently overlaid in one chart. https://stats.stackexchange.com/questions/68999/how-to-smear...
Yellowbrick also has a bin width optimization visualization in yellowbrick.target.binning.BalancedBinningReference: https://www.scikit-yb.org/en/latest/api/target/binning.html
Great work.
- kite_and_code 1622 days ago
  Thank you for your feedback and support :) Are you currently using OpenRefine?
  We are currently thinking about providing other dataframe libraries like dask or pyspark and similar. However, we are a little bit unsure on how to make sure that there is user demand before we implement it. It is not a complete rewrite but it would require some additional abstractions at some points in the library. And we need to check if some features might not be available any more. Would dask support be a reason to buy for you?
  Great hint with yellowbrick and yes, we are considering some of those features as well if there is a useful place in the library.
  In general, we are also thinking about ways for you to extend the library yourself, so that you can add your own analyses/charts of choice and have them come up again at the right point in time. In case that is useful.
  - westurner 1612 days ago
    In the past, I've looked at OpenRefine and Jupyter integration. Once I've learned to do data transformation with pandas and sklearn with code, I'll report back to you.
    Pandas-profiling has a number of cool descriptive statistics features as well. https://github.com/pandas-profiling/pandas-profiling
    There's a new IterativeImputer in Scikit-learn 0.22 that it'd be cool to see visualizations of. https://twitter.com/TedPetrou/status/1197150813707108352 https://scikit-learn.org/stable/modules/impute.html
    A plugin model would be cool; though configuring the container every time wouldn't be fun. Some ideas about how we could create a desktop version of binderhub in order to launch REES-compatible environments on our own resources: https://github.com/westurner/nbhandler/issues/1
- hv42 1622 days ago
  The UI is heavily inspired by the one from Trifacta/Cloud Dataprep. (i.e. histograms when selecting columns, brushing to start a transformation...)
  I guess that makes it easy to get started with pandas (and learn about the pandas API). I wonder what some advanced transforms such as join/union/pivot will look like?
  - __tobals__ 1622 days ago
    Yes, we looked at many different tools for inspiration and Trifacta was among them.
    For join in action, you can watch this video: https://www.youtube.com/watch?v=r59Q19oCMr8&t=3s
    We also support pivot and melt. About union: what do you have in mind here?
- bayesian_horse 1623 days ago
  Dask has only a subset of Pandas available.
  - __tobals__ 1622 days ago
    Could you send me a link to the docs where they say which Pandas features are not included? Would love to take a closer look at this.
    - westurner 1612 days ago
      Set difference and/or intersection of dir(pd.DataFrame) and dir(dask.dataframe.DataFrame) with inspect.getargspec and inspect.getdoc would be a useful document for either or both projects.
      pyfilemods generates a ReStructuredText document with introspected API comparisons. "Identify and compare Python file functions/methods and attributes from os, os.path, shutil, pathlib, and path.py" https://github.com/westurner/pyfilemods
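A minimal sketch of the set-difference idea from the comment above. To stay runnable without pandas or dask installed, it uses two stand-in classes (`Full` and `Subset`, both hypothetical); `compare_api` and `describe` are illustrative helper names, not part of either library. It uses `inspect.signature` because `inspect.getargspec` has been removed from recent Python versions.

```python
import inspect


def compare_api(a, b):
    """Return public names only in a, only in b, and shared by both."""
    names_a = {n for n in dir(a) if not n.startswith("_")}
    names_b = {n for n in dir(b) if not n.startswith("_")}
    return (
        sorted(names_a - names_b),
        sorted(names_b - names_a),
        sorted(names_a & names_b),
    )


def describe(obj, name):
    """Return (signature, first docstring line) for obj.name, where available."""
    member = getattr(obj, name)
    try:
        signature = str(inspect.signature(member))
    except (TypeError, ValueError):
        signature = ""  # not callable, or no introspectable signature
    doc = inspect.getdoc(member) or ""
    first_line = doc.splitlines()[0] if doc else ""
    return signature, first_line


# Stand-in classes (hypothetical) so the sketch runs without pandas or dask.
class Full:
    def head(self, n=5):
        """Return the first n rows."""

    def pivot(self, index, columns):
        """Reshape by index and columns."""


class Subset:
    def head(self, n=5):
        """Return the first n rows."""


only_full, only_subset, shared = compare_api(Full, Subset)
# Signature and one-line summary for everything the subset is missing.
missing_report = {name: describe(Full, name) for name in only_full}
```

With both libraries installed, the same call would presumably be `compare_api(pd.DataFrame, dd.DataFrame)` after `import pandas as pd` and `import dask.dataframe as dd`. Matching names alone would not show differences in supported arguments or behavior.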
sauwan 1623 days ago
As a non-data-scientist who does some infrequent data analysis in python, this looks amazing. But not something I think we can justify paying for with the amount of analysis I do.
If this doesn't work out commercially, would you consider open-sourcing it?
I am curious how many data scientists don't already have pipelines that do similar functions?
- kite_and_code 1622 days ago
  Thank you for your feedback! How often do you perform analyses? And what do you think is something that you can justify paying? We would like to find a suitable pricing schema for all use cases.
  About open-sourcing: I cannot tell right now what the situation will be in the future. But I can tell you that we believe in Open-Source and technologies which don't create vendor lock-in. This is also why we export the pandas code. So, you are always flexible with your code and you own the result of your work.
  Basically, we want to empower people to use Open-Source software at the core but we also want to make it as user-friendly as fully proprietary solutions like Trifacta. So, you will have the best of both worlds without the vendor lock-in.
  We talked to many Data Scientists and some already started creating similar packages etc but they never got far because it takes a long and consistent effort to catch most of the cases. Also, it quickly becomes a software engineering challenge.
- yuvapavan 1622 days ago
  Hi Sauwan,
  You can sign up for https://cloud.trifacta.com/ our SaaS offering for your data analysis.
madmaze 1623 days ago
Looks great, but not a fan of the licensing model! $600/year for something that's 99% open source?
https://bamboolib.8080labs.com/pricing/
- __tobals__ 1623 days ago
  So if I understand you correctly, you deem the price for the annual license too high? Could you elaborate on what you mean by 99% open source and how that relates to your perception of the price?
  - set92 1623 days ago
    Because is mainly using Jupyter Notebook, Python and Pandas.
    These days it is normal for companies to build their own products on top of open source products, but some people don't look kindly on it.
    In my case, I don't think a tool is worth it that only works in a specific environment, doesn't have much functionality, and costs more than all the JetBrains products. I also don't like that it is a tool built on top of open source projects that tries to charge a large amount while having almost no functionality.
    - IanCal 1623 days ago
      > Because is mainly using Jupyter Notebook, Python and Pandas.
      I really don't think that is correct. It integrates into those / builds on them, but those projects absolutely do not have the features that I can see playing around with this product.
simlan 1623 days ago
Looks really interesting. The GUI is purely built on ipywidgets, correct?
- kite_and_code 1623 days ago
  This is correct! ipywidgets is awesome because it merges the powerful Python ecosystem with all the capabilities of the web, including HTML, CSS and JavaScript. We are super excited about what we might build by merging those two worlds.
amrrs 1623 days ago
Previous - https://news.ycombinator.com/item?id=20614896
- kite_and_code 1623 days ago
  Thank you for linking the old post because the last time some people doubted that we would actually create this :)
  - mellosouls 1623 days ago
    To be fair, last time you posted a Show HN with nothing to show.
    - kite_and_code 1622 days ago
      Just for future reference: we did not submit the first post. It was submitted by someone who found our old demo on reddit or linkedin. However, it sparked interest and then we tried to answer some of the questions, but we were limited by our karma etc. Nevertheless, we learned some lessons in that first post, so happy that it happened :)
    - __tobals__ 1623 days ago
      Fair point indeed, but the good news is: we improved on that :) And it won't happen again.
      - mellosouls 1623 days ago
        Fair enough, good luck!
          - kite_and_code 1622 days ago
            Thank you :) Support is always super important in such an early stage!
amrrs 1623 days ago
If this is supposed to replace Microsoft Excel - Would an enterprise be willing to pay $500 / month for this thing instead of getting an entire Office Suite?
I don't want to demean this amazing work, but I think the user persona and the price for the tier seem to be a mismatch.
- kite_and_code 1623 days ago
  Interesting to see that you already think about replacing Excel. We won't go that far. Currently, bamboolib is intended to save time for Python Data Scientists, and therefore it integrates perfectly into their working environment. Python Data Scientists cost their companies between 2k and 10k USD per month, and with bamboolib they should easily save 10h per month, especially if they need to explore new data sets or don't know the full pandas API by heart. Thus, the price of 49$ per month should be a great deal because we want to provide 10x value per cost.
  On top, bamboolib aims to reduce the training time for new Data Scientists.
  In addition, bamboolib makes pandas available to people who are proficient with working on data but not specifically with Python or coding. Thus, companies can let people with business knowledge work on the data transformations, who then hand over the code to Data Engineers who deploy it, or similar.
  What do you think about this?
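The value claim in the comment above can be worked through as rough arithmetic. This is only a sketch: the 160 working hours per month is an assumption that does not appear in the thread, while the cost range, the 10 hours saved, and the 49 USD price are taken from the comment.

```python
HOURS_PER_MONTH = 160   # assumption, not stated in the thread
HOURS_SAVED = 10        # claimed monthly time saving
PRICE_PER_MONTH = 49    # USD, price quoted in the comment


def value_ratio(monthly_cost):
    """Value of the saved hours divided by the license price."""
    hourly_cost = monthly_cost / HOURS_PER_MONTH
    return hourly_cost * HOURS_SAVED / PRICE_PER_MONTH


low = value_ratio(2_000)     # cheapest data scientist in the quoted range
high = value_ratio(10_000)   # most expensive one

# Monthly cost at which 10 saved hours are worth exactly 10x the price.
break_even_10x = 10 * PRICE_PER_MONTH / HOURS_SAVED * HOURS_PER_MONTH
```

Under these assumptions the ratio runs from roughly 2.5x to 12.8x, and the 10x target is met only for data scientists costing about 7,840 USD per month or more. A different hours-per-month figure or a smaller time saving shifts the result proportionally.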
  - argument_clinic 1623 days ago
    Awesome demo, so you deserve some honest feedback.
    After the demo I looked at the pricing and immediately decided it's not worth it by far.
    From the viewpoint of a freelance software dev that does quite a lot of data cleaning lately, the price is so high that I wouldn't even bother trying it on binder.
    As a comparison, I pay €53/year for PyCharm professional that I can install on as many machines as I like and pay for my Excel/Office a similar yearly amount. I switch between 3 computers, so having a license nailed to one of them is a dealbreaker.
    Also, $49 + taxes roughly translates to 1 hour of income per month - every month, whether I use it or not. Plus I'd have to factor in the time it takes to set up and deal with license problems & bugs. Setting up licenses behind a company firewall is quite a challenge - unless you use a simple .txt file license option like JetBrains. BTW, JetBrains also has a very cool feature in the license model: If you pay for at least a year, you get to keep the last version that's at least one year old for free. From my usage, I estimate that bamboolib could save me 1 hour per month max - currently I just paste to Excel if I need to scroll in a larger data set or use the .sample() function to look at some examples.
    So to tempt me there should be a freelancer license at a maximum of $49/year that covers at least 3 machines (only use one at a time) and should work offline.
    BTW, the companies I work for all have not made the jump to Jupyter labs, yet. They are firmly Excel based and I'm constantly trying to drum up interest for Jupyter. I also do regular meetup talks on Jupyter (where normal business people show up) and many of them don't know that it exists, yet.
    So having a very cheap or even free personal license would showcase your program to companies... and you could write the license in a way that companies need to buy a full price version.
    - kite_and_code 1622 days ago
      Thank you so much for your honest feedback! That is what enables us to serve you better in the future.
      Also, thank you for the licensing input - so that we can consider and support other options in the future.
      Why do you think that you will only get 1h per month out of this? How many hours per month do you spend with pandas? Given this estimate, I can totally understand your price proposition of 5$/month because we also aim to provide at least 10x value. However, we assume 10h savings per month. Did you already see the data visualization features? https://www.youtube.com/watch?v=I0a58h1OCcg
      Did I understand you correctly, that you propose offering a free version (because it might not make sense to charge less than 5$ per month anyway?) for business and another one for 49$/year for freelancers/businesses? Or do you also propose adding another company license?
    - __tobals__ 1623 days ago
      Thank you very much for that honest feedback. It means a lot to me. Honestly, I will have to reflect on what you said, but I would like to get back to you on this later.
- acomjean 1623 days ago
  Excel or perhaps r-studio.
Pinegulf 1623 days ago
This looks user friendly and will certainly have an impact on ppl learning python/pandas.
Yet it's not for me, as I do not like the 'click to create macro code' approach. Such tools never have all the things I want, and I like to have my code in my syntax. But that's me.
- kite_and_code 1623 days ago
  Thank you for your feedback! :) We also don't like software where you are restricted to what's available. That is why we integrated so tightly with pandas: whenever something is not available, you can add the code in the user interface or just in another cell. What do you think about this hybrid approach?
  - ebg13 1623 days ago
    > What do you think about this hybrid approach?
    What I want is to be able to add new functionality to the bamboo UI via new buttons for specific functionality that you don't have. Maybe if you had a plugin architecture.
    - kite_and_code 1622 days ago
      That sounds great and is something that is already considered to some extent in the current software architecture. Can you name one exact feature that you would like to add? Also, feel free to reach out to us via email. We would be happy to help you write the first plugin/extension :)
      - ebg13 1621 days ago
        The basic gist of one feature I want to add is a cell delimiter split that turns one row into more rows (repeating the nonsplit cells) with grouping to determine whether the splits happen concurrently or sequentially if multiple of such splits happen in the same row (resulting in a Cartesian product-like result or not). Right now you only have a split that turns one column into more columns.
        I have code for this already and I'd just want to add it to the UI.
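The row-splitting behaviour described above can be illustrated in plain Python. This is a sketch over lists of dicts, not the commenter's code and not a bamboolib feature; the function name `split_rows`, the `mode` values, and the sample column names are all hypothetical. It assumes "concurrently" means the i-th pieces of each column stay together and "sequentially" means a Cartesian product.

```python
from itertools import product


def split_rows(rows, columns, sep=",", mode="product"):
    """Split delimited cells in `columns` into several rows.

    mode="product": one row per combination of pieces (Cartesian product).
    mode="zip": the i-th pieces of every column stay together in one row.
    Cells that are not split are repeated in every resulting row.
    """
    out = []
    for row in rows:
        pieces = [[p.strip() for p in row[c].split(sep)] for c in columns]
        if mode == "product":
            combos = product(*pieces)
        elif mode == "zip":
            if len({len(p) for p in pieces}) > 1:
                raise ValueError(f"unequal piece counts in row {row!r}")
            combos = zip(*pieces)
        else:
            raise ValueError(f"unknown mode {mode!r}")
        for combo in combos:
            # Copy the row, overwriting only the split columns.
            out.append({**row, **dict(zip(columns, combo))})
    return out


rows = [{"id": 1, "tags": "a,b", "sizes": "S,M"}]
concurrent = split_rows(rows, ["tags", "sizes"], mode="zip")
cartesian = split_rows(rows, ["tags", "sizes"], mode="product")
```

Here `concurrent` holds two rows (a/S and b/M) and `cartesian` holds four. In pandas, a similar result can be built from `Series.str.split` followed by `DataFrame.explode`.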
          - kite_and_code 1620 days ago
            Sounds interesting and very similar to a combination of a string split and then unpivot/melt if I understood you correctly. Please feel free to reach out to us via email: info AT 8080labs.com and then we can discuss how you can create an extension if you like :)
- brodoll 1623 days ago
  Exactly what I thought for my first use case as a person who is not proficient with pandas.
  - kite_and_code 1623 days ago
    What exactly did you think? That bamboolib is helpful or that you might be locked in by the options that are available so far?
__tobals__ 1623 days ago
bamboolib is a GUI for transforming and visualising pandas DataFrame objects with little to no code. Feedback is appreciated!
- mkl 1623 days ago
  Your hex plots use squares, not hexagons, so they aren't actually hex plots.
  I think you should be much more up front that this is a commercial product, because very few things based on Jupyter and Pandas are.
  - __tobals__ 1622 days ago
    I think we have already fixed the plot naming. Did you see that in the video?
    About the communication: I can understand that. We try to communicate clearly that we both are commercial and support Open Data. It's not easy, however. Especially when people see you for the first time.
    According to your perception, where could we be more clear on the communication?
    - mkl 1622 days ago
      Yes, hex plots in the video.
      I think you should mention it near the top of the demo notebook linked here, if it's intended to be a main entry point. I looked at the notebook and videos before heading to bamboolib.com, which has the first mention of pricing, at which point I felt like I'd wasted my time (because I'd gotten the wrong idea). I think most people are used to anything demoed in a Jupyter notebook being open source.
      - kite_and_code 1620 days ago
        Thank you for your input and the suggestion where you would have expected this info the first time
bayesian_horse 1623 days ago
In terms of resource utilization I recommend to link to the youtube videos first and not to a "binder" url that starts up a jupyter container...
- kite_and_code 1622 days ago
  Totally true, we also thought about this. However, as we understood the terms of Show HN, the link is supposed to point directly to a live demo? Maybe we misunderstood this? Any advice on this would be appreciated.
pplonski86 1623 days ago
Who is your target user? I think that if a user is able to install jupyter notebook and pandas and load data with python, there is a high chance that the user can also search the pandas documentation and write a few lines of code.
Anyway, I like the idea of making a UI for Pandas, but I think that there should be more comprehensive software to make data science easier for non-coders.
- __tobals__ 1623 days ago
  Our main target user is a professional python data scientist that wants to be faster at data wrangling and visualization so that they can focus on understanding the data instead of coding the same pandas commands over and over again. That’s why bamboolib has both data transformation and exploration features included. In the future, we will provide more sophisticated features from which also more experienced data scientists can profit.
  Yet, I definitely agree that bamboolib especially nicely suits pandas learners and non-coders. Would be happy to have a direct exchange on your ideas on this topic (feel free to pm me at tobiaskrabel at gmail dot com).
  - pplonski86 1623 days ago
    I have a similar problem with my product, which offers machine learning as a service. It offers complex data science features (building ML models) for non-coders. From my experience, I can tell that it is hard to find users who don't have enough technical background to train the model by themselves but do have enough background to understand what to do with an ML model.
    - __tobals__ 1623 days ago
      Interesting. So on which platform are you providing that service? What does your product look like (maybe share a link?)? And did you find a "solution" to your problem of finding a large enough audience?
  - missosoup 1623 days ago
    This doesn't accelerate professional data scientists.
    To me the only use of this tool is to make available the more complex uses of pandas to individuals without the background/understanding of how to wield those.
    But without that understanding, giving those people a UI of functions they have no understanding of is just a recipe for disaster.
    All these tools that aim to lower the barrier to entry for data science without fully automating it are doomed to fail because they have no audience. The market for analysts who aren't also software engineers is shrinking to 0.
    - tastroder 1623 days ago
      > But without that understanding, giving those people a UI of functions they have no understanding of is just a recipe for disaster.
      I can see use of the UI in a classroom setting to bridge the learning gap for people that are pretty proficient with the utility libraries like pandas offer, but lack the experience with Python and reading documentation at this point in time. I honestly fail to understand this critique, that sounds like saying we should ban Excel because people could use it to calculate something that doesn't make sense. It's not like pandas does something magical and every half decent Excel user understands the functionality behind the buttons I see in the bamboo demo linked here.
      > The market for analysts who aren't also software engineers is shrinking to 0.
      While I certainly get where this perspective might be coming from, I find it unlikely to be true. The recent acquisition of Tableau and growth of similar no-code tools shows that it's untrue from a business perspective (just read one of the HN threads on these topics, plenty of non-SE people making good use of them). Even from the code perspective, outside of production most of the data analysis code I see hardly shows any signs of good software engineering practice and yet fulfills the task it is written for.
    - kite_and_code 1622 days ago
      Actually, all our initial users are professional data scientists who work at least 10-20h per week with pandas. This is also reflected in the pricing.
      They like the opportunity to not always have to write the code and just reach it via the UI. So they have a smooth workflow without inspecting the data in Libre Office/Excel or having to google so often.
      However, what they especially like are the data exploration and visualization features because they save them a lot of time. You can see a video of those here: https://www.youtube.com/watch?v=I0a58h1OCcg
ebg13 1623 days ago
I'm sad that it's not open source, because I want to add functionality and now I can't.
- kite_and_code 1622 days ago
  What kind of functionality do you want to add? The software is written in a modular way and it is not too hard to extend it. Please reach out to us via email and then we will help you to extend the library.
- __tobals__ 1623 days ago
  I'm sorry to hear that you are sad because you wanted to contribute but can't. Maybe, we can find a solution to that. If you want to contribute (e.g. by joining our team), feel free to reach out to us! We are always happy to have capable and driven people on board :)
ddgflorida 1623 days ago
Error loading 8080labs/bamboolib_binder_template/master!
RocketSyntax 1623 days ago
wow. i love it. (1) it would be nice if the ui panes did not overlap the table (2) what do you think about automated chart creation as seen in apache zeppelin and databricks notebooks? (3) shaded cells for 0-100 gradients like beakerx. (4) has this been tested with any large memory/ distributed memory pyspark/ koalas libs?
- kite_and_code 1622 days ago
  Thank you for your excited reaction and your comments :))
  Why do you not want the UI panes to overlap? And what exactly do you mean? Because the pane does not overlap but you can inspect and reach the full table. Maybe it was confusing on the video?
  About visualization: We also provide quite some automated charts as can be seen here: https://www.youtube.com/watch?v=I0a58h1OCcg
  We are thinking about supporting other dataframe-like libraries and hopefully we can support all of them. However, it is a matter of priority here. The architecture enables this in general but we need to find users who actually want and need this. Any idea in that regard is appreciated and if enough interest builds up, we can definitely support this.
  What do you mean by (3) with the shaded cells? Maybe you can give an example here?
- __tobals__ 1623 days ago
  I am happy you like it :) About (1) - (3): I will have a look at those features, but thanks a lot for the feedback. About (4): we currently support pandas, but we basically want to offer many more backends in the future, therefore also making bamboolib interesting for companies that use clusters a lot.
helloiloveyou 1623 days ago
This is certainly amazing! Thanks!
mnist91 1623 days ago
Looks awesome!