Python data pipelines

Features

This package implements the basics for building pipelines similar to those of magrittr in R. Pipelines are created with the >> operator. Internally, it uses singledispatch to provide a unified API for different kinds of inputs (SQL databases, HDF, simple dicts, ...).
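To make this mechanism more concrete, here is a minimal sketch of how a >> pipe can be combined with Python's `functools.singledispatch`. It is not this package's implementation: `pipeverb`, `PipeVerb` and the dict-based `append_col` are hypothetical names chosen for illustration. Calling a verb with only its parameters returns a deferred object. When that object sits on the right of >>, Python falls back to its `__rrshift__` method (the left operand, here a dict, does not support >>), and the dispatcher then picks the implementation registered for the type of the left-hand data.

```python
from functools import singledispatch, wraps


class PipeVerb:
    """Deferred verb call: stores the arguments until used on the right of `>>`."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # `data >> verb` ends up here when type(data) has no __rshift__
        # that accepts a PipeVerb; the data becomes the first argument.
        return self.func(data, *self.args, **self.kwargs)


def pipeverb(func):
    """Hypothetical decorator: turn `func` into a single-dispatched, pipeable verb."""
    dispatcher = singledispatch(func)  # dispatches on the type of the first argument

    @wraps(func)
    def make_verb(*args, **kwargs):
        # Calling the verb only records its parameters; the data arrives via `>>`.
        return PipeVerb(dispatcher, args, kwargs)

    # Expose register() so implementations can be added per input type.
    make_verb.register = dispatcher.register
    return make_verb


@pipeverb
def append_col(data, x=1):
    # Fallback for input types without a registered implementation.
    raise NotImplementedError(f"append_col does not support {type(data).__name__}")


@append_col.register(dict)
def _(data, x=1):
    # Treat a dict as {column name: list of values}; return a new dict
    # with an extra column "X" holding the value x in every row.
    n_rows = len(next(iter(data.values()), []))
    return {**data, "X": [x] * n_rows}


print({"a": [1, 2, 3]} >> append_col(x=3))
# {'a': [1, 2, 3], 'X': [3, 3, 3]}
```

In this sketch, supporting another input type (for example a pandas.DataFrame or a database connection) would only need another `@append_col.register(...)` implementation, while the >> syntax stays the same for the user.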

A basic example of what can be built with this package:

>>> from my_library import append_col
>>> import pandas as pd

>>> pd.DataFrame({"a" : [1,2,3]}) >> append_col(x=3)
   a  X
0  1  3
1  2  3
2  3  3

In the future, this package might also implement the verbs from the R packages dplyr and tidyr for pandas.DataFrame, or I might fold it into one of the other available implementations of dplyr-style pipelines/verbs for pandas.

Documentation

The documentation can be found on Read the Docs: https://pydatapipes.readthedocs.io

License

Free software: MIT license

Credits

magrittr and its usage in dplyr / tidyr, for the idea of using pipelines in this way
lots of Python implementations of dplyr-style pipelines: dplython, pandas_ply, dfply

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.