ourtils.wrangling

Data wrangling functions, typically built on pandas.

Functions

collapse_multiindex(df[, sep])

Collapses a multi-index, which usually appears after some sort of aggregation.
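
This is a minimal pandas sketch of the kind of flattening collapse_multiindex describes, not the ourtils implementation. The name collapse_multiindex_sketch is hypothetical. It assumes the multi-index sits on the columns, as it does after groupby().agg() with a list of functions, and that sep defaults to "_".

```python
import pandas as pd


def collapse_multiindex_sketch(df: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    """Hypothetical stand-in: flatten MultiIndex columns by joining levels with sep."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # Join each tuple of level values; skip empty levels such as ("region", "")
        # that appear after reset_index() on an aggregated frame.
        out.columns = [
            sep.join(str(level) for level in col if str(level) != "")
            for col in out.columns
        ]
    return out


sales = pd.DataFrame({"region": ["a", "a", "b"], "amount": [1, 2, 3]})
agg = sales.groupby("region").agg({"amount": ["sum", "mean"]})
flat = collapse_multiindex_sketch(agg)
print(list(flat.columns))  # ['amount_sum', 'amount_mean']
```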

cols_with_n_distinct_values(df, n_unique[, ...])

Shows the columns that have a given number of unique values (n_unique).
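
The core idea behind cols_with_n_distinct_values can be sketched in pandas with DataFrame.nunique(). The helper name is hypothetical, and the sketch ignores the extra optional parameters in the real signature. Note that nunique() skips missing values by default.

```python
import pandas as pd


def cols_with_n_distinct_values_sketch(df: pd.DataFrame, n_unique: int) -> list:
    """Hypothetical stand-in: names of columns with exactly n_unique distinct values."""
    counts = df.nunique()  # distinct non-null values per column
    return counts[counts == n_unique].index.tolist()


df = pd.DataFrame(
    {"flag": [True, False, True], "id": [1, 2, 3], "const": ["x", "x", "x"]}
)
print(cols_with_n_distinct_values_sketch(df, 1))  # ['const']
```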

compute_distinct_values(dat)

Gets value counts of all non-numeric columns in a dataframe.
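
A rough pandas equivalent of compute_distinct_values selects the non-numeric columns and calls value_counts() on each one. The hypothetical helper below returns a dict of Series keyed by column name. That return shape is an assumption, and ourtils may return something different.

```python
import pandas as pd


def compute_distinct_values_sketch(dat: pd.DataFrame) -> dict:
    """Hypothetical stand-in: value counts for every non-numeric column."""
    non_numeric = dat.select_dtypes(exclude="number")
    # value_counts() drops NaN by default and sorts by descending frequency
    return {col: non_numeric[col].value_counts() for col in non_numeric.columns}


df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
counts = compute_distinct_values_sketch(df)
print(counts["color"].to_dict())  # {'red': 2, 'blue': 1}
```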

compute_pct_unique(series)

Returns the percentage of unique values in a series.
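
In pandas terms, the percentage of unique values is the number of distinct values divided by the series length. The sketch below assumes a 0 to 100 scale and uses a hypothetical name. It also treats an empty series as 0%, which is an assumed convention. Because nunique() ignores NaN while len() counts it, missing values lower the percentage.

```python
import pandas as pd


def compute_pct_unique_sketch(series: pd.Series) -> float:
    """Hypothetical stand-in: percentage (0-100) of distinct values in a series."""
    if len(series) == 0:
        return 0.0  # assumed convention for an empty series
    return 100 * series.nunique() / len(series)


s = pd.Series(["a", "b", "b", "c"])
print(compute_pct_unique_sketch(s))  # 75.0
```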

create_column(df, colname, func, *args, **kwargs)

Creates a new column using a function that takes column names as strings.

crosstab(dat, group_by_vars, count_var)

Computes the percentages and counts of count_var within each group defined by group_by_vars.
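
The count-plus-percentage pattern behind crosstab can be sketched with groupby().size() and a grouped transform("sum"). This is not the ourtils code. The helper name and the output columns n and pct are hypothetical, and percentages are computed within each group_by_vars group.

```python
import pandas as pd


def crosstab_sketch(dat: pd.DataFrame, group_by_vars, count_var: str) -> pd.DataFrame:
    """Hypothetical stand-in: counts and within-group percentages of count_var."""
    if isinstance(group_by_vars, str):
        group_by_vars = [group_by_vars]
    counts = (
        dat.groupby(group_by_vars + [count_var])
        .size()
        .rename("n")
        .reset_index()
    )
    # Share of each count_var value within its group, on a 0-100 scale
    group_totals = counts.groupby(group_by_vars)["n"].transform("sum")
    counts["pct"] = 100 * counts["n"] / group_totals
    return counts


df = pd.DataFrame(
    {"team": ["x", "x", "x", "y"], "result": ["win", "win", "loss", "win"]}
)
print(crosstab_sketch(df, ["team"], "result"))
```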

filter_cols(data[, n_distinct_thresh, ...])

Filters a dataframe based on distinct-value counts or, alternatively, by using select_include.

filter_random(df, col)

Returns the dataframe filtered to a random value of col.
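
One way to express filter_random in pandas is to pick one of the column's distinct non-null values at random and keep every row that matches it. This is a minimal sketch. The helper name is hypothetical, and its seed parameter is an addition for reproducibility that the original signature does not list.

```python
import random

import pandas as pd


def filter_random_sketch(df: pd.DataFrame, col: str, seed=None) -> pd.DataFrame:
    """Hypothetical stand-in: keep all rows whose col equals one random value."""
    rng = random.Random(seed)  # seed is a hypothetical extra, for reproducibility
    values = df[col].dropna().unique().tolist()
    chosen = rng.choice(values)
    return df[df[col] == chosen]


df = pd.DataFrame({"store": ["a", "b", "a", "c"], "sales": [1, 2, 3, 4]})
print(filter_random_sketch(df, "store", seed=0))
```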

send_column_to(dat, move_cols[, send_to])

Moves one or more columns to the front or back of a dataframe.
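
The column reordering that send_column_to describes can be sketched by building a new column order and indexing with it. The helper name is hypothetical, and so are the accepted send_to values "front" and "back". The real function may use different values or defaults.

```python
import pandas as pd


def send_column_to_sketch(dat: pd.DataFrame, move_cols, send_to: str = "front") -> pd.DataFrame:
    """Hypothetical stand-in: move columns to the front or back, keeping the rest in order."""
    if isinstance(move_cols, str):
        move_cols = [move_cols]
    rest = [c for c in dat.columns if c not in move_cols]
    if send_to == "front":
        order = list(move_cols) + rest
    elif send_to == "back":
        order = rest + list(move_cols)
    else:
        raise ValueError("send_to must be 'front' or 'back'")
    return dat[order]


df = pd.DataFrame(columns=["a", "b", "c", "d"])
print(list(send_column_to_sketch(df, ["c", "a"]).columns))  # ['c', 'a', 'b', 'd']
print(list(send_column_to_sketch(df, "b", send_to="back").columns))  # ['a', 'c', 'd', 'b']
```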

shout(df[, msg])

A simple function for use with DataFrame.pipe that prints the size of a dataframe and an optional message.
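
A function like shout works with DataFrame.pipe because it returns its input unchanged, so a method chain can keep going after the print. The sketch below uses a hypothetical name and an assumed message format.

```python
import pandas as pd


def shout_sketch(df: pd.DataFrame, msg: str = "") -> pd.DataFrame:
    """Hypothetical stand-in: print the frame's size, then pass it through."""
    rows, cols = df.shape
    print(f"{msg} shape: {rows} rows x {cols} columns".strip())
    return df  # returning the input unchanged keeps the pipe chain going


df = pd.DataFrame({"x": [1, 2, 3, 4]})
result = (
    df.pipe(shout_sketch, "raw")  # prints: raw shape: 4 rows x 1 columns
    .query("x > 2")
    .pipe(shout_sketch, "after filter")  # prints: after filter shape: 2 rows x 1 columns
)
```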

sort_col_manually(input_df, col_name, ...)

Sorts a dataframe by a column in a manually specified order.
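
A manual sort order can be expressed in pandas (1.1 or later) with the key argument of sort_values, which maps each value to its position in the desired order. The helper name and its order parameter are hypothetical, because the real signature elides its remaining arguments. Values missing from order map to NaN and sort last. kind="stable" keeps tied rows in their original order.

```python
import pandas as pd


def sort_col_manually_sketch(input_df: pd.DataFrame, col_name: str, order: list) -> pd.DataFrame:
    """Hypothetical stand-in: sort rows by col_name following the given order."""
    rank = {value: i for i, value in enumerate(order)}
    # key receives the column as a Series and must return a Series of sort keys
    return input_df.sort_values(col_name, key=lambda s: s.map(rank), kind="stable")


df = pd.DataFrame({"size": ["M", "S", "L", "S"]})
out = sort_col_manually_sketch(df, "size", ["S", "M", "L"])
print(list(out["size"]))  # ['S', 'S', 'M', 'L']
```

Unlike converting the column to an ordered pd.Categorical, the key approach leaves the column's dtype unchanged.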

squish(df, index_var[, col_sep, agg_func])

Reshapes wide data into long format and adds a "group" column.

Classes

ColumnSpec(col_names, mapping[, ...])

Specification for an individual column.

SpecCollection(specs)

A collection of ColumnSpec objects.