analyse package

classification_values module

app_streamlit.analyse.classification_values.main_values(col_name, col_para_name, parameters, treshold, dataframe)

Returns the most frequent values of one column (for example the top 10 if treshold=10), restricted to the rows selected by the value of another column.

Args::
    col_name (string): name of the column whose values we want
    col_para_name (string): name of the column the selection depends on
    parameters (list, string or int): parameter value or values to select (if multiple, they must all belong to the same col_para_name)
    treshold (int): maximum number of values to return
    dataframe: DataFrame to use

Example: main_values('n_steps', 'minutes', [55, 35], 10, raw_recipes) returns the top 10 n_steps of recipes that take 55 or 35 minutes to prepare.

Returns:: top_values: list of dictionaries with values as keys and their counts as values
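The behaviour described above can be sketched with pandas as follows. This is a minimal sketch, not the module's actual code: the name main_values_sketch is hypothetical, and it assumes that "top" means most frequent and that each value is returned in its own single-entry dictionary.

```python
import pandas as pd


def main_values_sketch(col_name, col_para_name, parameters, treshold, dataframe):
    """Hypothetical re-implementation of the documented behaviour."""
    # Accept a single parameter as well as a list of parameters.
    if not isinstance(parameters, (list, tuple, set)):
        parameters = [parameters]
    # Keep only the rows whose col_para_name matches one of the parameters.
    subset = dataframe[dataframe[col_para_name].isin(parameters)]
    # Count the values of col_name and keep at most `treshold` of them.
    counts = subset[col_name].value_counts().head(treshold)
    # Assumption: one {value: count} dictionary per value.
    return [{value: int(count)} for value, count in counts.items()]


# Made-up data standing in for raw_recipes.
raw_recipes = pd.DataFrame({
    "minutes": [55, 35, 55, 20, 35, 55],
    "n_steps": [4, 4, 7, 4, 7, 4],
})
top = main_values_sketch("n_steps", "minutes", [55, 35], 10, raw_recipes)
```

In this made-up data, the recipe that takes 20 minutes is ignored, and the remaining five rows are counted by n_steps.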

utils module

Functions used for statistics.

app_streamlit.analyse.utils.average_and_total_comments_per_contributor(df)

Calculates the average number of comments per recipe and the total number of comments for each contributor.

Args:: df (pd.DataFrame): A DataFrame containing recipe data, including ‘contributor_id’ and ‘num_comments’.
Returns:: df (pd.DataFrame): A DataFrame with ‘contributor_id’, ‘avg_comments_per_recipe’, and ‘total_comments’, where contributor_id is treated as a category.
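One way to obtain such a result is a grouped aggregation. The sketch below is an assumption about the approach, not the module's code; the function name is hypothetical, and it assumes one row per recipe.

```python
import pandas as pd


def comments_per_contributor_sketch(df):
    """Hypothetical sketch: mean and sum of num_comments per contributor."""
    out = (
        df.groupby("contributor_id")["num_comments"]
        .agg(avg_comments_per_recipe="mean", total_comments="sum")
        .reset_index()
    )
    # The documentation states that contributor_id is treated as a category.
    out["contributor_id"] = out["contributor_id"].astype("category")
    return out


# Made-up data: contributor 1 has two recipes, contributor 2 has one.
recipes = pd.DataFrame({
    "contributor_id": [1, 1, 2],
    "num_comments": [2, 4, 10],
})
stats = comments_per_contributor_sketch(recipes)
```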

app_streamlit.analyse.utils.best_recipe_filter_time(df, time_r, nb_show)

Gets information about the best recipes (best ranking and most comments), filtered on preparation time.

Args::
    df (pd.DataFrame): DataFrame containing the columns ‘minutes’, ‘name’, ‘n_steps’, ‘num_comments’, ‘ingredients’ and ‘avg_reviews’
    time_r (str): preparation-time category to filter the results on
    nb_show (int): number of recipes to show
Returns:: result (pd.DataFrame): information on the recipes with the best ranking and the most comments, filtered on time_r

app_streamlit.analyse.utils.calculate_negative_points_nutri_score(row)

Calculate the negative points that will lower the Nutri-Score based on the levels of certain nutrients: calories, sugar, saturated fat, and sodium.

Args:: row (dict): row from a DataFrame representing nutrition of the recipe.
Returns:: int: The total negative points calculated based on the thresholds for the given nutrients.

app_streamlit.analyse.utils.calculate_positive_points_nutri_score(row)

Calculate the positive points that will improve the Nutri-Score based on the level of protein in the food.

Args:: row (dict): row from a DataFrame representing a food item.
Returns:: int: The total positive points calculated based on the protein thresholds.

app_streamlit.analyse.utils.cat_minutes(df)

Transforms the ‘minutes’ column into categorical values.

Args:: df (pd.DataFrame): DataFrame containing the ‘minutes’ column
Returns:: cat_minutes (pd.Series): ‘minutes’ column transformed into categorical values
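A transformation of this kind is commonly done with pd.cut. The sketch below only illustrates the idea: the bin edges and labels are hypothetical, since the categories actually used by cat_minutes are not documented here.

```python
import pandas as pd


def cat_minutes_sketch(df):
    """Hypothetical sketch: bin 'minutes' into preparation-time categories."""
    # Assumed bin edges and labels, chosen for illustration only.
    bins = [0, 15, 30, 60, float("inf")]
    labels = ["<=15 min", "15-30 min", "30-60 min", ">60 min"]
    # Intervals are right-closed, so 30 falls into "15-30 min".
    return pd.cut(df["minutes"], bins=bins, labels=labels, include_lowest=True)


example = pd.DataFrame({"minutes": [5, 30, 45, 120]})
cats = cat_minutes_sketch(example)
```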

app_streamlit.analyse.utils.count_contributors_by_recipe_range_with_bins(df)

Categorize contributors based on the number of unique recipes they contributed.

Args:: df (pd.DataFrame): DataFrame containing ‘recipe_id’ and ‘contributor_id’.
Returns:: pd.Series: Series with recipe range categories as index and contributor counts as values.
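The categorization could work along the following lines. This is a sketch under assumptions: the function name, bin edges and range labels are hypothetical and may differ from those used in the module.

```python
import pandas as pd


def count_by_recipe_range_sketch(df):
    """Hypothetical sketch: contributors per range of unique recipes."""
    # Number of distinct recipes for each contributor.
    recipes_per_contributor = df.groupby("contributor_id")["recipe_id"].nunique()
    # Assumed ranges, for illustration only.
    bins = [0, 1, 5, float("inf")]
    labels = ["1", "2-5", "6+"]
    ranges = pd.cut(recipes_per_contributor, bins=bins, labels=labels)
    # Count contributors in each range, keeping the ranges in order.
    return ranges.value_counts().sort_index()


# Made-up data: "a" and "c" have one recipe each, "b" has three.
contrib = pd.DataFrame({
    "contributor_id": ["a", "b", "b", "b", "c"],
    "recipe_id": [1, 2, 3, 4, 5],
})
range_counts = count_by_recipe_range_sketch(contrib).to_dict()
```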

app_streamlit.analyse.utils.count_recipes_season(df): Count recipes per season

app_streamlit.analyse.utils.get_insight_low_ranking(df)

Gets the number of recipes per preparation time, for all recipes and for low-ranking recipes.

Args:: df (pd.DataFrame): DataFrame
Returns::
    df_low_count (pd.DataFrame): filtered on low ranking
    df_high_count (pd.DataFrame): filtered on high ranking

app_streamlit.analyse.utils.get_top_ingredients2(df, df_ingr_map, excluded_ingredients=None, top_n=10)

app_streamlit.analyse.utils.get_top_tags(df, most_commented=False, top_recipes=20, top_n=10)

Retrieve the most frequently used tags.

Args::
    df (pd.DataFrame): DataFrame containing recipe data.
    most_commented (bool): Whether to filter by the most commented recipes.
    top_recipes (int): Number of top recipes to consider if most_commented is True.
    top_n (int): Number of tags to return.
Returns:: pd.Series: Top N most frequently used tags.
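A possible shape for this logic is sketched below. It is not the module's code: the function name is hypothetical, and it assumes a ‘tags’ column holding a list of tags per recipe and a ‘num_comments’ column, neither of which is stated in the documentation above.

```python
import pandas as pd


def top_tags_sketch(df, most_commented=False, top_recipes=20, top_n=10):
    """Hypothetical sketch: most frequent tags, optionally on top recipes."""
    if most_commented:
        # Restrict to the recipes with the most comments.
        df = df.nlargest(top_recipes, "num_comments")
    # One row per (recipe, tag) pair, then count each tag.
    return df["tags"].explode().value_counts().head(top_n)


# Made-up data with list-valued tags.
tagged = pd.DataFrame({
    "tags": [["easy", "dessert"], ["easy"], ["dinner", "easy"]],
    "num_comments": [1, 9, 5],
})
all_tags = top_tags_sketch(tagged, top_n=1)
popular_tags = top_tags_sketch(tagged, most_commented=True, top_recipes=1)
```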

app_streamlit.analyse.utils.metrics_main_contributor(df)

Calculate the total number of unique contributors and recipes in the dataset.

Args:: df (pd.DataFrame): DataFrame containing recipe data.
Returns:: tuple: Number of unique contributors and recipes.

app_streamlit.analyse.utils.nutri_score(df)

Calculate the overall Nutri-Score (A to E) for a food item by considering both negative and positive points.

Args:

df: DataFrame representing a food item. It should include the keys required for both negative and positive point calculations: “Calories”, “Sugar”, “Saturated Fat”, “Sodium”, and “Protein”.

Returns:

str: The Nutri-Score grade (A, B, C, D, or E) based on the calculated score.
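The final step, turning points into a letter, might look like the sketch below. The cutoffs are an assumption: they follow the ranges published for general solid foods in the original Nutri-Score algorithm, and the thresholds used by this module are not documented here. The function name is hypothetical.

```python
def grade_from_points_sketch(negative_points, positive_points):
    """Hypothetical sketch: map a points balance to a Nutri-Score letter."""
    # Lower scores are better: positive points offset negative ones.
    score = negative_points - positive_points
    # Assumed cutoffs (general solid foods), not confirmed by the module.
    if score <= -1:
        return "A"
    if score <= 2:
        return "B"
    if score <= 10:
        return "C"
    if score <= 18:
        return "D"
    return "E"


grades = [grade_from_points_sketch(n, p) for n, p in [(0, 2), (2, 0), (10, 0), (11, 0), (25, 0)]]
```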

app_streamlit.analyse.utils.top_commented_recipes(df, top_n=10)

Extract the top N recipes with the highest number of comments.

Args::
    df (pd.DataFrame): DataFrame containing recipe data.
    top_n (int): Number of top recipes to return.
Returns:: pd.DataFrame: DataFrame containing top N recipes.
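With pandas, this kind of selection is typically a one-liner. The sketch below assumes a ‘num_comments’ column and uses a hypothetical function name; the columns returned by the real function are not specified above.

```python
import pandas as pd


def top_commented_sketch(df, top_n=10):
    """Hypothetical sketch: the top_n rows with the most comments."""
    return df.nlargest(top_n, "num_comments")


# Made-up data.
commented = pd.DataFrame({
    "name": ["a", "b", "c"],
    "num_comments": [3, 10, 7],
})
top_two = top_commented_sketch(commented, top_n=2)
```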

app_streamlit.analyse.utils.top_commented_recipes_by_contributors(df, top_contributors, max_recipes_per_contributor=5)

Extracts the top commented recipes for each contributor in the top contributors list.

Parameters:
    - df (pd.DataFrame): The DataFrame containing recipe data.
    - top_contributors (pd.DataFrame): A DataFrame containing the IDs of the top contributors.
    - max_recipes_per_contributor (int): Maximum number of recipes to return per contributor.

Returns: - pd.DataFrame: A DataFrame containing contributor IDs, recipe IDs, names, and number of comments.
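The per-contributor selection can be sketched as a filter, a sort, and a grouped head. The column names ‘contributor_id’, ‘recipe_id’, ‘name’ and ‘num_comments’ are assumptions based on the description of the returned DataFrame, and the function name is hypothetical.

```python
import pandas as pd


def top_by_contributors_sketch(df, top_contributors, max_recipes_per_contributor=5):
    """Hypothetical sketch: most commented recipes of each top contributor."""
    # Keep only recipes written by the top contributors.
    subset = df[df["contributor_id"].isin(top_contributors["contributor_id"])]
    # Within each contributor, put the most commented recipes first.
    ranked = subset.sort_values(
        ["contributor_id", "num_comments"], ascending=[True, False]
    )
    # Keep at most max_recipes_per_contributor rows per contributor.
    top = ranked.groupby("contributor_id").head(max_recipes_per_contributor)
    columns = ["contributor_id", "recipe_id", "name", "num_comments"]
    return top[columns].reset_index(drop=True)


# Made-up data: contributor 3 is not in the top contributors list.
all_recipes = pd.DataFrame({
    "contributor_id": [1, 1, 1, 2, 3],
    "recipe_id": [10, 11, 12, 20, 30],
    "name": ["r10", "r11", "r12", "r20", "r30"],
    "num_comments": [5, 9, 1, 3, 100],
})
top_ids = pd.DataFrame({"contributor_id": [1, 2]})
selected = top_by_contributors_sketch(all_recipes, top_ids, max_recipes_per_contributor=2)
```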

app_streamlit.analyse.utils.top_recipes(df)

Returns the top 5 recipes with the most comments.

Args:: df (pd.DataFrame): DataFrame with all the preprocessed data
Returns:: pd.DataFrame

app_streamlit.analyse.utils.top_recipes_user(df)

Returns the top 5 recipes with the most comments from a specific user.

Args:

df (pandas.DataFrame): DataFrame with these columns:
    - ‘name’: recipe names.
    - ‘avg_reviews’: average rating.
    - ‘num_comments’: number of comments.

Returns:

pandas.DataFrame: A DataFrame with:
    - ‘Recipe’: recipe names.
    - ‘Number of comments’: count of comments per recipe.
    - ‘Average Rating’: average rating for each recipe.

app_streamlit.analyse.utils.trendy_ingredients_by_seasons(df, ingr_map, top_n)

Creates a DataFrame for each season and returns the top 200 ingredients used in each.

Args::
    df (DataFrame): cleaned DataFrame
    ingr_map (DataFrame): DataFrame mapping ingredient IDs (‘id’) to their names (‘replaced’)
    top_n (int, optional): number of top ingredients to return. Defaults to 200.
Returns:: winter_ingr, spring_ingr, summer_ingr, autumn_ingr (pd.Series): four pd.Series with the top 200 ingredients used

app_streamlit.analyse.utils.unique_ingr(df, ingr_map, top_n=200)

Returns the ingredients unique to each season, found by comparing the ingredients used in one season with those used in all the other seasons.

Args::
    df (DataFrame): cleaned DataFrame
    ingr_map (DataFrame): DataFrame mapping ingredient IDs (‘id’) to their names (‘replaced’)
    top_n (int, optional): number of top ingredients to return. Defaults to 200.
Returns:: winter_unique, spring_unique, summer_unique, autumn_unique (list): one list of unique ingredients for each season
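The comparison between seasons amounts to a set difference. The sketch below shows that step in isolation, assuming the top ingredients of each season have already been computed; the function name and the dictionary input are hypothetical and do not mirror the signature of unique_ingr.

```python
def unique_per_season_sketch(season_ingredients):
    """Hypothetical sketch: ingredients that appear in exactly one season."""
    unique = {}
    for season, ingredients in season_ingredients.items():
        # Collect every ingredient used in any of the other seasons.
        others = set()
        for other_season, other_ingredients in season_ingredients.items():
            if other_season != season:
                others.update(other_ingredients)
        # Keep this season's ingredients that no other season uses.
        unique[season] = [i for i in ingredients if i not in others]
    return unique


# Made-up top ingredients per season.
seasonal = {
    "winter": ["salt", "cinnamon", "butter"],
    "spring": ["salt", "asparagus"],
    "summer": ["salt", "tomato", "basil"],
    "autumn": ["salt", "pumpkin", "butter"],
}
unique_by_season = unique_per_season_sketch(seasonal)
```

Ingredients shared by at least two seasons, such as salt and butter in this made-up data, drop out of every list.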

app_streamlit.analyse.utils.user_recipes(merged_df, user_id)

Finds the recipes published by the user.

Args::
    merged_df (pd.DataFrame): DataFrame with recipe data and a ‘season’ column.
    user_id: ID of the user whose recipes are looked up.
Returns:: dict: Recipe counts per season.

app_streamlit.analyse.utils.visualise_low_rank_insight(df_low_count, df_high_count): Visualise low vs high rank recipes over time of preparation

app_streamlit.analyse.utils.visualise_recipe_season(df): Visualise count per season with low and high rankings.