analyse package
classification_values module
- app_streamlit.analyse.classification_values.main_values(col_name, col_para_name, parameters, treshold, dataframe)
Returns the most frequent values of one column (for example the top 10 if treshold=10), restricted to rows where another column takes the given values.
- Args:
col_name (string): name of the column whose values we want
col_para_name (string): name of the column the result depends on
parameters (list, string or int): value or values of col_para_name to keep (if several, they must all belong to the same col_para_name)
treshold (int): maximum number of values to return
dataframe: DataFrame to use
Example: main_values('n_steps', 'minutes', [55, 35], 10, raw_recipes) returns the top 10 n_steps values among recipes that take 55 or 35 minutes to prepare.
- Returns:
top_values: list of dictionaries with values as keys and their counts as values
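The filtering and counting described above can be sketched with pandas. This is a hypothetical re-implementation, not the package's code: the name main_values_sketch, the sample data, and the exact shape of the returned list (one single-entry dictionary per value) are assumptions.

```python
import pandas as pd


def main_values_sketch(col_name, col_para_name, parameters, treshold, dataframe):
    """Hypothetical sketch of the behaviour described for main_values."""
    # Accept a single parameter as well as a list of parameters.
    if not isinstance(parameters, (list, tuple, set)):
        parameters = [parameters]
    # Keep only the rows whose col_para_name value is one of the parameters.
    subset = dataframe[dataframe[col_para_name].isin(parameters)]
    # Count each value of col_name and keep the `treshold` most frequent ones.
    counts = subset[col_name].value_counts().head(treshold)
    return [{value: int(count)} for value, count in counts.items()]


# Made-up sample data standing in for raw_recipes.
raw_recipes = pd.DataFrame({
    "minutes": [55, 35, 55, 10, 35],
    "n_steps": [4, 4, 7, 4, 9],
})
top_values = main_values_sketch("n_steps", "minutes", [55, 35], 10, raw_recipes)
```

The recipe that takes 10 minutes is filtered out before counting, so it does not contribute to the count of n_steps = 4.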
utils module
Functions used for statistics.
- app_streamlit.analyse.utils.average_and_total_comments_per_contributor(df)
Calculates the average number of comments per recipe and the total number of comments for each contributor.
- Args:
df (pd.DataFrame): A DataFrame containing recipe data, including ‘contributor_id’ and ‘num_comments’.
- Returns:
df (pd.DataFrame): A DataFrame with ‘contributor_id’, ‘avg_comments_per_recipe’, and ‘total_comments’, where contributor_id is treated as a category.
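A per-contributor aggregation of this kind can be sketched with a pandas groupby. The sample data is made up, and the sketch assumes one row per recipe; the package's actual implementation may differ.

```python
import pandas as pd

# Made-up data: one row per recipe.
df = pd.DataFrame({
    "contributor_id": [1, 1, 2],
    "num_comments": [4, 6, 3],
})

stats = (
    df.groupby("contributor_id")["num_comments"]
    # Named aggregation: mean and sum of comments per contributor.
    .agg(avg_comments_per_recipe="mean", total_comments="sum")
    .reset_index()
)
# Treat contributor_id as a category, as the return description states.
stats["contributor_id"] = stats["contributor_id"].astype("category")
```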
- app_streamlit.analyse.utils.best_recipe_filter_time(df, time_r, nb_show)
Returns information about the best recipes (best ranking and most comments), filtered by preparation time.
- Args:
df (pd.DataFrame): DataFrame containing the columns ‘minutes’, ‘name’, ‘n_steps’, ‘num_comments’, ‘ingredients’ and ‘avg_reviews’
time_r (str): preparation time category to filter the results on
nb_show (int): number of recipes to show
- Returns:
result (pd.DataFrame): information on the recipes with the best ranking and the most comments, filtered on time_r
- app_streamlit.analyse.utils.calculate_negative_points_nutri_score(row)
Calculate the negative points that will lower the Nutri-Score based on the levels of certain nutrients: calories, sugar, saturated fat, and sodium.
- Args:
row (dict): row from a DataFrame representing nutrition of the recipe.
- Returns:
int: The total negative points calculated based on the thresholds for the given nutrients.
- app_streamlit.analyse.utils.calculate_positive_points_nutri_score(row)
Calculate the positive points that will improve the Nutri-Score based on the level of protein in the food.
- Args:
row (dict): row from a DataFrame representing a food item.
- Returns:
int: The total positive points calculated based on the protein thresholds.
- app_streamlit.analyse.utils.cat_minutes(df)
Transforms the ‘minutes’ column into categorical values.
- Args:
df (pd.DataFrame): DataFrame containing a ‘minutes’ column
- Returns:
cat_minutes (pd.Series): ‘minutes’ column converted to categorical values
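One common way to turn a numeric column into categories is pd.cut. The sketch below is only an illustration: the bin edges and labels are hypothetical, since the categories used by cat_minutes are not documented here.

```python
import pandas as pd

df = pd.DataFrame({"minutes": [10, 45, 200]})

# Hypothetical bin edges and labels; the package may use different ones.
bins = [0, 30, 60, float("inf")]
labels = ["<=30 min", "30-60 min", ">60 min"]

# include_lowest=True keeps recipes with minutes == 0 in the first bin.
cat_minutes = pd.cut(df["minutes"], bins=bins, labels=labels, include_lowest=True)
```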
- app_streamlit.analyse.utils.count_contributors_by_recipe_range_with_bins(df)
Categorize contributors based on the number of unique recipes they contributed.
- Args:
df (pd.DataFrame): DataFrame containing ‘recipe_id’ and ‘contributor_id’.
- Returns:
pd.Series: Series with recipe range categories as index and contributor counts as values.
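The categorization can be sketched in two steps: count unique recipes per contributor, then bin those counts. The bin edges, labels and sample data are hypothetical; the ranges used by the package are not documented here.

```python
import pandas as pd

# Made-up data; recipe 4 appears twice to show that duplicates count once.
df = pd.DataFrame({
    "recipe_id": [1, 2, 3, 4, 4],
    "contributor_id": [10, 10, 10, 20, 20],
})

# Step 1: number of unique recipes per contributor.
recipes_per_contributor = df.groupby("contributor_id")["recipe_id"].nunique()

# Step 2: bin the counts into hypothetical ranges and count contributors.
bins = [0, 1, 5, float("inf")]
labels = ["1", "2-5", "6+"]
ranges = pd.cut(recipes_per_contributor, bins=bins, labels=labels)
contributors_per_range = (
    ranges.astype(str).value_counts().reindex(labels, fill_value=0)
)
```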
- app_streamlit.analyse.utils.count_recipes_season(df)
Count recipes per season
- app_streamlit.analyse.utils.get_insight_low_ranking(df)
Gets the number of recipes per preparation time for all recipes and for low-ranking recipes.
- Args:
df (pd.DataFrame): DataFrame
- Returns:
df_low_count (pd.DataFrame): counts filtered on low ranking
df_high_count (pd.DataFrame): counts filtered on high ranking
- app_streamlit.analyse.utils.get_top_ingredients2(df, df_ingr_map, excluded_ingredients=None, top_n=10)
- app_streamlit.analyse.utils.get_top_tags(df, most_commented=False, top_recipes=20, top_n=10)
Retrieve the most frequently used tags.
- Args:
df (pd.DataFrame): DataFrame containing recipe data.
most_commented (bool): Whether to filter by the most commented recipes.
top_recipes (int): Number of top recipes to consider if most_commented is True.
top_n (int): Number of tags to return.
- Returns:
pd.Series: Top N most frequently used tags.
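A tag count of this kind can be sketched with explode and value_counts. The function name top_tags_sketch is hypothetical, and the sketch assumes a ‘tags’ column holding Python lists and a ‘num_comments’ column; the real column layout may differ.

```python
import pandas as pd


def top_tags_sketch(df, most_commented=False, top_recipes=20, top_n=10):
    """Hypothetical sketch: count tags, optionally on the most commented recipes."""
    if most_commented:
        # Restrict to the `top_recipes` recipes with the most comments.
        df = df.nlargest(top_recipes, "num_comments")
    # One row per (recipe, tag) pair, then count each tag.
    return df["tags"].explode().value_counts().head(top_n)


# Made-up data.
df = pd.DataFrame({
    "tags": [["easy", "dessert"], ["easy", "vegan"], ["easy", "dessert"]],
    "num_comments": [5, 50, 20],
})
top = top_tags_sketch(df, top_n=2)
```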
- app_streamlit.analyse.utils.metrics_main_contributor(df)
Calculate the total number of unique contributors and recipes in the dataset.
- Args:
df (pd.DataFrame): DataFrame containing recipe data.
- Returns:
tuple: Number of unique contributors and recipes.
- app_streamlit.analyse.utils.nutri_score(df)
Calculate the overall Nutri-Score (A to E) for a food item by considering both negative and positive points.
- Args:
df (DataFrame): DataFrame representing a food item. It should include the keys required for both negative and positive point calculations: “Calories”, “Sugar”, “Saturated Fat”, “Sodium”, and “Protein”.
- Returns:
str: The Nutri-Score grade (A, B, C, D, or E) based on the calculated score.
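To illustrate how negative and positive points could combine into a grade, here is a minimal sketch. The function name grade_from_points is hypothetical, and the cut-offs are an assumption based on the commonly published Nutri-Score scale for general foods; the thresholds used by the package are not documented here and may differ.

```python
def grade_from_points(negative_points, positive_points):
    """Hypothetical sketch: map a points balance to a grade from A to E."""
    # A lower score means a better nutritional profile.
    score = negative_points - positive_points
    # Assumed cut-offs; the package may use different ones.
    if score <= -1:
        return "A"
    if score <= 2:
        return "B"
    if score <= 10:
        return "C"
    if score <= 18:
        return "D"
    return "E"
```

For example, 5 negative points and 3 positive points give a score of 2, which falls in grade B under these assumed cut-offs.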
- app_streamlit.analyse.utils.top_commented_recipes(df, top_n=10)
Extract the top N recipes with the highest number of comments.
- Args:
df (pd.DataFrame): DataFrame containing recipe data.
top_n (int): Number of top recipes to return.
- Returns:
pd.DataFrame: DataFrame containing top N recipes.
- app_streamlit.analyse.utils.top_commented_recipes_by_contributors(df, top_contributors, max_recipes_per_contributor=5)
Extracts the top commented recipes for each contributor in the top contributors list.
- Args:
df (pd.DataFrame): The DataFrame containing recipe data.
top_contributors (pd.DataFrame): A DataFrame containing the IDs of the top contributors.
max_recipes_per_contributor (int): Maximum number of recipes to return per contributor.
- Returns:
pd.DataFrame: A DataFrame containing contributor IDs, recipe IDs, names, and number of comments.
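A per-contributor "top N" selection can be sketched by sorting and then taking the head of each group. The column names ‘contributor_id’, ‘recipe_id’, ‘name’ and ‘num_comments’ and the sample data are assumptions; the package's implementation may differ.

```python
import pandas as pd

# Made-up data.
df = pd.DataFrame({
    "contributor_id": [1, 1, 1, 2, 3],
    "recipe_id": [11, 12, 13, 21, 31],
    "name": ["a", "b", "c", "d", "e"],
    "num_comments": [5, 9, 7, 4, 8],
})
top_contributors = pd.DataFrame({"contributor_id": [1, 2]})
max_recipes_per_contributor = 2

# Keep only the recipes written by the top contributors.
selected = df[df["contributor_id"].isin(top_contributors["contributor_id"])]

# Sort by comments, then keep the first rows of each contributor's group.
result = (
    selected.sort_values("num_comments", ascending=False)
    .groupby("contributor_id")
    .head(max_recipes_per_contributor)
    [["contributor_id", "recipe_id", "name", "num_comments"]]
)
```

Contributor 3 is dropped because it is not in top_contributors, and contributor 1 keeps only its two most commented recipes.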
- app_streamlit.analyse.utils.top_recipes(df)
Returns the top 5 recipes with the most comments.
- Args:
df (pd.DataFrame): DataFrame with all the preprocessed data
- Returns:
pd.DataFrame
- app_streamlit.analyse.utils.top_recipes_user(df)
Returns the top 5 recipes with the most comments from a specific user.
- Args:
df (pandas.DataFrame): DataFrame with the columns ‘name’ (recipe names), ‘avg_reviews’ (average rating) and ‘num_comments’ (number of comments).
- Returns:
pandas.DataFrame: A DataFrame with the columns ‘Recipe’ (recipe names), ‘Number of comments’ (count of comments per recipe) and ‘Average Rating’ (average rating for each recipe).
- app_streamlit.analyse.utils.trendy_ingredients_by_seasons(df, ingr_map, top_n)
Creates a DataFrame for each season and returns the top_n most used ingredients of each.
- Args:
df (DataFrame): cleaned DataFrame
ingr_map (DataFrame): DataFrame mapping ingredient IDs (‘id’) to their names (‘replaced’)
top_n (int, optional): number of top ingredients to return. Defaults to 200.
- Returns:
winter_ingr, spring_ingr, summer_ingr, autumn_ingr (pd.Series): four pd.Series, one per season, with the top_n most used ingredients
- app_streamlit.analyse.utils.unique_ingr(df, ingr_map, top_n=200)
Returns the ingredients unique to each season, found by comparing the ingredients used in one season with those used in all the other seasons.
- Args:
df (DataFrame): cleaned DataFrame
ingr_map (DataFrame): DataFrame mapping ingredient IDs (‘id’) to their names (‘replaced’)
top_n (int, optional): number of top ingredients to return. Defaults to 200.
- Returns:
winter_unique, spring_unique, summer_unique, autumn_unique (list): one list of unique ingredients per season
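The season-by-season comparison can be sketched with set differences. The function name unique_per_season and the sample ingredients are hypothetical, and the sketch starts from ingredient sets that are already computed rather than from df and ingr_map.

```python
def unique_per_season(season_ingredients):
    """Hypothetical sketch: keep ingredients that appear in one season only."""
    unique = {}
    for season, ingredients in season_ingredients.items():
        # Union of the ingredients used in every other season.
        others = set().union(
            *(items for name, items in season_ingredients.items() if name != season)
        )
        # Ingredients of this season that no other season uses.
        unique[season] = sorted(ingredients - others)
    return unique


# Made-up top ingredients per season.
season_ingredients = {
    "winter": {"flour", "cinnamon", "squash"},
    "spring": {"flour", "asparagus"},
    "summer": {"flour", "tomato", "basil"},
    "autumn": {"flour", "squash", "pumpkin"},
}
unique = unique_per_season(season_ingredients)
```

Here ‘squash’ is excluded from both winter and autumn because it appears in two seasons, and ‘flour’ is excluded everywhere.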
- app_streamlit.analyse.utils.user_recipes(merged_df, user_id)
Finds the recipes published by the user
- Args:
df (pd.DataFrame): DataFrame with recipe data and a ‘season’ column.
- Returns:
dict: Recipe counts per season.
- app_streamlit.analyse.utils.visualise_low_rank_insight(df_low_count, df_high_count)
Visualise low vs high rank recipes over time of preparation
- app_streamlit.analyse.utils.visualise_recipe_season(df)
Visualise count per season with low and high rankings.