emarket_data_explorer.data_process

This module provides the Shopee data-processing functionality.

Todo:

1. When parallel execution is introduced in v1.4, this aggregation is not expected to work, so a new way to aggregate will be needed.

class emarket_data_explorer.data_process.ShopeeAsyncCrawlerDataProcesser(data_source: str)

This class provides the data-processing functionality for the async version of the crawler.

aggregate_product_data(product_items_container: DataFrame, product_items: DataFrame) DataFrame

Accumulates the three items collected for a page and then aggregates them into a DataFrame.
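The accumulate-then-aggregate step can be pictured with plain pandas. The sketch below is an assumption about the general pattern, not the class's actual code: the helper `aggregate_page`, its extra list parameters, and the sample columns `itemid` and `name` are all hypothetical.

```python
from typing import Any, List

import pandas as pd


def aggregate_page(
    container: pd.DataFrame,
    page_items: pd.DataFrame,
    descriptions: List[Any],
    models: List[Any],
    hashtag_lists: List[Any],
) -> pd.DataFrame:
    """Hypothetical helper: attach three per-product lists to one page of
    search results, then append that page to the running container."""
    page = page_items.reset_index(drop=True)  # new frame; the input is untouched
    # Building each Series against page.index raises ValueError if a list
    # does not hold exactly one entry per row.
    page["description"] = pd.Series(descriptions, index=page.index, dtype=object)
    page["models"] = pd.Series(models, index=page.index, dtype=object)
    page["hashtag_list"] = pd.Series(hashtag_lists, index=page.index, dtype=object)
    if container.empty:
        return page
    return pd.concat([container, page], ignore_index=True)


# Sample data invented for illustration only.
page_1 = pd.DataFrame({"itemid": [1, 2], "name": ["tee", "mug"]})
page_2 = pd.DataFrame({"itemid": [3], "name": ["cap"]})

container = pd.DataFrame()
container = aggregate_page(container, page_1, ["cotton tee", "ceramic mug"],
                           [["S", "M"], ["300ml"]], [["#tee"], []])
container = aggregate_page(container, page_2, ["baseball cap"],
                           [["one size"]], [["#cap"]])
```

Because each call depends on the container returned by the previous one, this pattern is inherently sequential, which may be why the Todo above expects it to need rework for parallel runs.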

clean_product_data() None

Clears the three lists used to collect product details.

extract_product_data(product: Dict[str, Any]) None

Extracts ‘description’, ‘models’ and ‘hashtag_list’ from a product and appends them to lists for later pandas concatenation.
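As a rough illustration of how extraction and cleaning could fit together, the sketch below appends the three fields to separate lists and later empties them. The class name `ProductAccumulator`, its attribute names and the sample products are hypothetical; the real class and the shape of the Shopee payload may differ.

```python
from typing import Any, Dict, List


class ProductAccumulator:
    """Hypothetical stand-in, not the library's class."""

    def __init__(self) -> None:
        self.descriptions: List[Any] = []
        self.models: List[Any] = []
        self.hashtag_lists: List[Any] = []

    def extract_product_data(self, product: Dict[str, Any]) -> None:
        # .get() yields None for a missing key, so the three lists
        # always stay the same length.
        self.descriptions.append(product.get("description"))
        self.models.append(product.get("models"))
        self.hashtag_lists.append(product.get("hashtag_list"))

    def clean_product_data(self) -> None:
        # Empty the lists in place so the next page starts fresh.
        self.descriptions.clear()
        self.models.clear()
        self.hashtag_lists.clear()


# Sample products invented for illustration only.
acc = ProductAccumulator()
acc.extract_product_data(
    {"description": "cotton tee", "models": [{"name": "M"}], "hashtag_list": ["#tee"]}
)
acc.extract_product_data({"description": "ceramic mug"})
collected = list(acc.descriptions)

acc.clean_product_data()
```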

parse_good_comments(text: str) List[Dict[str, Any]]

Parses the scraped comment data by decoding it as JSON.

parse_good_info(text: str) List[Dict[str, Any]]

Parses the scraped index data by decoding it as JSON.

parse_search_indexs(text: str) List[Dict[str, Any]]

Parses the scraped index data by decoding it as JSON.
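The three parse methods above all take response text and return a list of dictionaries. A minimal sketch of that conversion with the standard `json` module follows. The helper name `parse_items`, the `items` key and the sample payload are assumptions made for illustration and are not taken from the Shopee API.

```python
import json
from typing import Any, Dict, List


def parse_items(text: str, key: str = "items") -> List[Dict[str, Any]]:
    """Hypothetical helper: decode JSON text and return the list of records."""
    payload = json.loads(text)
    if isinstance(payload, list):
        # The response is already a bare list of records.
        return payload
    # Otherwise look the records up under `key`; a missing or null value
    # becomes an empty list.
    return payload.get(key) or []


# Sample payload invented for illustration only.
sample = '{"items": [{"itemid": 1, "name": "tee"}, {"itemid": 2, "name": "mug"}]}'
rows = parse_items(sample)
```

Returning an empty list for a missing key keeps downstream DataFrame construction from failing on an empty page.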

process_raw_search_index(result: List[Dict[str, Any]]) DataFrame

Processes the raw Shopee search data returned by its API.

Args:

result (List[Dict[str, Any]]): raw data scraped from Shopee

Returns:

product_items (pd.DataFrame): search index data

update_comment_data(comment: List[Dict[str, Any]]) DataFrame

Extracts ‘model_name’ from the ‘product_items’ column, replaces ‘product_items’ with the extracted values, and then concatenates the result with the comment container.
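The replace-then-concatenate step might look like the sketch below. It assumes each comment's ‘product_items’ value is a list of dictionaries that carry a ‘model_name’ key. The helper names `model_names` and `update_comments`, the explicit `container` parameter, and the sample fields `comment` and `rating_star` are hypothetical.

```python
from typing import Any, Dict, List

import pandas as pd


def model_names(items: Any) -> List[Any]:
    """Return the 'model_name' of each entry, or [] if items is not a list."""
    if not isinstance(items, list):
        return []
    return [item.get("model_name") for item in items]


def update_comments(container: pd.DataFrame,
                    comments: List[Dict[str, Any]]) -> pd.DataFrame:
    """Hypothetical helper: reduce 'product_items' to model names, then
    append the batch of comments to the running container."""
    frame = pd.DataFrame(comments)
    # Overwrite the nested column with just the extracted model names.
    frame["product_items"] = frame["product_items"].apply(model_names)
    if container.empty:
        return frame
    return pd.concat([container, frame], ignore_index=True)


# Sample comments invented for illustration only.
batch_1 = [
    {"comment": "fits well", "rating_star": 5,
     "product_items": [{"model_name": "M", "itemid": 1}]},
    {"comment": "ok", "rating_star": 3,
     "product_items": [{"model_name": "L"}, {"model_name": "XL"}]},
]
batch_2 = [{"comment": "late delivery", "rating_star": 2, "product_items": []}]

container = pd.DataFrame()
container = update_comments(container, batch_1)
container = update_comments(container, batch_2)
```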