emarket_data_explorer.shopee_async_crawler

This module provides the Shopee Async Crawler functionality.

class emarket_data_explorer.shopee_async_crawler.ShopeeAsyncCrawlerHandler(ip_addresses: List[str], proxy_auth: str, header: Dict[str, Any], data_handler: ShopeeAsyncCrawlerDataProcesser)

Provides crawler capabilities for reading data from Shopee.

get_page(num_of_product: int, page_length: int) -> int

Return the number of pages needed to cover the number of items to be scraped.
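The page count can be pictured as a ceiling division. The following is a minimal sketch, assuming `get_page` rounds up so that a partially filled last page is still counted; the actual implementation is not shown in this document and may differ.

```python
import math


def get_page(num_of_product: int, page_length: int) -> int:
    """Return how many pages are needed to cover num_of_product items."""
    if page_length <= 0:
        raise ValueError("page_length must be positive")
    # Assumption: round up so a partially filled last page is still fetched.
    return math.ceil(num_of_product / page_length)


print(get_page(250, 100))  # 3
```

With 250 items and 100 items per page, the third page carries the remaining 50 items.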

async process_all(kwargs: Dict[str, Any]) -> AsyncCrawlerResponse

Entry point called by a workflow class for ALL mode.

async process_comment(kwargs: Dict[str, Any]) -> AsyncCrawlerResponse

Entry point called by a workflow class for comment mode.

async process_index(kwargs: Dict[str, Any]) -> AsyncCrawlerResponse

Entry point called by a workflow class for index mode.

async process_product(kwargs: Dict[str, Any]) -> AsyncCrawlerResponse

Entry point called by a workflow class for product mode.

read_good_comments_url(shop_id: int, item_id: int) -> str

Build the comment URL for the Shopee API.

read_good_info_url(shop_id: int, item_id: int) -> str

Build the product URL for the Shopee API.

read_search_indexs_url(keyword: str, page: int, page_length: int) -> str

Build the search index URL for the Shopee API.
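To illustrate how a paginated search URL might be assembled from these three arguments, here is a rough sketch. Everything about the endpoint is hypothetical: the host `example.invalid`, the parameter names `keyword`, `limit`, and `offset`, the zero-based `page`, and the helper name `build_search_url` are placeholders, not the real Shopee API or this module's code.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; the real Shopee endpoint is not given here.
BASE_URL = "https://example.invalid/search"


def build_search_url(keyword: str, page: int, page_length: int) -> str:
    """Build a paginated search URL (illustrative parameter names only)."""
    params = {
        "keyword": keyword,
        "limit": page_length,
        # Assumption: pages are zero-based and addressed by item offset.
        "offset": page * page_length,
    }
    # urlencode escapes the keyword, e.g. spaces become "+".
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("phone case", 2, 50))
```

Letting `urlencode` build the query string avoids hand-escaping keywords that contain spaces or non-ASCII characters.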

async scrap_comment(ids_pool: List[tuple], kwargs: Dict[str, Any]) -> Tuple[DataFrame, List[int]]

Fetch buyers' comments through the Shopee API, using item_id and shop_id.
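The general shape of fetching many (shop, item) pairs concurrently can be sketched with `asyncio.gather`. This is only an illustration under stated assumptions: `fetch_comments` is a stand-in that performs no network request, each tuple in `ids_pool` is assumed to be ordered as `(shop_id, item_id)`, plain lists replace the `DataFrame`, and the second returned list is used here for indices of failed requests, which is a guess since the meaning of `List[int]` is not documented.

```python
import asyncio
from typing import Dict, List, Tuple


async def fetch_comments(shop_id: int, item_id: int) -> Dict[str, int]:
    """Hypothetical stand-in for one HTTP request; no network access."""
    await asyncio.sleep(0)
    return {"shop_id": shop_id, "item_id": item_id}


async def scrape_all(
    ids_pool: List[Tuple[int, int]],
) -> Tuple[List[Dict[str, int]], List[int]]:
    """Fetch every pair concurrently and separate successes from failures."""
    tasks = [fetch_comments(shop_id, item_id) for shop_id, item_id in ids_pool]
    # return_exceptions=True collects exceptions as results instead of
    # raising the first one to the caller.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    rows: List[Dict[str, int]] = []
    failed: List[int] = []
    for index, result in enumerate(results):
        if isinstance(result, Exception):
            failed.append(index)
        else:
            rows.append(result)
    return rows, failed


rows, failed = asyncio.run(scrape_all([(1, 10), (2, 20)]))
print(rows, failed)
```

`asyncio.gather` returns results in the order the awaitables were passed, so each index in `failed` maps back to a position in `ids_pool`.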

async scrap_index(kwargs: Dict[str, Any]) -> Tuple[DataFrame, List[tuple], List[int]]

Read the search results through the Shopee API.

async scrap_product_info(ids_pool: List[tuple], merged_search_index_df: DataFrame, kwargs: Dict[str, Any]) -> Tuple[DataFrame, List[int]]

Fetch product details, such as articles and SKUs, through the Shopee API.

split_list(alist: List[str], wanted_parts: int = 1) -> List[List[str]]

Split a list of tasks into wanted_parts sublists of roughly equal size. For example, a list of 100 items with wanted_parts=2 becomes two lists of 50 items each.
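One common way to write such a splitter uses integer arithmetic on slice boundaries. The sketch below assumes that behaviour; it is not taken from the module's source, and the real `split_list` may distribute remainders differently.

```python
from typing import List


def split_list(alist: List[str], wanted_parts: int = 1) -> List[List[str]]:
    """Split alist into wanted_parts contiguous chunks of near-equal size."""
    length = len(alist)
    # Integer division on the boundaries spreads any remainder across chunks.
    return [
        alist[i * length // wanted_parts:(i + 1) * length // wanted_parts]
        for i in range(wanted_parts)
    ]


parts = split_list([str(n) for n in range(100)], wanted_parts=2)
print([len(p) for p in parts])  # [50, 50]
```

When the length is not divisible by `wanted_parts`, the chunks differ by at most one element, so `split_list(["a", "b", "c"], 2)` gives `[["a"], ["b", "c"]]` in this sketch.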