Module markov.dispatchers.data_dispatch

Functions

def add_ds_paths(ds_id: str, ds_paths: List)
def analyze_dataset(ds_id: str) ‑> Any

Re-analyze an existing dataset.

Args

ds_id
id of the dataset to be re-analyzed.

Returns

URL of the RUN executing this column_stats analysis.
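
A minimal usage sketch, assuming the module is importable under its documented path; the dataset id is a placeholder:

    from markov.dispatchers import data_dispatch

    # Trigger a fresh analysis of an already-registered dataset.
    run_url = data_dispatch.analyze_dataset(ds_id="ds-123")
    print(run_url)  # URL of the RUN executing the column_stats analysis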

def compare_datasets(compare_ds_request) ‑> Any

Dispatch a dataset comparison request to the MarkovML backend

Args

compare_ds_request
Object representing the compare dataset request

Returns

Response from the MarkovML backend for this request
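
A hedged sketch; how the compare request object is constructed is not documented in this module, so it is treated here as a pre-built object:

    from markov.dispatchers import data_dispatch

    def compare(compare_ds_request):
        # Dispatch a pre-built compare-dataset request and return the backend response.
        return data_dispatch.compare_datasets(compare_ds_request)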

def create_search_pod_for_clustering_similarity(create_search_pod_req: CreateSearchPodRequest) ‑> CreateSearchPodResponse

Create a search pod for clustering similarity

Args

create_search_pod_req : CreateSearchPodRequest
Object representing the request to create search pod for clustering similarity

Returns

CreateSearchPodResponse

def download_dataset(ds_id: str, segment_type: str) ‑> Dict

Download a dataset segment from the S3 bucket
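
A minimal sketch; the accepted segment_type values are not listed here, so "train" is an assumption:

    from markov.dispatchers import data_dispatch

    # Download one segment of a dataset; returns a Dict describing the download.
    result = data_dispatch.download_dataset(ds_id="ds-123", segment_type="train")
    print(result)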

def finish_manual_dataset_upload(resource_identifier: str, upload_id: str, etags: List[str]) ‑> FinishSDKUploadResponse
def get_all_data_families() ‑> List[Dict]

Get all the data families

Returns

List of data families that this user has access to

def get_clustering_raw_data(get_clustering_raw_data_req: ClusteringRawDataRequest) ‑> ClusterRawDataResponse

Get the raw data for the clustering record_id <-> text mapping

Args

get_clustering_raw_data_req : ClusteringRawDataRequest
contains the request to get the clustering raw data mapping

Returns

ClusterRawDataResponse

def get_da_run_status(run_id: str) ‑> Dict

Get the data analysis run status

Args

run_id : str
run_id that you want to track

Returns:

def get_data_analysis_status(run_id: str) ‑> Dict

Given the run_id, return the analysis status of the run

Args

run_id : str
run_id returned by MarkovML backend to track this run

Returns

List
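
A minimal sketch, assuming "run-abc" stands in for a run_id previously returned by the backend:

    from markov.dispatchers import data_dispatch

    # Callers typically poll this until the analysis run completes.
    status = data_dispatch.get_data_analysis_status(run_id="run-abc")
    print(status)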

def get_data_family(df_id: str = '', name: str = '') ‑> Any

Return the DataFamily matching the given df_id or name.

Args

df_id : str
Datafamily id
name : str
Datafamily name

Returns

DataFamily
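
A minimal sketch showing lookup by id or by name (both values are placeholders):

    from markov.dispatchers import data_dispatch

    family_by_id = data_dispatch.get_data_family(df_id="df-123")
    family_by_name = data_dispatch.get_data_family(name="my-data-family")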

def get_datafamily_by(name: str = '', df_id='')

Get the data_family by 'name' or 'id'; one of these should be present

Args

name : str
name of the datafamily to find with
df_id : str
id of the datafamily to find with

Returns:

def get_dataset_info(ds_id: str = '', ds_name: str = '') ‑> Any

Return a dataset with data_set_id given by ds_id or by ds_name.

Args

ds_id : str
dataset id of the dataset to be fetched
ds_name : str
name of the dataset to be fetched

Returns

Command Response for fetching Dataset Info
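
A minimal sketch, using a placeholder dataset id:

    from markov.dispatchers import data_dispatch

    # Fetch dataset metadata by id; ds_name could be used instead.
    info = data_dispatch.get_dataset_info(ds_id="ds-123")
    print(info)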

def get_dataset_preview(ds_id: str) ‑> Any
def get_dataset_quality(ds_id: str = '', ds_name: str = '') ‑> Any

Return the dataset quality object for a dataset by ds_id or by ds_name.

Args

ds_id : str
dataset id of the dataset to be fetched
ds_name : str
name of the dataset to be fetched

Returns

Command Response for fetching Dataset Quality Results
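
A minimal sketch, fetching quality results by dataset name (a placeholder):

    from markov.dispatchers import data_dispatch

    quality = data_dispatch.get_dataset_quality(ds_name="my-dataset")
    print(quality)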

def get_datasets() ‑> Dict
def get_datasets_of_datafamily(df_id: str) ‑> List

Get all the datasets of the datafamily that match the specified pattern. If the pattern is empty, all datasets are returned.

Args

df_id
Datafamily id for which we want to get all the datasets.

Returns

List of DataSet.
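
A minimal sketch, using a placeholder datafamily id:

    from markov.dispatchers import data_dispatch

    # List every dataset registered under the datafamily.
    datasets = data_dispatch.get_datasets_of_datafamily(df_id="df-123")
    for ds in datasets:
        print(ds)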

def get_df_matching_pattern(pattern: str) ‑> List[Dict]

Get the datafamilies whose names match the given pattern.

Args

pattern
pattern to be matched against the name of the dataFamily

Returns

list of data_families that match the given pattern
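
A minimal sketch; the pattern string is a placeholder:

    from markov.dispatchers import data_dispatch

    matches = data_dispatch.get_df_matching_pattern(pattern="fraud")
    for family in matches:
        print(family)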

def get_ds_matching_pattern(pattern: str, df_id: str = None) ‑> List[Dict]
def get_mkv_ds_info(ds_id: str) ‑> Any

Given a dataset id, return the dataset info (metadata about the dataset) from the handlers backend. This information is stored in the hub.

Args

ds_id
Dataset id

Returns

Dataset info.

def get_search_pod_status(search_pod_status_req: SearchPodStatusRequest) ‑> GetSearchPodStatusResponse

Get the status of the search pod

Args

search_pod_status_req : SearchPodStatusRequest
Object representing the request to get the status of the search pod

Returns

GetSearchPodStatusResponse

def get_similar_points_in_cluster(similar_points_in_cluster_req: SimilarPointsRequest) ‑> SimilarPointsResponse

Get the similar points in the cluster for the given dataset

Args

similar_points_in_cluster_req : SimilarPointsRequest
Object representing the request to get similar points in the cluster

Returns

SimilarPointsResponse

def get_user_permissions()

Return permissions for the current user.

Returns

MKVUserPermission
User permission details.

def register_data_family(data_family) ‑> Any

Register the given DataFamily with MarkovML. A DataFamily is a collection of datasets that are generated by a similar process, adhere to a similar schema, and address similar dataset/ML problems. DataFamilies should be unique and should contain one or more datasets gathered to capture information for a specific scenario.

Args

data_family
DataFamily for specific problem/space.

Returns

URL to the home page for this DataFamily.
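
A hedged sketch; constructing the DataFamily object itself is not covered by this module, so it is passed in pre-built:

    from markov.dispatchers import data_dispatch

    def register_family(data_family):
        # Returns the URL to the home page for the newly registered DataFamily.
        url = data_dispatch.register_data_family(data_family)
        print(url)
        return url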

def register_data_set(ds_register_req, analyze: bool = True) ‑> Any

Register the dataset with MarkovML

Args

ds_register_req : DataSetRegistrationRequest
contains all the metadata needed to register the dataset with MarkovML
analyze : bool
set to True if you want the dataset to be analyzed automatically (i.e. get auto analysis of the data)

Returns:
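
A hedged sketch; building the DataSetRegistrationRequest is not covered here, so it is passed in pre-built:

    from markov.dispatchers import data_dispatch

    def register_dataset(ds_register_req):
        # analyze=True also triggers automatic analysis of the newly registered dataset.
        return data_dispatch.register_data_set(ds_register_req, analyze=True)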

def register_embedding(embedding_register_req, analyze: bool = True) ‑> Any

Register the embedding with MarkovML

Args

embedding_register_req : EmbeddingRegistrationRequest
contains all the metadata needed to register the embedding with MarkovML
analyze : bool
set to True if you want the embedding to be analyzed automatically (i.e. get auto analysis of the data)

Returns:

def start_manual_upload_with_presigned_urls(model_filepath: str, part_count: int, content_type: str) ‑> StartModelUploadWithPresignedUrlsResponse
def update_datafamily(update_req) ‑> Any

Update the datafamily for the given dataset

Args

update_req : DatasetDataFamilyUpdate
Object representing the datafamily update for the given dataset

Returns:

def validate_data_set(ds_reg_req) ‑> Any

Validate the dataset before registering with MarkovML

Args

ds_reg_req : DataSetRegistrationRequest
contains all the metadata needed to register the dataset with MarkovML

Returns:
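
A hedged sketch combining validation and registration; the shape of the validation response is not documented here, so it is only printed before registering:

    from markov.dispatchers import data_dispatch

    def validate_then_register(ds_reg_req):
        # ds_reg_req is a pre-built DataSetRegistrationRequest.
        validation = data_dispatch.validate_data_set(ds_reg_req)
        print(validation)  # inspect before deciding to register
        return data_dispatch.register_data_set(ds_reg_req, analyze=True)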