Module markov.dispatchers.data_dispatch
Functions
def add_ds_paths(ds_id: str, ds_paths: List)
def analyze_dataset(ds_id: str) ‑> Any
-
Re-run the analysis for an existing dataset.
Args
ds_id
:str
- id of the dataset to be reanalyzed
Returns
URL of the RUN executing this column_stats analysis.
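A minimal usage sketch, assuming the module is importable as markov.dispatchers.data_dispatch and the client is already authenticated; the dataset id below is a placeholder:

    from markov.dispatchers import data_dispatch

    # Placeholder dataset id; substitute one from your workspace.
    run_url = data_dispatch.analyze_dataset(ds_id="ds_123")
    print(run_url)  # URL of the RUN executing the column_stats analysis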
def compare_datasets(compare_ds_request) ‑> Any
-
Dispatch a dataset comparison request to the MarkovML backend.
Args
compare_ds_request
- Object representing the compare-dataset request
Returns
Response from the MarkovML backend for this request.
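A hedged sketch of dispatching a comparison, assuming a compare-dataset request object has already been built elsewhere (its concrete type and fields are not shown in this reference):

    from markov.dispatchers import data_dispatch

    # `compare_req` is a hypothetical, pre-built compare-dataset request object.
    response = data_dispatch.compare_datasets(compare_req)
    print(response)  # raw response from the MarkovML backend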
def create_search_pod_for_clustering_similarity(create_search_pod_req: CreateSearchPodRequest) ‑> CreateSearchPodResponse
-
Create a search pod for clustering similarity
Args
create_search_pod_req
:CreateSearchPodRequest
- Object representing the request to create search pod for clustering similarity
Returns
CreateSearchPodResponse
def download_dataset(ds_id: str, segment_type: str) ‑> Dict
-
Download a dataset segment from the S3 bucket.
def finish_manual_dataset_upload(resource_identifier: str, upload_id: str, etags: List[str]) ‑> FinishSDKUploadResponse
def get_all_data_families() ‑> List[Dict]
-
Get all the data families.
Returns
List of data families that this user has access to.
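A short sketch of listing the accessible data families, assuming the import path matches the module name above:

    from markov.dispatchers import data_dispatch

    # Each entry is a dict describing one data family visible to the user.
    for family in data_dispatch.get_all_data_families():
        print(family)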
def get_clustering_raw_data(get_clustering_raw_data_req: ClusteringRawDataRequest) ‑> ClusterRawDataResponse
-
Get the raw data for the clustering record_id <-> text mapping
Args
get_clustering_raw_data_req
:ClusteringRawDataRequest
- contains the request to get the clustering raw data mapping
Returns
ClusterRawDataResponse containing the record_id <-> text mapping.
def get_da_run_status(run_id: str) ‑> Dict
-
Get the data analysis run status
Args
run_id
:str
- run_id of the data analysis run to track
Returns
Dict describing the current status of the run.
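A usage sketch with a placeholder run_id, assuming it was obtained from an earlier analysis call:

    from markov.dispatchers import data_dispatch

    # "run_abc" is a placeholder run_id returned by a previous analysis request.
    status = data_dispatch.get_da_run_status(run_id="run_abc")
    print(status)  # Dict describing the current state of the run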
def get_data_analysis_status(run_id: str) ‑> Dict
-
Given the run_id, return the analysis status of the run.
Args
run_id
:str
- run_id returned by the MarkovML backend to track this run
Returns
Dict
def get_data_family(df_id: str = '', name: str = '') ‑> Any
-
Return the DataFamily matching df_id or name.
Args
df_id
:str
- Datafamily id
name
:str
- Datafamily name
Returns
DataFamily
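A minimal lookup sketch; the datafamily id is a placeholder:

    from markov.dispatchers import data_dispatch

    # Look up by id; alternatively pass name="..." instead of df_id.
    family = data_dispatch.get_data_family(df_id="df_123")
    print(family)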
def get_datafamily_by(name: str = '', df_id='')
-
Get the data_family by 'name' or 'id'; exactly one of these should be present.
Args
name
:str
- name of the datafamily to look up
df_id
:str
- id of the datafamily to look up
Returns:
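A sketch showing both lookup modes; the name and id values are placeholders:

    from markov.dispatchers import data_dispatch

    # Supply exactly one of `name` or `df_id`.
    by_name = data_dispatch.get_datafamily_by(name="sentiment-corpora")
    by_id = data_dispatch.get_datafamily_by(df_id="df_123")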
def get_dataset_info(ds_id: str = '', ds_name: str = '') ‑> Any
-
Return the dataset identified by ds_id or by ds_name.
Args
ds_id
:str
- dataset id of the dataset to be fetched
ds_name
:str
- name of the dataset to be fetched
Returns
Command Response for fetching Dataset Info
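A fetch-by-id sketch (a lookup by ds_name works the same way); the id is a placeholder:

    from markov.dispatchers import data_dispatch

    info = data_dispatch.get_dataset_info(ds_id="ds_123")
    print(info)  # command response describing the dataset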
def get_dataset_preview(ds_id: str) ‑> Any
def get_dataset_quality(ds_id: str = '', ds_name: str = '') ‑> Any
-
Return the dataset quality object for a dataset by ds_id or by ds_name.
Args
ds_id
:str
- dataset id of the dataset to be fetched
ds_name
:str
- name of the dataset to be fetched
Returns
Command Response for fetching Dataset Quality Results
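A sketch of fetching quality results by name, assuming the dataset was registered earlier; the name is a placeholder:

    from markov.dispatchers import data_dispatch

    quality = data_dispatch.get_dataset_quality(ds_name="reviews-v2")
    print(quality)  # command response with the quality results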
def get_datasets() ‑> Dict
def get_datasets_of_datafamily(df_id: str) ‑> List
-
Get all the datasets belonging to the given datafamily.
Args
df_id
- Datafamily id for which we want to get all the datasets.
Returns
List of DataSet.
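A short sketch with a placeholder datafamily id:

    from markov.dispatchers import data_dispatch

    datasets = data_dispatch.get_datasets_of_datafamily(df_id="df_123")
    for ds in datasets:
        print(ds)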
def get_df_matching_pattern(pattern: str) ‑> List[Dict]
-
Get the datafamilies whose names match the given pattern.
Args
pattern
- pattern to be matched against the name of the dataFamily
Returns
List of data_families that match the given pattern.
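A pattern-lookup sketch; the pattern string is a placeholder:

    from markov.dispatchers import data_dispatch

    matches = data_dispatch.get_df_matching_pattern(pattern="fraud")
    print(matches)  # list of matching data families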
def get_ds_matching_pattern(pattern: str, df_id: str = None) ‑> List[Dict]
def get_mkv_ds_info(ds_id: str) ‑> Any
-
Given a dataset id, return the dataset info (metadata about the dataset) from the handlers backend. This information is available in the hub.
Args
ds_id
- Dataset id
Returns
Dataset info.
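A minimal sketch, again with a placeholder dataset id:

    from markov.dispatchers import data_dispatch

    ds_info = data_dispatch.get_mkv_ds_info(ds_id="ds_123")
    print(ds_info)  # metadata about the dataset as stored in the hub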
def get_search_pod_status(search_pod_status_req: SearchPodStatusRequest) ‑> GetSearchPodStatusResponse
-
Get the status of the search pod
Args
search_pod_status_req
:SearchPodStatusRequest
- Object representing the request to get the status of the search pod
Returns
GetSearchPodStatusResponse
def get_similar_points_in_cluster(similar_points_in_cluster_req: SimilarPointsRequest) ‑> SimilarPointsResponse
-
Get the similar points in the cluster for the given dataset
Args
similar_points_in_cluster_req
:SimilarPointsRequest
- Object representing the request to get similar points in the cluster
Returns
SimilarPointsResponse
def get_user_permissions()
-
Return permissions for the current user.
Returns
MKVUserPermission
- User permission details.
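A no-argument sketch, assuming the caller is already authenticated:

    from markov.dispatchers import data_dispatch

    permissions = data_dispatch.get_user_permissions()
    print(permissions)  # MKVUserPermission with the caller's permission details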
def register_data_family(data_family) ‑> Any
-
Register the given DataFamily with MarkovML. A DataFamily is a collection of datasets that are generated by a similar process, adhere to a similar schema, and address a similar data/ML problem. DataFamilies should be unique, and each should contain one or more datasets gathered to capture information for a specific scenario.
Args
data_family
- DataFamily for a specific problem/space.
Returns
URL of the home page for this DataFamily.
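A hedged registration sketch, assuming a DataFamily object has been constructed elsewhere (its constructor arguments are not part of this reference):

    from markov.dispatchers import data_dispatch

    # `family` is a hypothetical, pre-built DataFamily instance.
    home_url = data_dispatch.register_data_family(family)
    print(home_url)  # URL of the home page for the new DataFamily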
def register_data_set(ds_register_req, analyze: bool = True) ‑> Any
-
Register the dataset with MarkovML.
Args
ds_register_req
:DataSetRegistrationRequest
- contains all the metadata needed to register the dataset with MarkovML
analyze
:bool
- set to True to analyze the dataset (i.e., get an automatic analysis of the data)
Returns:
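A hedged registration sketch, assuming a DataSetRegistrationRequest has been built elsewhere (its fields are not documented here):

    from markov.dispatchers import data_dispatch

    # `ds_req` is a hypothetical, pre-built DataSetRegistrationRequest.
    result = data_dispatch.register_data_set(ds_req, analyze=True)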
def register_embedding(embedding_register_req, analyze: bool = True) ‑> Any
-
Register the embedding with MarkovML.
Args
embedding_register_req
:EmbeddingRegistrationRequest
- contains all the metadata needed to register the embedding with MarkovML
analyze
:bool
- set to True to analyze the embedding (i.e., get an automatic analysis of the data)
Returns:
def start_manual_upload_with_presigned_urls(model_filepath: str, part_count: int, content_type: str) ‑> StartModelUploadWithPresignedUrlsResponse
def update_datafamily(update_req) ‑> Any
-
Update the datafamily for the given dataset.
Args
update_req
:DatasetDataFamilyUpdate
- Object representing the datafamily update request
Returns:
def validate_data_set(ds_reg_req) ‑> Any
-
Validate the dataset before registering it with MarkovML.
Args
ds_reg_req
:DataSetRegistrationRequest
- contains all the metadata to register the dataset with MarkovML
Returns:
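A hedged validation sketch, reusing the same assumption of a pre-built DataSetRegistrationRequest:

    from markov.dispatchers import data_dispatch

    # `ds_req` is a hypothetical, pre-built DataSetRegistrationRequest.
    validation = data_dispatch.validate_data_set(ds_req)
    print(validation)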