Post: Pyspark Training
Endpoint: /pyspark_training/train
Initiates distributed model training using PySpark over Spark-Hadoop clusters. It enables scalable machine learning workflows (Classification, Regression, Clustering) optimized for handling large datasets (up to ~2 GB), leveraging distributed compute.
Input Parameters:
-
config_file_path (string, required): Absolute path to the configuration file containing training parameters, dataset location (e.g., HDFS/ClickHouse), and algorithm specifications.
Output:
Returns a JSON response with the training job status and metadata (e.g., model storage path, training metrics).