How to use the openml.tasks.list_tasks function in openml

To help you get started, we’ve selected a few openml examples, based on popular ways it is used in public projects.


github openml / openml-python / tests / test_tasks / test_task_functions.py View on Github external
def test_list_tasks_paginate(self):
        size = 10
        max_offset = 100  # avoid shadowing the builtin ``max``
        for i in range(0, max_offset, size):
            tasks = openml.tasks.list_tasks(offset=i, size=size)
            self.assertGreaterEqual(size, len(tasks))
            for tid in tasks:
                self._check_task(tasks[tid])
github openml / openml-python / tests / test_tasks / test_task_methods.py View on Github external
def test_tagging(self):
        task = openml.tasks.get_task(1)
        tag = "testing_tag_{}_{}".format(self.id(), time())
        task_list = openml.tasks.list_tasks(tag=tag)
        self.assertEqual(len(task_list), 0)
        task.push_tag(tag)
        task_list = openml.tasks.list_tasks(tag=tag)
        self.assertEqual(len(task_list), 1)
        self.assertIn(1, task_list)
        task.remove_tag(tag)
        task_list = openml.tasks.list_tasks(tag=tag)
        self.assertEqual(len(task_list), 0)
github openml / openml-python / examples / tasks_tutorial.py View on Github external
filtered_tasks = filtered_tasks.query('estimation_procedure == "10-fold Crossvalidation"')
print(list(filtered_tasks.index))

############################################################################

# Number of tasks
print(len(filtered_tasks))

############################################################################
# Resampling strategies can be found on the
# `OpenML Website `_.
#
# Similar to listing tasks by task type, we can list tasks by tags:

tasks = openml.tasks.list_tasks(tag='OpenML100')
tasks = pd.DataFrame.from_dict(tasks, orient='index')
print("First 5 of %s tasks:" % len(tasks))
pprint(tasks.head())

############################################################################
# Furthermore, we can list tasks based on the dataset id:

tasks = openml.tasks.list_tasks(data_id=61)
tasks = pd.DataFrame.from_dict(tasks, orient='index')
print("First 5 of %s tasks:" % len(tasks))
pprint(tasks.head())

############################################################################
# In addition, a size limit and an offset can be applied, either separately or in combination:

tasks = openml.tasks.list_tasks(size=10, offset=50)
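The windowing that ``size`` and ``offset`` perform happens server-side; a toy client-side sketch (with made-up task entries, so it runs without contacting the server) illustrates the same slicing semantics:

```python
# Hypothetical stand-in for the full task list on the server.
all_tasks = {tid: {'tid': tid, 'name': 'task-%d' % tid} for tid in range(1, 101)}

def paginate(tasks, offset=0, size=None):
    """Client-side sketch of the offset/size windowing done by the server."""
    end = None if size is None else offset + size
    tids = sorted(tasks)[offset:end]
    return {tid: tasks[tid] for tid in tids}

page = paginate(all_tasks, size=10, offset=50)
print(sorted(page))  # task IDs 51..60
```

Requesting pages of at most ``size`` entries, advancing ``offset`` each round, is exactly the pattern the pagination test earlier on this page exercises.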
github openml / openml-python / examples / tasks_tutorial.py View on Github external
# metadata that can be used to filter the tasks and retrieve a set of IDs.
# We can filter this list; for example, we can list only tasks having a
# special tag or only tasks for a specific target such as
# *supervised classification*.
#
# 2. A single task by its ID. It contains all meta information, the target
# metric, the splits and an iterator which can be used to access the
# splits in a useful manner.

############################################################################
# Listing tasks
# ^^^^^^^^^^^^^
#
# We will start by simply listing only *supervised classification* tasks:

tasks = openml.tasks.list_tasks(task_type_id=1)

############################################################################
# **openml.tasks.list_tasks()** returns a dictionary of dictionaries, which we convert into a
# `pandas dataframe `_
# to have better visualization and easier access:

tasks = pd.DataFrame.from_dict(tasks, orient='index')
print(tasks.columns)
print("First 5 of %s tasks:" % len(tasks))
pprint(tasks.head())

############################################################################
# We can filter the list of tasks to only contain datasets with more than
# 500 samples, but less than 1000 samples:

filtered_tasks = tasks.query('NumberOfInstances > 500 and NumberOfInstances < 1000')
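The same filter can be tried offline on a toy frame (the column name ``NumberOfInstances`` is taken from the listing output above; the row values and task IDs here are invented):

```python
import pandas as pd

toy = pd.DataFrame(
    {'NumberOfInstances': [150, 600, 999, 1000, 20000]},
    index=[11, 22, 33, 44, 55],  # hypothetical task IDs
)
# Strict bounds: 1000 itself is excluded.
selected = toy.query('NumberOfInstances > 500 and NumberOfInstances < 1000')
print(list(selected.index))  # [22, 33]
```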
github openml / openml-python / circle_drop / _downloads / tasks_tutorial.py View on Github external
tasks = pd.DataFrame.from_dict(tasks, orient='index')

############################################################################
#
# **OpenML 100**
# is a curated list of 100 tasks to start using OpenML. They are all
# supervised classification tasks with more than 500 instances and less than 50000
# instances per task. To make things easier, the tasks contain neither highly
# imbalanced data nor sparse data. However, the tasks include missing values and
# categorical features. You can find out more about the *OpenML 100* on
# `the OpenML benchmarking page `_.
#
# Finally, it is also possible to list all tasks on OpenML with:

############################################################################
tasks = openml.tasks.list_tasks()
tasks = pd.DataFrame.from_dict(tasks, orient='index')
print(len(tasks))

############################################################################
# Exercise
# ########
#
# Search for the tasks on the 'eeg-eye-state' dataset.

tasks.query('name=="eeg-eye-state"')
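The exercise solution relies on a string comparison inside ``query``; a toy frame (dataset names and task IDs invented) shows the mechanics without hitting the server:

```python
import pandas as pd

toy = pd.DataFrame(
    {'name': ['iris', 'eeg-eye-state', 'eeg-eye-state']},
    index=[1, 2, 3],  # hypothetical task IDs
)
# Double quotes inside the single-quoted expression delimit the string literal.
hits = toy.query('name == "eeg-eye-state"')
print(list(hits.index))  # [2, 3]
```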

############################################################################
# Downloading tasks
# ^^^^^^^^^^^^^^^^^
#
# We provide two functions to download tasks: one downloads a single task by its ID,
# and the other takes a list of IDs and downloads all of those tasks:
github openml / openml-python / develop / _downloads / 9cf694b89f927974f9d820e08352d33e / 2015_neurips_feurer_example.py View on Github external
# OpenML, which define both the target feature and the train/test split.
#
# .. note::
#    It is discouraged to work directly on datasets and to only provide dataset IDs in a paper,
#    as this does not allow reproducibility (the splitting is unclear). Please base papers on
#    tasks rather than raw datasets and publish task IDs. This example is only given to
#    showcase the use of OpenML-Python for a published paper and as a warning on how not to do it.
#    Please check the `OpenML documentation of tasks `_ if you
#    want to learn more about them.

####################################################################################################
# This lists both active and inactive tasks (because of ``status='all'``). Unfortunately,
# this is necessary, as issues were found in some of the datasets after publication, which led
# to their deactivation and in turn deactivated the tasks defined on them. More information on
# active and inactive datasets can be found in the `online docs `_.
tasks = openml.tasks.list_tasks(
    task_type_id=openml.tasks.TaskTypeEnum.SUPERVISED_CLASSIFICATION,
    status='all',
    output_format='dataframe',
)

# Query only those with holdout as the resampling strategy.
tasks = tasks.query('estimation_procedure == "33% Holdout set"')

task_ids = []
for did in dataset_ids:
    tasks_ = list(tasks.query("did == {}".format(did)).tid)
    if len(tasks_) >= 1:  # if there are multiple tasks, take the one with the lowest ID (the oldest).
        task_id = min(tasks_)
    else:
        raise ValueError(did)
github automl / auto-sklearn / scripts / 2015_nips_paper / setup / get_tasks.py View on Github external
def get_task_ids(dataset_ids):
    # Return the task IDs corresponding to the given dataset IDs.

    # active tasks
    tasks_a = openml.tasks.list_tasks(task_type_id=1, status='active')
    tasks_a = pd.DataFrame.from_dict(tasks_a, orient="index")

    # Query only those with holdout as the resampling strategy.
    tasks_a = tasks_a[(tasks_a.estimation_procedure == "33% Holdout set")]

    # deactivated tasks
    tasks_d = openml.tasks.list_tasks(task_type_id=1, status='deactivated')
    tasks_d = pd.DataFrame.from_dict(tasks_d, orient="index")

    tasks_d = tasks_d[(tasks_d.estimation_procedure == "33% Holdout set")]

    task_ids = []
    for did in dataset_ids:
        task_a = list(tasks_a.query("did == {}".format(did)).tid)
        if len(task_a) > 1:  # if there is more than one task, take the one with the lowest ID.
            task_a = [min(task_a)]
        task_d = list(tasks_d.query("did == {}".format(did)).tid)
        if len(task_d) > 1:
            task_d = [min(task_d)]
        task_ids += list(task_a + task_d)

    return task_ids  # return list of all task ids.
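The per-dataset minimum computed by the loop above can also be obtained without explicit iteration; a vectorized sketch on a toy frame (the ``did``/``tid`` column names come from the snippet, the values are invented):

```python
import pandas as pd

tasks = pd.DataFrame({
    'did': [61, 61, 31, 31, 31],  # hypothetical dataset IDs
    'tid': [5, 2, 9, 7, 8],       # hypothetical task IDs
})
# Oldest (lowest-ID) task per dataset, mirroring the loop in get_task_ids.
oldest = tasks.groupby('did')['tid'].min()
print(oldest.to_dict())  # {31: 7, 61: 2}
```

Unlike the loop, ``groupby`` silently yields nothing for a dataset ID that has no tasks, so the explicit loop is still the right choice when a missing dataset should raise an error.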
github automl / Auto-PyTorch / run_bench.py View on Github external
parser.add_argument("--architecture", type=str, choices=["shapedresnet", "shapedmlpnet"])
    parser.add_argument("--logging", type=str, choices=["step", "epoch"])
    args = parser.parse_args()

    # Get data
    openml_task_ids = [3, 12, 31, 53, 3917, 7592, 9952, 9977, 9981, 10101, 14965, 146195, 146821, 146822, 
            #146825, # fashion mnist
            167119, 167120]

    # Get IDs with proper splits (ty matze)
    new_task_ids = [] 
    for task_id in openml_task_ids:
        task = openml.tasks.get_task(task_id)
        try:
            time.sleep(args.run_id*0.01)
            tasks_with_same_dataset = openml.tasks.list_tasks(
                data_id=task.dataset_id, task_type_id=1, output_format='dataframe',
            )
            tasks_with_same_dataset = tasks_with_same_dataset.query(
                "estimation_procedure == '33% Holdout set'"
            )
            # 'x != x' is only true for NaN, so this keeps tasks without an
            # explicit evaluation measure.
            if 'evaluation_measures' in tasks_with_same_dataset.columns:
                tasks_with_same_dataset = tasks_with_same_dataset.query(
                    'evaluation_measures != evaluation_measures'
                )
        except Exception:
            try:
                time.sleep(args.run_id*0.01)
                tasks_with_same_dataset = openml.tasks.list_tasks(
                    data_id=task.dataset_id, task_type_id=1, output_format='dataframe',
                )
                tasks_with_same_dataset = tasks_with_same_dataset.query(
                    "estimation_procedure == '33% Holdout set'"
                )
                if 'evaluation_measures' in tasks_with_same_dataset.columns:
                    tasks_with_same_dataset = tasks_with_same_dataset.query(
                        'evaluation_measures != evaluation_measures'
                    )
            except Exception:
                print(task)
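The ``evaluation_measures != evaluation_measures`` query in the snippet above works because ``NaN`` compares unequal to itself, while any real value compares equal. A minimal demonstration (toy values):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'evaluation_measures': ['accuracy', np.nan, np.nan]},
    index=[1, 2, 3],  # hypothetical task IDs
)
# Self-inequality is only true for NaN, so this keeps rows with no measure set.
no_measure = df.query('evaluation_measures != evaluation_measures')
print(list(no_measure.index))  # [2, 3]
```

This is the classic pandas idiom for selecting NaN rows inside ``query``, where ``isna()`` is not directly available as a column method in older versions.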
github openml / openml-python / develop / _downloads / e6648ddbb54e75b09dd01d69be33f340 / tasks_tutorial.py View on Github external
tasks = openml.tasks.list_tasks(task_type_id=1)

############################################################################
# **openml.tasks.list_tasks()** returns a dictionary of dictionaries by default, which we convert
# into a
# `pandas dataframe `_
# to have better visualization capabilities and easier access:

tasks = pd.DataFrame.from_dict(tasks, orient='index')
print(tasks.columns)
print(f"First 5 of {len(tasks)} tasks:")
print(tasks.head())

# As conversion to a pandas dataframe is a common task, we have added this functionality to the
# OpenML-Python library which can be used by passing ``output_format='dataframe'``:
tasks_df = openml.tasks.list_tasks(task_type_id=1, output_format='dataframe')
print(tasks_df.head())

############################################################################
# We can filter the list of tasks to only contain datasets with more than
# 500 samples, but less than 1000 samples:

filtered_tasks = tasks.query('NumberOfInstances > 500 and NumberOfInstances < 1000')
print(list(filtered_tasks.index))

############################################################################

# Number of tasks
print(len(filtered_tasks))

############################################################################
# Then, we can further restrict the tasks so that they all use the same resampling strategy:
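An earlier snippet on this page performs that restriction with ``query('estimation_procedure == "10-fold Crossvalidation"')``. The same restriction can be tried offline on a toy frame (rows and task IDs invented; the ``estimation_procedure`` values appear in the snippets above):

```python
import pandas as pd

toy = pd.DataFrame(
    {'estimation_procedure': [
        '10-fold Crossvalidation',
        '33% Holdout set',
        '10-fold Crossvalidation',
    ]},
    index=[1, 2, 3],  # hypothetical task IDs
)
filtered = toy.query('estimation_procedure == "10-fold Crossvalidation"')
print(list(filtered.index))  # [1, 3]
```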