

Custom Datasets - Creating your own data


  • Author Head Binubuo
  • Category Binubuo API
  • Created Mon, Jun 27 2022
 


Your first custom dataset

Once you have tried the individual generators and quickfetch, you will probably want to start creating synthetic data that looks much more like your own data. Some of the more advanced inheritance and sorting functionality is also only supported in custom datasets.

Custom datasets are defined using a JSON document that describes the columns you want in the dataset. For a full reference of the dataset definition format, see the Custom Dataset documentation.

Defining your own datasets lets you fully utilise the advanced features of Binubuo. Columns automatically inherit traits and values from other columns when they are related. Imagine that you have a column with a first name and a column with an email address: the email address will automatically use that name. Another example is a column with a country and a column with an address or location. In that case, the address or location automatically adjusts to the country that has been generated. There are even more advanced derived values, such as job title to salary, text category to text content, food category to individual types of food and additives, and time-related columns.
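To make the idea of inheritance concrete, here is a toy model in Python. It is only an illustration of the concept and not how Binubuo implements it: the name lists, the domain list and the `generate_row` function are all hypothetical. The point is that the email column is not random on its own; it is derived from values already generated for the same row.

```python
import random

# Toy model of column inheritance. This is NOT Binubuo's implementation;
# the lists below are hypothetical sample values used only for illustration.
FIRST_NAMES = ["Zoe", "William", "Kayla"]
LAST_NAMES = ["Anderson", "Hughes", "Parker"]
DOMAINS = ["example.com", "example.org"]


def generate_row(rng: random.Random) -> dict:
    """Generate one row where the email is derived from the name columns."""
    row = {}
    row["first_name"] = rng.choice(FIRST_NAMES)
    row["last_name"] = rng.choice(LAST_NAMES)
    # The email column "inherits" the already generated names for this row.
    row["email"] = f'{row["first_name"]}.{row["last_name"]}@{rng.choice(DOMAINS)}'
    return row


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(generate_row(rng))
```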

Creating a customer dataset

To demonstrate the different capabilities of the custom dataset definition, we will create a customer dataset. It contains different types of data that show either core functionality or the inheritance-related features. We define our customer dataset with the following columns:

  • Customer ID: A unique id for our customer
  • Country ID: One of 3 countries that our customer can come from: the United States, China or Denmark
  • First Name: The first name
  • Last Name: The last name
  • Birthdate: When they were born
  • City: City they live in
  • Address: Address in the city
  • Location: Coordinates of the customer
  • Email Address: Email address of the customer
  • Credit Card: Type of credit card
  • Credit Card Number: Credit card number
  • Phone Number: Phone number of the customer

Following the JSON schema documentation, the definition of our dataset looks like this:

{
    "columns": [
        {
            "column_name": "customer_id"
            , "column_datatype": "number"
            , "column_type": "builtin"
            , "builtin_type": "numiterate"
            , "builtin_startfrom": "1"
            , "builtin_increment_min": "1"
            , "builtin_increment_max": "1"
        }, {
            "column_name": "country_id"
            , "column_datatype": "string"
            , "column_type": "referencelist"
            , "reference_static_list": "US,CN,DK"
        }, {
            "column_name": "first_name"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "first_name"
        }, {
            "column_name": "last_name"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "last_name"
        }, {
            "column_name": "birthdate"
            , "column_datatype": "date"
            , "column_type": "generated"
            , "generator": "birthday"
            , "arguments": "'adult'"
        }, {
            "column_name": "city"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "city"
        }, {
            "column_name": "address"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "address"
        }, {
            "column_name": "location"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "country_location"
        }, {
            "column_name": "email"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "email"
        }, {
            "column_name": "credit_card_name"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "credit_card_name"
        }, {
            "column_name": "credit_card_number"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "credit_card_number"
        }, {
            "column_name": "phone_number"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "phone_number"
        }
    ]
}
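If you would rather keep the definition in code, a sketch like the following builds the same JSON document in Python. The keys and values are copied from the definition above; the helper names `generated_column` and `customer_definition` are hypothetical and not part of any Binubuo client.

```python
import json
from typing import Optional


def generated_column(name: str, generator: str, datatype: str = "string",
                     arguments: Optional[str] = None) -> dict:
    """Build one 'generated' column entry using the keys from the schema above."""
    column = {
        "column_name": name,
        "column_datatype": datatype,
        "column_type": "generated",
        "generator": generator,
    }
    if arguments is not None:
        column["arguments"] = arguments
    return column


def customer_definition() -> dict:
    """Return the customer dataset definition as a Python dict."""
    columns = [
        {
            "column_name": "customer_id",
            "column_datatype": "number",
            "column_type": "builtin",
            "builtin_type": "numiterate",
            "builtin_startfrom": "1",
            "builtin_increment_min": "1",
            "builtin_increment_max": "1",
        },
        {
            "column_name": "country_id",
            "column_datatype": "string",
            "column_type": "referencelist",
            "reference_static_list": "US,CN,DK",
        },
        generated_column("first_name", "first_name"),
        generated_column("last_name", "last_name"),
        generated_column("birthdate", "birthday", datatype="date", arguments="'adult'"),
        generated_column("city", "city"),
        generated_column("address", "address"),
        generated_column("location", "country_location"),
        generated_column("email", "email"),
        generated_column("credit_card_name", "credit_card_name"),
        generated_column("credit_card_number", "credit_card_number"),
        generated_column("phone_number", "phone_number"),
    ]
    return {"columns": columns}


if __name__ == "__main__":
    # Prints the same structure as the JSON document above.
    print(json.dumps(customer_definition(), indent=4))
```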

Now that we have the JSON document describing our dataset, we can POST it to create an endpoint that serves the data. We will call the dataset customer_data, and the create request looks like this:

POST URL

https://binubuo.com/api/data/?schemaname=customer_data

The body of the request is the JSON document above. Once the dataset has been created, we can call it to get our data:
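As a minimal sketch, the create request can be prepared with Python's standard `urllib.request`. The post does not show how requests are authenticated, so the function takes a `headers` argument and assumes nothing about header names; use whatever the API Reference specifies. Sending the body with `Content-Type: application/json` is also an assumption. `build_create_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://binubuo.com/api/data/"


def build_create_request(schema_name: str, definition: dict,
                         headers: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST that creates a custom dataset.

    `headers` must carry the authentication described in the API Reference;
    no header names are assumed here. The JSON content type is an assumption.
    """
    query = urllib.parse.urlencode({"schemaname": schema_name})
    body = json.dumps(definition).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(BASE_URL + "?" + query, data=body,
                                  headers=all_headers, method="POST")
```

To actually send it, pass the prepared request to `urllib.request.urlopen(req)` with valid credentials in `headers`. Preparing the request separately from sending it keeps the URL and body easy to inspect before anything goes over the network.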

https://binubuo.com/api/data/custom/[unique_hash]/customer_data

You can find details about how to get your unique_hash in the API Reference.
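Filling in the dataset URL is simple string work. The sketch below builds the URL from the pattern above and prepares a GET request; the hash value in the test is a placeholder, and as with the create request, authentication headers are left to the caller because the post does not name them. The response format is not described here, so the sketch does not try to parse it. `dataset_url` and `build_fetch_request` are hypothetical helper names.

```python
import urllib.parse
import urllib.request


def dataset_url(unique_hash: str, schema_name: str) -> str:
    """Fill in the custom dataset URL pattern shown above."""
    # quote(..., safe="") escapes any characters that are unsafe in a path segment.
    return "https://binubuo.com/api/data/custom/{}/{}".format(
        urllib.parse.quote(unique_hash, safe=""),
        urllib.parse.quote(schema_name, safe=""),
    )


def build_fetch_request(unique_hash: str, schema_name: str,
                        headers: dict) -> urllib.request.Request:
    """Prepare (but do not send) the GET request for a custom dataset."""
    return urllib.request.Request(dataset_url(unique_hash, schema_name),
                                  headers=headers, method="GET")
```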

Below you can see an example of the data:

[
    2, 'US', 'Zoe', 'Anderson', '1970-02-06T01:38:33Z', 'Kodiak'
    , '331 Bewog River', '47.6614473824101, 13.5798014432815'
    , 'Zoe.Anderson@hottol.mil', 'maestro', '5018208080767569', '552 264 3846'
],
[
    3, 'US', 'William', 'Hughes', '1993-06-29T16:29:43Z', 'Cheyenne'
    , '459 Wule Ridge', '47.80284617756, 10.2317300408737'
    , 'William.Hughes@tegu.net', '4026532264145500', '816 373 2934'
],
[
    4, 'US', 'Kayla', 'Parker', '1975-06-10T07:15:11Z', 'San Diego'
    , '253 Menuj Parkway', '46.6017906518347, 12.6511300142198'
    , 'Kayla.Parker@vi.int', 'laser', '6304157181568723', '532 227 2563'
]
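Once rows like these are in hand, it helps to key each value by its column name and check that the row is complete. The sketch below does that for the sample rows above and also checks one inherited relationship: that the email's local part is the generated first and last name. Note that the second sample row, as printed, has only 11 values (there is no credit card type before the card number), so the sketch reports it rather than silently misaligning the columns. `to_records` and `email_inherits_name` are hypothetical helper names.

```python
# Column order follows the custom dataset definition in this post.
COLUMNS = ["customer_id", "country_id", "first_name", "last_name", "birthdate",
           "city", "address", "location", "email", "credit_card_name",
           "credit_card_number", "phone_number"]

# Sample rows copied from the output above.
SAMPLE_ROWS = [
    [2, 'US', 'Zoe', 'Anderson', '1970-02-06T01:38:33Z', 'Kodiak',
     '331 Bewog River', '47.6614473824101, 13.5798014432815',
     'Zoe.Anderson@hottol.mil', 'maestro', '5018208080767569', '552 264 3846'],
    [3, 'US', 'William', 'Hughes', '1993-06-29T16:29:43Z', 'Cheyenne',
     '459 Wule Ridge', '47.80284617756, 10.2317300408737',
     'William.Hughes@tegu.net', '4026532264145500', '816 373 2934'],
    [4, 'US', 'Kayla', 'Parker', '1975-06-10T07:15:11Z', 'San Diego',
     '253 Menuj Parkway', '46.6017906518347, 12.6511300142198',
     'Kayla.Parker@vi.int', 'laser', '6304157181568723', '532 227 2563'],
]


def to_records(rows, columns=COLUMNS):
    """Return (records keyed by column name, rows whose length does not match)."""
    records, rejected = [], []
    for row in rows:
        if len(row) == len(columns):
            records.append(dict(zip(columns, row)))
        else:
            rejected.append(row)
    return records, rejected


def email_inherits_name(record) -> bool:
    """True if the email's local part is '<first_name>.<last_name>'."""
    local_part = record["email"].split("@", 1)[0]
    return local_part == f"{record['first_name']}.{record['last_name']}"


if __name__ == "__main__":
    records, rejected = to_records(SAMPLE_ROWS)
    print(len(records), "complete rows,", len(rejected), "rejected")
    print(all(email_inherits_name(r) for r in records))
```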

As you can see, the inherited data lines up (for example, each email address is built from the generated first and last name), and we have created data that is ready to use in our database or wherever else we need it. In the next article I will show the client, which really opens up the capabilities of Binubuo.
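As one example of putting the data to use, the sketch below loads two of the complete sample rows into an in-memory SQLite database with Python's built-in `sqlite3` module. The table name and column types are assumptions chosen for illustration; card and phone numbers are kept as TEXT so their formatting is preserved exactly as generated. `load_customers` is a hypothetical helper name.

```python
import sqlite3

# Column order follows the custom dataset definition in this post.
COLUMNS = ["customer_id", "country_id", "first_name", "last_name", "birthdate",
           "city", "address", "location", "email", "credit_card_name",
           "credit_card_number", "phone_number"]

# Two complete rows copied from the sample output above.
ROWS = [
    (2, 'US', 'Zoe', 'Anderson', '1970-02-06T01:38:33Z', 'Kodiak',
     '331 Bewog River', '47.6614473824101, 13.5798014432815',
     'Zoe.Anderson@hottol.mil', 'maestro', '5018208080767569', '552 264 3846'),
    (4, 'US', 'Kayla', 'Parker', '1975-06-10T07:15:11Z', 'San Diego',
     '253 Menuj Parkway', '46.6017906518347, 12.6511300142198',
     'Kayla.Parker@vi.int', 'laser', '6304157181568723', '532 227 2563'),
]


def load_customers(conn: sqlite3.Connection, rows) -> None:
    """Create a hypothetical customer_data table and bulk-insert the rows."""
    # Card and phone numbers are TEXT so spaces and leading zeros survive.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS customer_data (
               customer_id INTEGER PRIMARY KEY,
               country_id TEXT, first_name TEXT, last_name TEXT,
               birthdate TEXT, city TEXT, address TEXT, location TEXT,
               email TEXT, credit_card_name TEXT,
               credit_card_number TEXT, phone_number TEXT)"""
    )
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT INTO customer_data ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    conn.executemany(sql, rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_customers(conn, ROWS)
    for row in conn.execute("SELECT customer_id, first_name, email FROM customer_data"):
        print(row)
```

Because the insert uses one placeholder per column, `sqlite3` rejects a row with the wrong number of values (such as the 11-value sample row) with a `ProgrammingError` instead of storing misaligned data.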

Don't have an account yet?

If you don't have an account on Binubuo yet, you can create one quickly. Just click "Sign up" in the top right corner, and you are on your way to creating all the synthetic data you could dream about.

Want to see how to get started? Read the Get Started Guide.

Get more guides and help from the blog


If you already have an account on RapidAPI, you can use your account to access Binubuo
