This documentation is brand new, as is the service it describes. If you find any issues, please let us know: agnitio-support@economicmodeling.com

Agnitio API Documentation

Transitioning from Episteme? View the guide.

Interested in requesting a demo account or learning more? Email us

Overview

Agnitio is Emsi's primary data slicing tool for exposing its comprehensive labor market data. Agnitio is a power tool that can perform large and complex queries but might be overkill for simple use cases. While Agnitio largely conforms to a read-only REST interface, most endpoints are dynamic and data-driven, making conventional REST documentation systems less useful. This document covers the basic concepts needed to understand Agnitio's query system, how to determine what data you have access to, how to authenticate, and how to perform queries.

Agnitio is useful for answering questions such as the following:

Agnitio is designed to facilitate these queries over arbitrary groups (aggregations) of areas, industries, occupations, and more. Agnitio is not the only data API that Emsi provides; in particular, if you're looking for our Job Posting Analytics data, we have a dedicated API for querying and consuming that data.

Concepts

Agnitio is organized around the concept of a dataset. A dataset encapsulates one or more things that have been measured (e.g. jobs) and how they have been described (e.g. by the occupation of the person performing the job). The first idea (the what) we call metrics; the second (the how) we call dimensions. As an example, our basic industry dataset for the United States looks like this:

Dimension

Metrics

This dataset allows us to query the total (sum) number of jobs in a particular area (or group of areas) for all or a group of industries which are of a particular employment class. We treat time as a special dimension that combines with the metrics: given a particular group of areas, industries, and classes, you can request many years of data in a single call; we will cover the specifics later.

Datasets are also versioned, generally by time. For its core datasets, Emsi produces regular releases that include both historic and projected data. The 2017.4 and 2017.3 versions of the US Industry dataset cover the same years of data but 2017.4 includes new or updated sources that have caused us to adjust our estimates. We recommend that API consumers use the latest version of a dataset unless they are working on a long-term project (e.g. writing a grant) that requires the same data be used throughout. The latest version of a dataset can be referenced using the magical version name "latest" in both the meta and data query endpoints.

Dimensions are hierarchical taxonomies. As an example, Emsi's standard U.S. Area dimension description is based on the FIPS system for describing states and counties. The hierarchy has three levels:

That is, a particular county has exactly one "parent" state: Latah County, ID has a code of "16057" and its parent geography is the State of Idaho (code "16"). We use similar taxonomies for industries, occupations, classes of worker, gender, race, ethnicity, and other data facets. The contents of a dimension may vary from dataset version to version; we generally update taxonomies to reflect changes in how key base datasets were gathered.

Endpoints

Agnitio can be reached at https://agnitio.emsicloud.com/. There are three major endpoints:

Authentication

Transitioning from Episteme? View special instructions.

Authentication is provided by Emsi's token authentication service using OAuth2 client-credentials flow. If you're using a framework/SDK there is likely already middleware for supporting this authentication flow automatically that can be configured using these parameters:

The authentication host is https://auth.emsicloud.com/connect/token.

Implementing the authorization flow yourself

If you are using a framework with OAuth middleware, you can simply configure it using the values above. If not, you can implement authentication yourself as follows:

  1. Send a POST request to the authorization host (listed above). The body of the request should be the parameters above encoded using the application/x-www-form-urlencoded scheme (don't forget to set the Content-Type header appropriately).
  2. If the request is successful, the body of the response will be a JSON document carrying the token, which is an encoded JSON Web Token (see https://jwt.io for reference). Parse the body to extract the access_token and expires_in fields. The latter indicates how long, in seconds, you may use the access_token before it expires and becomes invalid. For best performance, we recommend caching the access_token and reusing it until it expires.
  3. You may now make requests against the main Agnitio endpoints by including an Authorization header with a value of "Bearer access_token", substituting your actual access token for "access_token".
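
The three steps above can be sketched in Python using only the standard library. This is a minimal sketch, not Emsi's client: it assumes the response body is a JSON object with access_token and expires_in fields (as in a standard OAuth2 token response) and that the authentication host is reachable over https. The names fetch_token, TokenCache, and margin are illustrative, and the params dictionary must hold whatever parameters Emsi issued to you.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the authentication host listed above, reached over https.
TOKEN_URL = "https://auth.emsicloud.com/connect/token"


def fetch_token(params, url=TOKEN_URL):
    """Step 1: POST the form-encoded parameters and parse the JSON response."""
    body = urllib.parse.urlencode(params).encode("ascii")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class TokenCache:
    """Step 2: reuse an access token until shortly before it expires."""

    def __init__(self, params, fetch=fetch_token, clock=time.monotonic, margin=30):
        self._params = params
        self._fetch = fetch
        self._clock = clock
        self._margin = margin  # seconds of safety before the stated expiry
        self._token = None
        self._expires_at = 0.0

    def authorization_header(self):
        """Step 3: return the header to attach to Agnitio requests."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch(self._params)
            self._token = payload["access_token"]
            self._expires_at = now + float(payload["expires_in"]) - self._margin
        return {"Authorization": "Bearer " + self._token}
```

Passing fetch and clock as arguments keeps the caching logic testable without network access; in normal use the defaults are enough.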

Data Discovery

Your contract with Emsi will determine which datasets you have access to. You can list these datasets and the versions available by querying the /meta endpoint:

curl 'https://agnitio.emsicloud.com/meta' -H'Authorization: bearer <access_token>'
{
  "datasets": [
    {
      "name": "emsi.us.industry",
      "versions": [
        "2017.4",
        "2017.3",
        "2017.2",
        "2017.1"
      ]
    },
    {
      "name": "emsi.us.grossregionalproduct",
      "versions": [
        "2017.4"
      ]
    },
    ...
  ]
}

Available versions of a specific dataset can be retrieved by adding dataset/<name> to the path:

curl 'https://agnitio.emsicloud.com/meta/dataset/emsi.us.industry' -H'Authorization: bearer <access_token>'
["2017.4","2017.3","2017.2","2017.1"]

More information about a particular version of a dataset can be requested by adding dataset/<name>/<version> to the path:

curl 'https://agnitio.emsicloud.com/meta/dataset/emsi.us.industry/2017.4' -H'Authorization: bearer <access_token>'
{
  "datasetName": "emsi.us.industry",
  "versionName": "2017.4",
  "attributes": {
    "yearField": "Year",
    "name": "Industry",
    "path": "Industry",
    "type": "dataset",
    "timesliceType": "year",
    "rollups": "AreaID",
    "maxYearInclusive": "2027",
    "estabStartYear": "2004",
    "currentYear": "2017",
    "minYearInclusive": "2001",
    "earnYear": "2017",
    "redisdb": "51",
    "releaseDate": "2017-11-03 12:00:00Z"
  },
  "dimensions": [
    {
      "name": "ClassOfWorker",
      "levelsStored": [ "1", "2" ]
    },
    {
      "name": "Area",
      "levelsStored": [ "1", "2", "3" ]
    },
    {
      "name": "Industry",
      "levelsStored": [ "1", "2", "3", "4", "5", "6" ]
    }
  ],
  "metrics": [
    { "name": "Jobs.2001" },
    { "name": "Jobs.2002" },
    ...
    { "name": "Jobs.2027" },
    { "name": "Wages.2001" },
    ...
  ]
}
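
Because time is folded into the metric names (Jobs.2001, Jobs.2002, and so on), it can help to group the metrics list by base name before building a query. The following is a small sketch that assumes every time-based metric follows the "<name>.<year>" pattern seen in the response above; the helper names years_by_metric and metric_names are illustrative.

```python
from collections import defaultdict


def years_by_metric(metadata):
    """Group metric names such as "Jobs.2001" into {"Jobs": [2001, ...]}."""
    grouped = defaultdict(list)
    for metric in metadata["metrics"]:
        base, _, suffix = metric["name"].rpartition(".")
        # Skip names without a numeric year suffix.
        if base and suffix.isdigit():
            grouped[base].append(int(suffix))
    return {name: sorted(years) for name, years in grouped.items()}


def metric_names(base, first_year, last_year):
    """Expand a base name and an inclusive year range into metric names."""
    return [f"{base}.{year}" for year in range(first_year, last_year + 1)]
```

For instance, metric_names("Jobs", 2015, 2017) produces the three names needed to request three years of jobs data in one call.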

Finally, you can view the hierarchy of a particular dimension of a dataset by adding dataset/<name>/<version>/<dimension> to the path:

curl 'https://agnitio.emsicloud.com/meta/dataset/emsi.us.industry/2017.4/Area' -H'Authorization: bearer <access_token>'
{
  "name": "Area",
  "hierarchy": [
    {
      "name": "United States",
      "level_name": "1",
      "parent": "",
      "child": "0",
      "abbr": "US"
    },
    {
      "name": "Alabama",
      "level_name": "2",
      "parent": "0",
      "child": "1",
      "abbr": "AL"
    },
    {
      "name": "Autauga",
      "level_name": "3",
      "parent": "1",
      "child": "1001",
      "abbr": "AL"
    },
    {
      "name": "Baldwin",
      "level_name": "3",
      "parent": "1",
      "child": "1003",
      "abbr": "AL"
    }
    ...
  ]
}
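
The flat hierarchy array can be indexed to walk the taxonomy in either direction. The sketch below reads the sample response as follows, which is an interpretation rather than documented behavior: child holds the entry's own code, parent holds the code of the entry one level up, and the root has an empty parent. The function names are illustrative.

```python
def index_hierarchy(dimension):
    """Return (by_code, children) lookups for a dimension hierarchy response."""
    by_code = {}
    children = {}
    for entry in dimension["hierarchy"]:
        # Assumption: "child" is the entry's own code, "parent" its parent's code.
        by_code[entry["child"]] = entry
        children.setdefault(entry["parent"], []).append(entry["child"])
    return by_code, children


def ancestors(by_code, code):
    """List names from the given code up to the root of the taxonomy."""
    names = []
    while code in by_code:
        entry = by_code[code]
        names.append(entry["name"])
        code = entry["parent"]
    return names
```

With the four entries shown above, children["1"] lists the two Alabama county codes, and ancestors(by_code, "1003") walks from Baldwin to Alabama to United States.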

Data Queries

Agnitio data queries are performed by assembling a JSON description of the query and POSTing it to the specific dataset you wish to query. We'll begin with an example request that queries the US Industry data to get the number of jobs and establishments in Latah County, ID for the full service restaurant industry:

curl 'https://agnitio.emsicloud.com/emsi.us.industry/2017.4' -H'Authorization: bearer <access_token>' \
      -H'Content-Type: application/json' \
      -d'{
            "metrics": [
                { "name": "Jobs.2017", "as": "2017 Jobs" },
                { "name": "Establishments.2017" }
            ],
            "constraints": [
                { "dimensionName": "Area", "map": { "Latah County, ID": ["16057"] } },
                { "dimensionName": "Industry", "map": { "Full Service Restaurants": ["722511"] } }
            ]
        }'
{
  "data": [
    {
      "name": "Area",
      "type": "String",
      "rows": [ "Latah County, ID" ]
    },
    {
      "name": "Industry",
      "type": "String",
      "rows": [ "Full Service Restaurants" ]
    },
    {
      "name": "2017 Jobs",
      "type": "Double",
      "rows": [ 642.8496643613166 ]
    },
    {
      "name": "Establishments.2017",
      "type": "Double",
      "rows": [ 28.5 ]
    }
  ],
  "errors": [],
  "timings": [
    "Path lookup: 0ms",
    "Parse Query: 1ms",
    "Parallelize Query: 0ms",
    "Waiting on results: 60ms",
    "Build sortable results: 0ms",
    "Sort results: 0ms",
    "Build final response: 0ms",
    "Build query result: 0ms"
  ],
  "totalRows": 1
}

The query structure has two required fields: metrics and constraints. The metrics field is an array of objects which describe which metrics you would like queried and returned. These objects have a required name field which specifies which metric is desired. They also have an optional as field which allows you to determine what the metric is called in the response. The constraints field is an array of objects which describe how the dimensions of the dataset should be limited and aggregated. In this example, we limited the Area dimension to returning a single, arbitrary value ("Latah County, ID") defined as the aggregation of a single FIPS code ("16057"). The valid codes for a dimension can be found via the metadata endpoint in addition to other information, such as a human-readable name.

This example returned only one row, but notice that the response format is column-oriented: each entry in data is one column, and all rows arrays in a response have the same number of elements.
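
Since every rows array has the same length, a column-oriented response can be pivoted into one dictionary per row. This is a minimal sketch based only on the response shape shown above; rows_from_columns is an illustrative name.

```python
def rows_from_columns(response):
    """Turn the column-oriented "data" array into a list of row dictionaries."""
    columns = response["data"]
    names = [column["name"] for column in columns]
    # zip(*...) walks the parallel "rows" arrays one position at a time.
    return [
        dict(zip(names, values))
        for values in zip(*(column["rows"] for column in columns))
    ]
```

Applied to the example response, this yields a single dictionary keyed by "Area", "Industry", "2017 Jobs", and "Establishments.2017".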

The map field in the request constraints deserves a thorough explanation. Fundamentally, it maps from the codes defined by the dimension's taxonomy to user-defined values that will be returned in the response. In the example above, we mapped a single user-defined name ("Latah County, ID") to a single FIPS code. A more complex mapping might look like this:

{ "dimensionName": "Area", "map": {
     "Seattle Area (10 mile radius)": ["53033", "53035"],
     "Seattle Area (20 mile radius)": ["53033", "53035", "53061", "53053"]
  }
}
This mapping will result in two response values for the Area dimension, each of which is defined by combining the data for various counties.
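
Building the query body in code avoids hand-editing nested JSON. The sketch below assembles the metrics and constraints fields described above from plain Python dictionaries; constraint and build_query are illustrative helper names, not part of any Emsi library.

```python
import json


def constraint(dimension_name, groups):
    """Build one constraints entry from {label: [codes]} groups."""
    return {
        "dimensionName": dimension_name,
        "map": {label: list(codes) for label, codes in groups.items()},
    }


def build_query(metrics, constraints):
    """Assemble the request body; metrics maps metric name -> alias or None."""
    metric_objects = []
    for name, alias in metrics.items():
        metric = {"name": name}
        if alias is not None:
            metric["as"] = alias  # optional display name in the response
        metric_objects.append(metric)
    return {"metrics": metric_objects, "constraints": constraints}


query = build_query(
    {"Jobs.2017": "2017 Jobs", "Establishments.2017": None},
    [
        constraint("Area", {
            "Seattle Area (10 mile radius)": ["53033", "53035"],
            "Seattle Area (20 mile radius)": ["53033", "53035", "53061", "53053"],
        }),
        constraint("Industry", {"Full Service Restaurants": ["722511"]}),
    ],
)
body = json.dumps(query)  # the string to send as the POST body
```

Serializing with json.dumps also rules out syntax slips such as trailing or missing commas.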

Sorting, Offsets, and Limits

Agnitio responses can be returned in sorted order by specifying the sortBy field:

{
  "metrics": [
    { "name":"Jobs.2017" }
  ],
  "constraints": [
    { "dimensionName": "Area", "map": { "Seattle Area (10 mile radius)": ["53033", "53035"]} },
    ...
  ],
  "sortBy": [
    { "name":"Area", "direction":"ascending" },
    { "name":"Jobs.2017", "direction":"descending" }
  ]
}
By default Agnitio returns all data requested, but pagination is supported via the offset and limit fields:
{
  "metrics": [ ... ],
  "constraints": [ ... ],
  "offset": 100,
  "limit": 50
}
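
Paging through a large result can be wrapped in a generator. This sketch rests on two assumptions that the text above does not confirm: that totalRows reports the total number of matching rows regardless of limit, and that run_query is a caller-supplied (hypothetical) function that POSTs a query body and returns the parsed JSON response.

```python
def iter_pages(run_query, query, page_size=50):
    """Yield one response per page, advancing offset until totalRows is reached."""
    offset = 0
    while True:
        # Copy the base query and add the paging fields for this request.
        page = dict(query, offset=offset, limit=page_size)
        response = run_query(page)
        yield response
        offset += page_size
        # Assumption: totalRows counts all matching rows, not just this page.
        if offset >= response.get("totalRows", 0):
            break
```

If totalRows turns out to count only the rows in the current page, stop instead when a page comes back with fewer than page_size rows.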

Location Quotient

Location Quotient is a measure of geographical concentration applied to summable measures such as jobs or establishments. The definition of Location Quotient (LQ) can be found here. Agnitio can calculate LQ for you on datasets with an Area dimension:

{
  "metrics": [
    { "name":"Jobs.2017", "as": "Jobs 2017 LQ",
      "operation":{
        "name":"LocationQuotient",
        "geoparent":"0",
        "along":"Industry"
      }
      }
    }
  ],
  "constraints": [ ... ]
}
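
For intuition, the textbook form of LQ compares an industry's share of a region with its share of a parent area. The function below computes that textbook ratio by hand. It is not a statement of how Agnitio computes the metric internally, and reading "geoparent": "0" as the United States (code "0" in the sample Area hierarchy) is an inference from the examples, not a documented fact.

```python
def location_quotient(region_industry, region_total, parent_industry, parent_total):
    """Regional share of an industry divided by the parent area's share."""
    region_share = region_industry / region_total
    parent_share = parent_industry / parent_total
    return region_share / parent_share
```

An industry that makes up 5% of a region's jobs but only 2% of the parent area's jobs has an LQ of about 2.5, meaning it is two and a half times as concentrated in the region.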

Shift Share

Shift Share attempts to explain what factors cause a change in a measure over time. The definition of Shift Share can be found here. Agnitio can calculate Shift Share for you on datasets with an Area dimension and metrics with a time component:

{
  "metrics": [
    { "name":"Jobs.2017",
      "operation":{
        "name":"ShiftShare",
        "geoparent":"0",
        "along":"Industry",
        "base": "Jobs.2007"
      }
    }
  ],
  "constraints": [ ... ]
}
Unlike all other metrics, Shift Share metrics return four response columns. Each column name is prefixed with the value of the metric's as field, if supplied, followed by one of these values: "Job Change", "Parent Growth Effect", "Mix Effect", and "Competitive Effect".
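
The four columns line up with the classic three-way shift-share decomposition, in which the job change equals the sum of the three effects. The sketch below implements that classic decomposition for one industry in one region. Mapping it onto Agnitio's column names is an assumption; the exact formulas Agnitio uses are not stated in this document.

```python
def shift_share(region_base, region_now,
                parent_industry_base, parent_industry_now,
                parent_total_base, parent_total_now):
    """Classic shift-share decomposition for one industry in one region."""
    parent_growth = parent_total_now / parent_total_base - 1
    industry_growth = parent_industry_now / parent_industry_base - 1
    region_growth = region_now / region_base - 1
    return {
        "Job Change": region_now - region_base,
        # Growth expected if the region simply tracked the whole parent economy.
        "Parent Growth Effect": region_base * parent_growth,
        # Extra growth (or loss) because this industry outpaced the parent overall.
        "Mix Effect": region_base * (industry_growth - parent_growth),
        # What remains: the region's performance relative to the industry at large.
        "Competitive Effect": region_base * (region_growth - industry_growth),
    }
```

For a region whose industry grew from 100 to 130 jobs while the parent industry grew 20% and the parent economy grew 10%, each of the three effects is about 10 jobs, summing to the job change of 30.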