ORKG Contributions

The contributions client is a specialized ORKG client that compares contributions and finds similar ones. It therefore needs to be connected to the SimComp service of the Open Research Knowledge Graph.

If you instantiate the ORKG client using a web address or host environment name (Hosts.PRODUCTION, Hosts.SANDBOX, or Hosts.INCUBATING), the corresponding simcomp_host is automatically assigned.

However, if you are running the ORKG locally, you need to specify simcomp_host yourself. See Host for details.

We can access the contributions client directly to do the following:

Get similar contributions

With the following line of code you can easily fetch a set of similar contributions.

Note: the data is cached so you might not see some results immediately after they are added to the ORKG.

# contribution_id is required; candidate_count is optional
orkg.contributions.similar(contribution_id='R8197')
>>> (Success)
{
    "payload": [
        {
            "id": "R8199",
            "label": "ORKG System",
            "paper_id": "R8186",
            "paper_label": "ORKG: A system for representing knowledge graphs for scholarly communication",
            "similarity_percentage": 0.6338162327194794,
        },
        ....
    ]
}

# You can also specify the number of candidates you want to get via
orkg.contributions.similar(contribution_id='R8197', candidate_count=3)
>>> (Success)
{
    "payload": [
        {...},
        {...},
        {...}
    ]
}
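Once the similar contributions are fetched, a common next step is to keep only the strong matches and rank them. The following is a minimal sketch of that post-processing, assuming the response has been turned into a plain dict shaped like the payload shown above. The helper name rank_similar, the min_similarity threshold, and the second sample entry are hypothetical and not part of the ORKG client.

```python
# Hypothetical sample shaped like the payload shown above; in practice the
# data would come from orkg.contributions.similar(...).
sample_response = {
    "payload": [
        {
            "id": "R0001",  # made-up entry, for illustration only
            "label": "Example contribution",
            "paper_id": "R0002",
            "paper_label": "Example paper",
            "similarity_percentage": 0.41,
        },
        {
            "id": "R8199",
            "label": "ORKG System",
            "paper_id": "R8186",
            "paper_label": "ORKG: A system for representing knowledge graphs for scholarly communication",
            "similarity_percentage": 0.6338162327194794,
        },
    ]
}


def rank_similar(response, min_similarity=0.5):
    """Return (id, label, similarity) tuples at or above min_similarity, best first."""
    hits = [
        (item["id"], item["label"], item["similarity_percentage"])
        for item in response.get("payload", [])
        if item["similarity_percentage"] >= min_similarity
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


ranked = rank_similar(sample_response)
print(ranked)
```

Note that similarity_percentage in the sample output is a fraction between 0 and 1, so the threshold is expressed on the same scale.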

Comparing several contributions

You can perform a comparison of several contributions easily as well.

# comparison_type parameter is set by default to 'PATH', but you can set it explicitly to other methods like MERGE
orkg.contributions.compare(contributions=['R6151','R6173','R6162','R6179','R6157','R6146'], comparison_type=ComparisonType.PATH)
>>> (Success)
{
    "timestamp": "2023-11-27T13:45:36.457835",
    "uuid": "8afe9dbd-ada1-4f09-83af-5473940e8de8",
    "payload": {
        "thing": {
            "id": "28650fc0-da87-4c16-84ae-224e1cd2cf12",
            "created_at": "2023-07-04T08:57:18.175301Z",
            "updated_at": "2023-07-04T08:57:18.175312Z",
            "thing_type": "COMPARISON",
            "thing_key": "R307226",
            "config": {
                "predicates": [
                    "location",
                    "basic reproduction number",
                    ...
                ],
                "contributions": [
                    "R288053",
                    "R288056"
                ],
                "transpose": false,
                "type": "PATH"
            },
            "data": {
                "contributions": [
                    {
                        "id": "R288053",
                        "label": "Contribution 1",
                        "paper_id": "R288047",
                        "paper_label": "Estimating the Unreported Number of Novel Coronavirus (2019-nCoV) Cases in China in the First Half of January 2020: A Data-Driven Modelling Analysis of the Early Outbreak",
                        "paper_year": "2020"
                    },
                    {
                        "id": "R288056",
                        "label": "Contribution 1",
                        "paper_id": "R288055",
                        "paper_label": "Estimation of the epidemic properties of the 2019 novel coronavirus: A mathematical modeling study",
                        "paper_year": "2020"
                    }
                ],
                "predicates": [
                    {
                        "id": "research problem/determination of the covid-19 basic reproduction number/same as",
                        "label": "research problem/determination of the covid-19 basic reproduction number/same as",
                        "n_contributions": 2,
                        "active": true,
                        "similar_predicates": [
                            "same as"
                        ]
                    },
                    {
                        "id": "research problem/determination of the covid-19 basic reproduction number/description",
                        "label": "research problem/determination of the covid-19 basic reproduction number/description",
                        "n_contributions": 2,
                        "active": true,
                        "similar_predicates": [
                            "description"
                        ]
                    },
                    ....
                ],
                "data": {
                    "basic reproduction number": [
                        [
                            {
                                "id": "L604101",
                                "label": "2.56",
                                "_class": "literal",
                                "classes": [],
                                "path": [
                                    "R288053",
                                    "P23108"
                                ],
                                "path_labels": [
                                    "contribution 1",
                                    "basic reproducion number"
                                ]
                            }
                        ],
                        [
                            {
                                "id": "L604118",
                                "label": "4.48",
                                "_class": "literal",
                                "classes": [],
                                "path": [
                                    "R288056",
                                    "P23108"
                                ],
                                "path_labels": [
                                    "contribution 1",
                                    "basic reproducion number"
                                ]
                            }
                        ]
                    ],
                    "location": [
                        [
                            {
                                "id": "R217514",
                                "label": "China",
                                "_class": "resource",
                                "classes": [],
                                "path": [
                                    "R288053",
                                    "wikidata:P276"
                                ],
                                "path_labels": [
                                    "contribution 1",
                                    "location"
                                ]
                            }
                        ],
                        [
                            {
                                "id": "R217514",
                                "label": "China",
                                "_class": "resource",
                                "classes": [],
                                "path": [
                                    "R288056",
                                    "wikidata:P276"
                                ],
                                "path_labels": [
                                    "contribution 1",
                                    "location"
                                ]
                            }
                        ]
                    ],
                    ....
                }
            }
        }
    }
}
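The nested response can be awkward to read directly. The sketch below shows one possible way to flatten it into a mapping from predicate to the value labels of each contribution. It works on a trimmed-down dict copied from the output above and assumes that the i-th inner list under each predicate belongs to the i-th entry of data["contributions"], which matches the sample but is not confirmed by the source. The helper name comparison_table is hypothetical.

```python
# Trimmed-down copy of the response shown above (only the keys used here).
sample_comparison = {
    "payload": {
        "thing": {
            "data": {
                "contributions": [
                    {"id": "R288053", "label": "Contribution 1"},
                    {"id": "R288056", "label": "Contribution 1"},
                ],
                "data": {
                    "basic reproduction number": [
                        [{"id": "L604101", "label": "2.56", "_class": "literal"}],
                        [{"id": "L604118", "label": "4.48", "_class": "literal"}],
                    ],
                    "location": [
                        [{"id": "R217514", "label": "China", "_class": "resource"}],
                        [{"id": "R217514", "label": "China", "_class": "resource"}],
                    ],
                },
            }
        }
    }
}


def comparison_table(response):
    """Map each predicate to {contribution_id: [value labels]}.

    Assumption: cells are ordered like data["contributions"].
    """
    data = response["payload"]["thing"]["data"]
    contribution_ids = [contribution["id"] for contribution in data["contributions"]]
    table = {}
    for predicate, cells in data["data"].items():
        table[predicate] = {
            contribution_id: [value["label"] for value in cell]
            for contribution_id, cell in zip(contribution_ids, cells)
        }
    return table


table = comparison_table(sample_comparison)
for predicate, row in table.items():
    print(predicate, row)
```

Keying the rows by contribution ID rather than by label avoids collisions, since several contributions can share a label such as "Contribution 1".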

You can export the comparison to various formats, such as DataFrame or CSV, by specifying the export_format parameter.

orkg.contributions.compare(contributions=['R6151','R6173','R6162','R6179','R6157','R6146'], export_format='CSV')
>>>
Contribution 1,Contribution 2,Contribution 3,Contribution 4,Contribution 5,Contribution 6
value 1,value 2,value 3,value 4,value 5,value 6
value 1,value 2,value 3,value 4,value 5,value 6
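If the CSV export is needed as rows rather than raw text, it can be parsed with the standard library. This is a minimal sketch that assumes the export is available as a string laid out like the output above (the source does not state the return type). The helper name parse_comparison_csv is hypothetical, and the sample text is shortened to three columns.

```python
import csv
import io

# Assumed to mirror the CSV text shown above, shortened to three columns.
csv_text = (
    "Contribution 1,Contribution 2,Contribution 3\n"
    "value 1,value 2,value 3\n"
    "value 1,value 2,value 3\n"
)


def parse_comparison_csv(text):
    """Split CSV text into a header list and a list of data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


header, rows = parse_comparison_csv(csv_text)
print(header)
print(rows)
```

csv.reader is used instead of csv.DictReader because column names are not guaranteed to be unique, and a dict-based reader would silently drop duplicated columns.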

Comparisons as pandas.DataFrame

To make the content of a comparison easier to process, without having to handle the JSON returned by the SimComp service, you can work with the data as a pandas.DataFrame object that can be used in downstream applications.

Warning

When multiple values belong to one cell, they are represented as a Python list.

### One of the parameters is required
# Using the comparison ID
df = orkg.contributions.compare_dataframe(comparison_id='R6751')
type(df)
>>> <pandas.core.frame.DataFrame>
df.head()
>>> ...

# Using the list of contributions
df = orkg.contributions.compare_dataframe(contributions=['R6151','R6173','R6162','R6179','R6157','R6146'])
type(df)
>>> <pandas.core.frame.DataFrame>
df.head()
>>> ...
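Because a cell may hold a Python list when several values belong to it, downstream code often needs to normalize cells before exporting or comparing them. The helper below is a minimal sketch of one way to do that. The name cell_to_text, the separator, and the sample cells are hypothetical.

```python
def cell_to_text(cell, separator="; "):
    """Join list-valued cells into one string; return other values unchanged."""
    if isinstance(cell, list):
        return separator.join(str(value) for value in cell)
    return cell


# Made-up cells for illustration: a multi-valued cell, a single value,
# an empty list, and a missing value.
cells = [["2.56", "2.6"], "China", [], None]
flattened = [cell_to_text(cell) for cell in cells]
print(flattened)
```

With a dataframe, the same helper could be applied to one column at a time, for example df[column].apply(cell_to_text), where column is any column name of the comparison.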

The function also offers a like_ui parameter that controls whether the dataframe mirrors the comparison as shown in the ORKG UI (frontend) or contains the complete, unfiltered comparison.

# param like_ui is True by default
df = orkg.contributions.compare_dataframe(comparison_id='R6751', like_ui=True)
df.head()
>>> ...
>>> [5 rows x 11 columns]

df = orkg.contributions.compare_dataframe(comparison_id='R6751', like_ui=False)
df.head()
>>> ...
>>> [12 rows x 11 columns]

The optional include_meta parameter returns a second dataframe with the following paper metadata (where available): author, doi, publication month, publication year, url, research field, venue, title, paper id, and contribution id.

# param include_meta is False by default
df, df_meta = orkg.contributions.compare_dataframe(contributions=['R34499', 'R34504'], include_meta=True)
df.head()
>>> ...
>>> [5 rows x 2 columns]
df_meta.head()
>>> ...
>>> [8 rows x 2 columns]

Adding contributions to a paper

In development!