March 24, 2014

Datumbox API and rDatumBox package

This post was inspired by Julian Hillebrand's interesting post.

In this post, I explain how to detect the language of a text through the Datumbox API.

  1. Register with Datumbox: First, you need to register with Datumbox. After registering, you will get your API key.

  2. Connect to the Datumbox API: The Datumbox API is RESTful with simple authentication, so you can get results easily through the API.

Here's the API documentation:

http://www.datumbox.com/files/API-Documentation-1.0v.pdf

Julian's code is very helpful, so I highly recommend you read the post below.

http://thinktostart.wordpress.com/2013/09/09/sentiment-analysis-on-twitter-with-datumbox-api/

Today, for my own understanding, I try to write the code from scratch instead of reusing his helpful code.

There are three steps to get data with the Datumbox API:

  1. Make your request URL
  2. Get JSON data through the API
  3. Process JSON data

OK, let's go through each step.

Make your request URL

The first step is to prepare the request URL for the API. A request URL for a Web API usually has two parts, a base URL and query parameters. Here I want the result of a language detection. As you can see in the API document, the base URL is “http://api.datumbox.com/1.0/LanguageDetection.json”, and the query parameters are text, the text whose language you want to detect, and api_key. A “?” joins the base URL and the query parameters, and “&” separates the query parameters from each other, as shown below. URLencode then percent-encodes characters such as spaces that are not allowed in a URL.

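# Your Datumbox API key and the text whose language you want to detect
apikey <- "YOUR KEY"
txt <- "Hello, this is Daisuke Ichikawa"
baseURL <- "http://api.datumbox.com/1.0/LanguageDetection.json"
# Join the base URL and the query parameters, then percent-encode spaces and other unsafe characters
url <- URLencode(sprintf("%s?api_key=%s&text=%s", baseURL, apikey, txt))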
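
One caveat about the encoding step above: URLencode with its default reserved = FALSE leaves characters such as “&” and “?” untouched, so a text that contains them would be split into extra query parameters. The following is only a sketch of one way around this, encoding each value separately with reserved = TRUE before joining. The sample text and the names encoded_txt and safe_url are made up for illustration.

```r
# Illustrative values; the text deliberately contains "&"
apikey <- "YOUR KEY"
txt <- "Tom & Jerry"
baseURL <- "http://api.datumbox.com/1.0/LanguageDetection.json"

# Default encoding keeps "&" as-is, which would break the query string
URLencode(txt)

# reserved = TRUE also escapes characters that have a meaning inside a URL
encoded_key <- URLencode(apikey, reserved = TRUE)
encoded_txt <- URLencode(txt, reserved = TRUE)

# Join the already-encoded values with "?" and "&"
safe_url <- sprintf("%s?api_key=%s&text=%s", baseURL, encoded_key, encoded_txt)
safe_url
```

For the text used in this post the result is the same either way, because it contains no reserved characters other than the comma.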

Get JSON data through the API

The next step is to get the data from the API. Load the httr package and call GET. The Datumbox API returns its result as JSON. GET stores the response body as raw bytes, so you have to convert it to text with content.

To handle web data, I expect most R users work with the RCurl package. However, I recommend the httr package, which is a wrapper around RCurl. I'm convinced that it will make your work easier. The example below is a simple case, so I'm afraid you won't see the convenience of the package here. But in a more complex case, such as one that needs OAuth 2.0 authentication, httr would be a great help. In future articles, I will show more complex cases.

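require(httr)
# Send the GET request to the Datumbox API
req <- GET(url)
# Extract the response body as a character string (JSON text)
req <- content(req, as="text")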
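
As a small taste of that convenience, httr can also build the query string itself and turn HTTP errors into R errors. The sketch below is an assumption about how one might arrange this, not part of the original code: fetch_language is a hypothetical helper name, and modify_url is used only to preview the URL without sending a request.

```r
library(httr)

baseURL <- "http://api.datumbox.com/1.0/LanguageDetection.json"

# Preview the URL httr would request; the query values are percent-encoded for us
preview_url <- modify_url(
  baseURL,
  query = list(api_key = "YOUR KEY", text = "Hello, this is Daisuke Ichikawa")
)
preview_url

# Hypothetical helper: send the request and return the JSON text
fetch_language <- function(text, apikey) {
  resp <- GET(baseURL, query = list(api_key = apikey, text = text))
  stop_for_status(resp)  # raise an R error on an HTTP 4xx/5xx response
  content(resp, as = "text", encoding = "UTF-8")
}

# Usage (needs a real key and network access):
# json_text <- fetch_language("Hello, this is Daisuke Ichikawa", "YOUR KEY")
```

Passing the parameters as a list means there is no need to call sprintf or URLencode by hand.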

Process JSON data

The Datumbox API returns JSON data. There are three R packages for handling JSON: rjson, RJSONIO, and jsonlite. In this case I chose jsonlite, because it returns the result as a data.frame, which is convenient for my analysis. The other two packages return a list. I recommend choosing the package that fits your purpose.

If you use RJSONIO, it is useful to switch the as parameter of the content function from "text" to "parsed" in the example above. content then parses the GET result automatically with RJSONIO, so you don't have to call fromJSON yourself.

require(jsonlite)
# Parse the JSON text into an R object
res <- fromJSON(req)
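
To try the parsing step without calling the API, you can feed fromJSON a hand-written string. The JSON below is a hypothetical response body written for illustration only. Its nesting under "output" is an assumption, and the shape you actually get back may differ by API and jsonlite version, so inspect the real result with str before relying on field names.

```r
library(jsonlite)

# Hypothetical response body, hand-written for illustration (not real API output)
sample_json <- '{"output": {"status": 1, "result": "en"}}'

# Parse the JSON text into an R object
parsed <- fromJSON(sample_json)

# Inspect the structure before picking fields out of it
str(parsed)

# Extract the detected language code from the assumed "output" element
lang <- parsed$output$result
lang
```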

Result

OK, here's the result of the language detection.

res
##   status result
## 1      1     en

For convenience, I divided this short analysis into three steps. However, the whole code is only about 10 lines and very simple.

Code

apikey <- "YOUR KEY"
txt <- "Hello, this is Daisuke Ichikawa"
baseURL <- "http://api.datumbox.com/1.0/LanguageDetection.json"
url <- URLencode(sprintf("%s?api_key=%s&text=%s", baseURL, apikey, txt))
require(httr)
req <- GET(url)
req <- content(req, as="text")
require(jsonlite)
res <- fromJSON(req)

rDatumBox package

Lastly, I introduce my package. I have created a simple package specifically for the Datumbox API, named rDatumBox. You can install it from GitHub as follows. If you want to analyze your data without breaking a sweat, this package might help you :)

library(devtools)
install_github("dichika/rDatumBox")
