Sometimes it is not possible, for practical or legal reasons, to move corpus data to a local machine. This vignette explains how to use CWB corpora that are hosted on an OpenCPU server.
    ## polmineR is throttled to use 2 cores as required by CRAN Repository Policy. To get full performance:
    ## * Use `n_cores <- parallel::detectCores()` to detect the number of cores available on your machine
    ## * Set number of cores using `options('polmineR.cores' = n_cores - 1)` and `data.table::setDTthreads(n_cores - 1)`
The GermaParl corpus is hosted on an OpenCPU server with the IP
188.8.131.52 (subject to change). To use the corpus, call the
corpus()-method as usual. The only difference is that you need
to supply the IP address using the argument server. The resulting
gparl object is an object of class remote_corpus.
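A minimal sketch of opening the remote corpus follows. It assumes that the corpus ID on the server is "GERMAPARL" and that the server answers over plain HTTP at the IP given above; both are illustrative assumptions and may need adjusting. Because the call needs network access, it is guarded to run only in an interactive session.

```r
# Assumptions (not confirmed by this vignette): corpus ID and URL scheme.
corpus_id  <- "GERMAPARL"
server_url <- paste0("http://", "188.8.131.52")

# Remote calls need polmineR and network access, so run them interactively only.
if (interactive() && requireNamespace("polmineR", quietly = TRUE)) {
  library(polmineR)
  # Passing `server` makes corpus() address the corpus on the remote host
  gparl <- corpus(corpus_id, server = server_url)
  print(class(gparl))  # inspect the class of the returned object
}
```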
At this stage, polmineR exposes only a limited set of its functionality for remote corpora, but simple investigations of a remote corpus are possible.
The returned object has the class remote_subcorpus. The
count()-method works for
remote_subcorpus objects, too.
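The workflow described above might look as follows. This is a sketch under assumptions: the corpus ID "GERMAPARL", the HTTP URL, the query term "Integration" and the structural attribute `year` are illustrative choices and may not match what the server offers.

```r
server_url <- "http://188.8.131.52"  # assumption: plain HTTP on the IP above
query      <- "Integration"          # illustrative query term

# Remote calls need polmineR and network access, so run them interactively only.
if (interactive() && requireNamespace("polmineR", quietly = TRUE)) {
  library(polmineR)
  gparl <- corpus("GERMAPARL", server = server_url)  # corpus ID is an assumption

  # Simple investigations on the whole remote corpus
  print(size(gparl))
  print(count(gparl, query = query))

  # Subset by a structural attribute (assumes the corpus has a `year` attribute)
  gparl2006 <- subset(gparl, year == "2006")
  print(class(gparl2006))

  # Counting on the subset uses the same call as on the full corpus
  print(count(gparl2006, query = query))
}
```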
Create a directory for registry-style files that hold the credentials
In this directory, create a file with the credentials for your corpus
Note: The filename is the corpus ID in lowercase
    ##
    ## registry entry for corpus GERMAPARLSAMPLE
    ##

    # long descriptive name for the corpus
    NAME "GermaParlSample"
    # corpus ID (must be lowercase in registry!)
    ID germaparlsample
    # path to binary data files
    HOME http://localhost:8005
    # optional info file (displayed by ",info;" command in CQP)
    INFO https://zenodo.org/record/3823245#.XsrU-8ZCT_Q

    # corpus properties provide additional information about the corpus:
    ##:: user = "YOUR_USER_NAME"
    ##:: password = "YOUR_PASSWORD"
Set the environment variable “OPENCPU_REGISTRY” in .Renviron to the directory just created
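The setup steps above can be sketched in base R. The directory under `tempdir()` is a hypothetical location chosen so that the example is safe to run; in practice you would pick a permanent directory. The user name and password are the placeholders from the registry entry shown above.

```r
# Hypothetical location for the credentials directory (replace with a permanent path)
registry_dir <- file.path(tempdir(), "opencpu_registry")
dir.create(registry_dir, recursive = TRUE, showWarnings = FALSE)

# The filename is the corpus ID in lowercase
corpus_id     <- "GERMAPARLSAMPLE"
registry_file <- file.path(registry_dir, tolower(corpus_id))

# Write the registry-style file; HOME holds the server URL, and the
# corpus properties at the end hold the (placeholder) credentials
writeLines(
  c(
    '##',
    '## registry entry for corpus GERMAPARLSAMPLE',
    '##',
    '',
    '# long descriptive name for the corpus',
    'NAME "GermaParlSample"',
    '# corpus ID (must be lowercase in registry!)',
    'ID germaparlsample',
    '# path to binary data files',
    'HOME http://localhost:8005',
    '# optional info file (displayed by ",info;" command in CQP)',
    'INFO https://zenodo.org/record/3823245#.XsrU-8ZCT_Q',
    '',
    '# corpus properties provide additional information about the corpus:',
    '##:: user = "YOUR_USER_NAME"',
    '##:: password = "YOUR_PASSWORD"'
  ),
  con = registry_file
)

# Point OPENCPU_REGISTRY at the directory (this affects the current session only)
Sys.setenv(OPENCPU_REGISTRY = registry_dir)
```

`Sys.setenv()` does not persist across sessions. For a permanent setting, a line of the form `OPENCPU_REGISTRY=/path/to/your/directory` in .Renviron has the same effect at startup.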
Get server whereabouts
Upcoming versions of polmineR will expose further functionality. This is a simple proof-of-concept!