This R Markdown Notebook demonstrates how to use the paperfetcher Python package to perform forward and backward citation searching in R through the reticulate interface to Python.
To execute a code chunk in RStudio, click the Run button within the chunk, or place your cursor inside it and press Cmd+Shift+Enter (Ctrl+Shift+Enter on Windows and Linux).
We first need to run a couple of lines of code to install reticulate, Python, and paperfetcher.
# Install reticulate if not already installed
if (!require("reticulate")) {
  install.packages("reticulate")
  library(reticulate)
}
Loading required package: reticulate
# Install Python if not already installed, and create a new virtualenv for paperfetcher
# (Uncomment the lines below to run code)
#install_python("3.7:latest")
#virtualenv_create("paperfetcher", version="3.7:latest")
#use_virtualenv("paperfetcher")
# Install paperfetcher
# (Uncomment the lines below to run code)
#py_install("paperfetcher", envname="paperfetcher")
# Import the paperfetcher package
paperfetcher <- import("paperfetcher")
Backward reference chasing involves retrieving all articles which are referenced (cited) by a set of starting articles.
Let’s fetch all the references from two papers, with DOIs 10.1021/acs.jpcb.1c02191 and 10.1080/07448481.2022.2059376, using the Crossref service.
First, we create a search object, and initialize it with a list of strings, each string being a DOI:
search <- paperfetcher$snowballsearch$CrossrefBackwardReferenceSearch(list("10.1021/acs.jpcb.1c02191", "10.1080/07448481.2022.2059376"))
search()
100%|██████████| 2/2 [00:01<00:00, 1.40it/s]
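Before building a search like the one above, it can be worth checking that every input string at least looks like a DOI, since a typo would otherwise only show up as a failed lookup. The sketch below is plain R and not part of paperfetcher: the helper name `looks_like_doi` is hypothetical, and the pattern is a common heuristic for the shape of modern DOIs, not a full validator. It also uses `as.list()` because reticulate converts a length-one character vector to a single Python string, so a list keeps the input in the same shape as the `list("...", "...")` call above.

```r
# Hypothetical helper (not part of paperfetcher): heuristic DOI shape check.
looks_like_doi <- function(x) {
  grepl("^10\\.[0-9]{4,9}/[-._;()/:a-z0-9]+$", x, ignore.case = TRUE)
}

dois <- c("10.1021/acs.jpcb.1c02191", "10.1080/07448481.2022.2059376")

# Stop early if any entry does not look like a DOI.
bad <- dois[!looks_like_doi(dois)]
if (length(bad) > 0) {
  stop("Malformed DOI(s): ", paste(bad, collapse = ", "))
}

# as.list() turns the character vector into an unnamed list of strings,
# the same shape as list("...", "...") in the search call.
doi_list <- as.list(dois)
```

The resulting `doi_list` could then be passed to the search constructor in place of the literal list.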
How many articles did our search return?
py_len(search)
[1] 150
Just as we did for handsearching, we can get a Dataset of DOIs from the search results:
doi_ds <- search$get_DOIDataset()
We can display this as a DataFrame:
doi_ds$to_df()
Or save it to a text file:
doi_ds$save_txt("out/snowball_back.txt")
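The path above is relative to the working directory, and it is an assumption here (not confirmed by the source) that `save_txt` will not create a missing `out` folder. A minimal base-R sketch for making sure the folder exists first follows; `out_dir` and `txt_path` are illustrative names, and a temporary location is used so that the snippet can be run anywhere.

```r
# Illustrative location; in the notebook this would simply be "out".
out_dir <- file.path(tempdir(), "out")

# Create the folder (and any missing parents) only if it is not there yet.
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}

# Build the file path in a platform-independent way.
txt_path <- file.path(out_dir, "snowball_back.txt")
```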
We can also convert it to RIS format:
ris_ds <- search$get_RISDataset()
Converting results to RIS format.: 100%|██████████| 150/150 [00:34<00:00, 4.32it/s]
And save it to an RIS file:
ris_ds$save_ris("out/snowball_back.ris")
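One quick way to check a saved RIS file is to count its records and compare that number with the number of search results. In the RIS format, each record starts with a `TY  - ` line and ends with an `ER  - ` line. The sketch below is base R only and runs on a tiny hand-written sample, not on real paperfetcher output; the helper name `count_ris_records` and the sample entries are made up for illustration.

```r
# Hypothetical helper: count records by their end-of-record tag.
count_ris_records <- function(path) {
  lines <- readLines(path, warn = FALSE)
  sum(grepl("^ER  -", lines))
}

# Made-up two-record sample, written to a temporary file.
sample_ris <- c(
  "TY  - JOUR",
  "DO  - 10.1000/example.1",
  "ER  - ",
  "TY  - JOUR",
  "DO  - 10.1000/example.2",
  "ER  - "
)
ris_path <- tempfile(fileext = ".ris")
writeLines(sample_ris, ris_path)

n_records <- count_ris_records(ris_path)
```

For the file saved above, the call would be `count_ris_records("out/snowball_back.ris")`, and one would expect the count to match the number of search results if every result was converted.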
We can also perform backward snowballing with COCI, the OpenCitations Index of Crossref DOI-to-DOI citations.
The syntax is similar to that of Crossref:
search <- paperfetcher$snowballsearch$COCIBackwardReferenceSearch(list("10.1021/acs.jpcb.1c02191", "10.1080/07448481.2022.2059376"))
search()
100%|██████████| 2/2 [00:03<00:00, 1.77s/it]
doi_ds <- search$get_DOIDataset()
doi_ds$to_df()
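Because Crossref and COCI are separate sources, their backward results for the same starting papers may differ. A base-R sketch for comparing two sets of DOIs follows. The vectors are made-up placeholders, not real search output, and extracting the DOIs from the DataFrame returned by `to_df()` is left out because its column layout is not shown here. DOIs are case-insensitive, so both sets are lowercased before comparison.

```r
# Placeholder DOIs standing in for the two result sets (not real output).
crossref_dois <- c("10.1000/Example.1", "10.1000/example.2", "10.1000/example.3")
coci_dois     <- c("10.1000/example.2", "10.1000/EXAMPLE.3", "10.1000/example.4")

# Normalise case and whitespace, and drop duplicates, before comparing.
normalise <- function(x) unique(tolower(trimws(x)))
a <- normalise(crossref_dois)
b <- normalise(coci_dois)

in_both       <- intersect(a, b)  # found by both services
only_crossref <- setdiff(a, b)    # found by Crossref only
only_coci     <- setdiff(b, a)    # found by COCI only
```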
Forward citation chasing involves retrieving all articles which cite a set of starting articles.
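The difference between the two directions can be seen on a toy citation graph. The sketch below is base R with invented paper labels (A to D) and says nothing about how paperfetcher queries its services: each row of `edges` means that `citing` cites `cited`, and the helper names `backward` and `forward` are illustrative.

```r
# Toy edge list: each row means `citing` cites `cited` (labels are invented).
edges <- data.frame(
  citing = c("A", "A", "B", "C", "D"),
  cited  = c("B", "C", "C", "D", "B"),
  stringsAsFactors = FALSE
)

# Backward chasing: everything the starting papers cite.
backward <- function(start) unique(edges$cited[edges$citing %in% start])

# Forward chasing: everything that cites the starting papers.
forward <- function(start) unique(edges$citing[edges$cited %in% start])

refs_of_A   <- backward("A")  # papers in A's reference list
citers_of_B <- forward("B")   # papers that cite B
```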
Let’s fetch all the citations of two papers, with DOIs 10.1021/acs.jpcb.1c02191 and 10.1080/07448481.2022.2059376, using the COCI service. We cannot use the Crossref service for this task.
The syntax is similar to that of backward search:
search <- paperfetcher$snowballsearch$COCIForwardCitationSearch(list("10.1021/acs.jpcb.1c02191", "10.1080/07448481.2022.2059376"))
search()
100%|██████████| 2/2 [00:02<00:00, 1.09s/it]
doi_ds <- search$get_DOIDataset()
doi_ds$to_df()
Again, we can save the search results to a text file:
doi_ds$save_txt("out/snowball_fwd.txt")
Or to an RIS file:
ris_ds <- search$get_RISDataset()
Converting results to RIS format.: 100%|██████████| 10/10 [00:03<00:00, 3.02it/s]
ris_ds$save_ris("out/snowball_fwd.ris")