Package 'newscatcheR'

Title: Programmatically Collect Normalized News from (Almost) Any Website
Description: Programmatically collect normalized news from (almost) any website. An 'R' clone of the <https://github.com/kotartemiy/newscatcher> 'Python' module.
Authors: Novica Nakov [aut, cre], Teofil Nakov [ctb], Artem Bugara [ctb], Discindo [cph]
Maintainer: Novica Nakov <[email protected]>
License: MIT + file LICENSE
Version: 0.1.2.9000
Built: 2024-11-12 05:42:06 UTC
Source: https://github.com/discindo/newscatcher

Help Index


Check URL: a helper function to verify user input before fetching the feed.

Description

A helper function to verify user input before fetching the feed.

Usage

check_url(website = "ycombinator.com", rss_table = package_rss)

Arguments

website

The URL of a news source, in the format "news.ycombinator.com".

rss_table

A data frame of URLs and RSS feeds, for use when you need to build your own table from websites not in the included database. It must have the same format as the included data. See 'R/package_rss.R' for details.
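
The rss_table argument accepts a user-built table. The following is a minimal sketch of such a table, assuming it needs the seven columns listed in the package_rss help page. The site names, feed URLs, ranks, and column types are hypothetical placeholders, not real database entries.

```r
# Hypothetical custom feed table mirroring the seven documented columns
# of package_rss. All values are illustrative placeholders, and the column
# types are an assumption, not taken from the package source.
custom_rss <- data.frame(
  clean_url     = c("example.com", "example.org"),
  language      = c("en", "en"),
  topic_unified = c("tech", "news"),
  main          = c(1, 1),
  clean_country = c("US", "GB"),
  rss_url       = c("https://example.com/feed.xml", "https://example.org/rss"),
  GlobalRank    = c(1000, 2000),
  stringsAsFactors = FALSE
)

# Inspect the structure before handing the table to the package.
str(custom_rss)

# Not run (requires the newscatcheR package):
# newscatcheR::check_url(website = "example.com", rss_table = custom_rss)
```

Comparing names(custom_rss) against names(package_rss) is a quick way to confirm that the layout matches before using the table.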


Describe URL

Description

Describe URL: list the topics available for a website in the database.

Usage

describe_url(website = "ycombinator.com", rss_table = package_rss)

Arguments

website

The URL of a news source, in the format "news.ycombinator.com".

rss_table

A data frame of URLs and RSS feeds, for use when you need to build your own table from websites not in the included database. It must have the same format as the included data. See package_rss.R for details.

Value

A character vector with topics.

Examples

describe_url(website = "ycombinator.com", rss_table = package_rss)

Filter URLs in the provided database based on topic, country and language

Description

Filter URLs in the provided database based on topic, country and language

Usage

filter_urls(
  topic = NULL,
  country = NULL,
  language = NULL,
  rss_table = package_rss
)

Arguments

topic

The topic of the feed. See show_topics() for more info.

country

the country of origin of the feed using two capital letters, for example "US". See show_countries() for more info.

language

the language of the content of the feed using two lowercase letters, for example "en". See show_languages() for more info.

rss_table

A data frame of URLs and RSS feeds, for use when you need to build your own table from websites not in the included database. It must have the same format as the included data. See package_rss.R for details.

Value

a tibble filtered according to the given parameters

Examples

filter_urls(topic = "tech", country = "US", language = "en")
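
To illustrate the filtering described above, here is a minimal base R sketch run on a toy table. It is not the package's implementation. The function name filter_feeds_sketch and the toy rows are hypothetical, and treating a NULL argument as "no filter" is an assumption based on the NULL defaults in the usage.

```r
# Toy table with the three columns the filters act on (hypothetical rows).
toy_rss <- data.frame(
  clean_url     = c("example.com", "example.org", "example.net"),
  language      = c("en", "en", "de"),
  topic_unified = c("tech", "news", "tech"),
  clean_country = c("US", "GB", "DE"),
  stringsAsFactors = FALSE
)

# Keep rows matching every non-NULL criterion; NULL leaves a column unfiltered.
# %in% is used instead of == so that NA values never match.
filter_feeds_sketch <- function(rss_table, topic = NULL, country = NULL,
                                language = NULL) {
  keep <- rep(TRUE, nrow(rss_table))
  if (!is.null(topic))    keep <- keep & rss_table$topic_unified %in% topic
  if (!is.null(country))  keep <- keep & rss_table$clean_country %in% country
  if (!is.null(language)) keep <- keep & rss_table$language %in% language
  rss_table[keep, , drop = FALSE]
}

us_tech  <- filter_feeds_sketch(toy_rss, topic = "tech", country = "US",
                                language = "en")
all_tech <- filter_feeds_sketch(toy_rss, topic = "tech")
print(us_tech)
```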

Get headlines: a helper function to get just the headlines of the feed.

Description

A helper function to get just the headlines of the feed.

Usage

get_headlines(
  website = "ycombinator.com",
  topic = NULL,
  rss_table = package_rss
)

Arguments

website

The URL of a news source, in the format "news.ycombinator.com".

topic

The topic of the feed. The default is NULL, which fetches the "main" feed. Topics are 'tech', 'news', 'business', 'science', 'finance', 'food', 'politics', 'economics', 'travel', 'entertainment', 'music', 'sport', 'world', but not all sites have all topics. Use describe_url("website") to check for available feeds.

rss_table

A data frame of URLs and RSS feeds, for use when you need to build your own table from websites not in the included database. It must have the same format as the included data. See package_rss for details.

Value

a tibble containing the headlines of the feed

Examples

## Not run: 
Sys.sleep(3) # adding a small time delay to avoid
# simultaneous posts to the API
get_headlines(website = "ycombinator.com", rss_table = package_rss)

## End(Not run)

Get news: get the contents of an RSS feed.

Description

Get the contents of an RSS feed.

Usage

get_news(website = "ycombinator.com", topic = NULL, rss_table = package_rss)

Arguments

website

The URL of a news source, in the format "news.ycombinator.com".

topic

The topic of the feed. The default is NULL, which fetches the "main" feed. Topics are 'tech', 'news', 'business', 'science', 'finance', 'food', 'politics', 'economics', 'travel', 'entertainment', 'music', 'sport', 'world', but not all sites have all topics. Use describe_url("website") to check for available feeds.

rss_table

A data frame of URLs and RSS feeds, for use when you need to build your own table from websites not in the included database. It must have the same format as the included data. See ?package_rss for details.

Value

a tibble containing the contents of the rss feed

Examples

## Not run: 
Sys.sleep(3) # adding a small time delay to avoid
# simultaneous posts to the API
get_news(website = "ycombinator.com", rss_table = package_rss)

## End(Not run)
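
Because not every site offers every topic, it can help to check the requested topic against the output of describe_url() before fetching. The sketch below shows one way to do that. The helper choose_topic is hypothetical and not part of the package, and falling back to NULL (the "main" feed) is a design choice made for this example only.

```r
# Return the requested topic if the site offers it; otherwise NULL,
# which get_news() and get_headlines() document as the "main" feed.
choose_topic <- function(requested, available) {
  if (!is.null(requested) && requested %in% available) requested else NULL
}

# Stand-in for the character vector that describe_url() is documented to return.
available_topics <- c("tech", "news")

topic_ok      <- choose_topic("tech", available_topics)
topic_missing <- choose_topic("food", available_topics)

# Not run (requires the newscatcheR package and network access):
# topics <- newscatcheR::describe_url("ycombinator.com")
# newscatcheR::get_news("ycombinator.com",
#                       topic = choose_topic("tech", topics))
```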

RSS table from the Python package newscatcher

Description

A dataset of news websites and their RSS feeds, taken from the Python package newscatcher.

Usage

package_rss

Format

A data frame with 4505 rows and 7 variables:

clean_url

url of news website

language

the language of the website

topic_unified

the topic of the website

main

main

clean_country

the country of origin of the website, as a two-letter uppercase code (for example "US")

rss_url

location of feed

GlobalRank

rank of website

Source

https://github.com/kotartemiy/newscatcher


Show countries: show all countries in the database.

Description

Show all countries in the database.

Usage

show_countries(rss_table = package_rss)

Arguments

rss_table

A data frame of URLs and RSS feeds, for use when you need to build your own table from websites not in the included database. It must have the same format as the included data. See 'R/package_rss.R' for details.

Value

a character vector of available countries


Show languages: show all languages in the database.

Description

Show all languages in the database.

Usage

show_languages(rss_table = package_rss)

Arguments

rss_table

A data frame of URLs and RSS feeds, for use when you need to build your own table from websites not in the included database. It must have the same format as the included data. See 'R/package_rss.R' for details.

Value

a character vector of available languages


Show topics: show all topics in the database.

Description

Show all topics in the database.

Usage

show_topics(rss_table = package_rss)

Arguments

rss_table

A data frame of URLs and RSS feeds, for use when you need to build your own table from websites not in the included database. It must have the same format as the included data. See 'R/package_rss.R' for details.

Value

a character vector of available topics
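
The show_*() helpers each return a character vector of available values. As a rough illustration, the base R sketch below lists the distinct values of a column in a toy table. It assumes that topics, countries, and languages correspond to the topic_unified, clean_country, and language columns documented for package_rss. The helper distinct_values and the toy rows are hypothetical, and this is not the package's implementation.

```r
# Toy table using column names documented for package_rss (hypothetical rows).
toy_rss <- data.frame(
  clean_url     = c("example.com", "example.org", "example.net"),
  language      = c("en", "en", "de"),
  topic_unified = c("tech", "news", "tech"),
  clean_country = c("US", "GB", "DE"),
  stringsAsFactors = FALSE
)

# Sorted distinct values of one column, returned as a character vector.
distinct_values <- function(rss_table, column) {
  sort(unique(as.character(rss_table[[column]])))
}

topics    <- distinct_values(toy_rss, "topic_unified")
countries <- distinct_values(toy_rss, "clean_country")
languages <- distinct_values(toy_rss, "language")
print(topics)

# Not run (requires the newscatcheR package):
# newscatcheR::show_topics()
```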