Package 'newscatcheR' reference manual

Title:	Programmatically Collect Normalized News from (Almost) Any Website
Description:	Programmatically collect normalized news from (almost) any website. An 'R' clone of the <https://github.com/kotartemiy/newscatcher> 'Python' module.
Authors:	Novica Nakov [aut, cre], Teofil Nakov [ctb], Artem Bugara [ctb], Discindo [cph]
Maintainer:	Novica Nakov
License:	MIT + file LICENSE
Version:	0.1.2.9000
Built:	2025-03-12 05:15:26 UTC
Source:	https://github.com/discindo/newscatcher

Check URL A helper function to verify user input before fetching the feed.

Description

Check URL A helper function to verify user input before fetching the feed.

Usage

check_url(website = "ycombinator.com", rss_table = package_rss)

Arguments

`website`	the URL of a news source, in the format "news.ycombinator.com"
`rss_table`	a data frame of URLs and RSS feeds, for when you need to build your own table from websites not in the included database. It must have the same format as the included data. See 'R/package_rss.R' for details.
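
This page has no Examples section, so here is a minimal sketch of the kind of input check described above. It is not the package's actual implementation. It assumes only that the table has a `clean_url` column, as listed under `package_rss`; the helper `is_known_website()` and the toy table `my_rss` are hypothetical names introduced for illustration.

```r
# Hypothetical toy table; only the column needed for the check is included.
my_rss <- data.frame(
  clean_url = c("ycombinator.com", "example.org"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of newscatcheR): TRUE if `website`
# appears in the table's clean_url column, ignoring letter case.
is_known_website <- function(website, rss_table) {
  stopifnot(is.character(website), length(website) == 1)
  tolower(website) %in% tolower(rss_table$clean_url)
}

is_known_website("ycombinator.com", my_rss)   # present in the toy table
is_known_website("not-in-table.com", my_rss)  # absent from the toy table
```

Running a check like this before fetching lets a mistyped website fail early, without a network request.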

Describe URL

Description

Describe URL

Usage

describe_url(website = "ycombinator.com", rss_table = package_rss)

Arguments

`website`	the URL of a news source, in the format "news.ycombinator.com"
`rss_table`	a data frame of URLs and RSS feeds, for when you need to build your own table from websites not in the included database. It must have the same format as the included data. See `package_rss.R` for details.

Value

A character vector with topics.

Examples

describe_url(website = "ycombinator.com", rss_table = package_rss)

Filter URLs in the provided database based on topic, country and language

Description

Filter URLs in the provided database based on topic, country and language

Usage

filter_urls(
  topic = NULL,
  country = NULL,
  language = NULL,
  rss_table = package_rss
)

Arguments

`topic`	the topic of the feed. See `show_topics()` for more info.
`country`	the country of origin of the feed using two capital letters, for example "US". See `show_countries()` for more info.
`language`	the language of the content of the feed using two lowercase letters, for example "en". See `show_languages()` for more info.
`rss_table`	a data frame of URLs and RSS feeds, for when you need to build your own table from websites not in the included database. It must have the same format as the included data. See `package_rss.R` for details.

Value

a tibble filtered according to the given parameters

Examples

filter_urls(topic = "tech", country = "US", language = "en")
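
To illustrate the filtering semantics in base R, the sketch below treats each NULL argument as "do not filter on this column". It is an illustration only, not the package's code. It assumes that `topic`, `country` and `language` correspond to the `topic_unified`, `clean_country` and `language` columns documented under `package_rss`; the helper `filter_table()` and the toy table `my_rss` are hypothetical.

```r
# Hypothetical toy table using column names documented under package_rss.
my_rss <- data.frame(
  clean_url     = c("example.org", "example.net", "example.com"),
  language      = c("en", "en", "de"),
  topic_unified = c("tech", "news", "tech"),
  clean_country = c("US", "US", "DE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a NULL argument leaves that column unfiltered.
# The argument-to-column mapping is an assumption.
filter_table <- function(rss_table, topic = NULL, country = NULL,
                         language = NULL) {
  keep <- rep(TRUE, nrow(rss_table))
  if (!is.null(topic))    keep <- keep & rss_table$topic_unified == topic
  if (!is.null(country))  keep <- keep & rss_table$clean_country == country
  if (!is.null(language)) keep <- keep & rss_table$language == language
  rss_table[keep, , drop = FALSE]
}

filter_table(my_rss, topic = "tech", country = "US", language = "en")
filter_table(my_rss, topic = "tech")  # country and language left unfiltered
```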

Get headlines A helper function to get just the headlines of the feed

Description

Get headlines A helper function to get just the headlines of the feed

Usage

get_headlines(
  website = "ycombinator.com",
  topic = NULL,
  rss_table = package_rss
)

Arguments

`website`	the URL of a news source, in the format "news.ycombinator.com"
`topic`	the topic of the feed. The default, NULL, fetches the "main" feed. Topics are 'tech', 'news', 'business', 'science', 'finance', 'food', 'politics', 'economics', 'travel', 'entertainment', 'music', 'sport', 'world', but not all sites have all topics. Use `describe_url("website")` to check for available feeds.
`rss_table`	a data frame of URLs and RSS feeds, for when you need to build your own table from websites not in the included database. It must have the same format as the included data. See `package_rss` for details.

Value

a tibble containing the headlines in the feed

Examples

## Not run: 
Sys.sleep(3) # adding a small time delay to avoid
# simultaneous posts to the API
get_headlines(website = "ycombinator.com", rss_table = package_rss)

## End(Not run)

Get news Get the contents of an RSS feed

Description

Get news Get the contents of an RSS feed

Usage

get_news(website = "ycombinator.com", topic = NULL, rss_table = package_rss)

Arguments

`website`	the URL of a news source, in the format "news.ycombinator.com"
`topic`	the topic of the feed. The default, NULL, fetches the "main" feed. Topics are 'tech', 'news', 'business', 'science', 'finance', 'food', 'politics', 'economics', 'travel', 'entertainment', 'music', 'sport', 'world', but not all sites have all topics. Use `describe_url("website")` to check for available feeds.
`rss_table`	a data frame of URLs and RSS feeds, for when you need to build your own table from websites not in the included database. It must have the same format as the included data. See `?package_rss` for details.

Value

a tibble containing the contents of the RSS feed

Examples

## Not run: 
Sys.sleep(3) # adding a small time delay to avoid
# simultaneous posts to the API
get_news(website = "ycombinator.com", rss_table = package_rss)

## End(Not run)

RSS table from python package newscatcher

Description

A dataset of news websites and their RSS feeds, taken from the 'Python' package newscatcher.

Usage

package_rss

Format

A data frame with 4505 rows and 7 variables:

clean_url: url of news website
language: the language of the website
topic_unified: the topic of the website
main: indicator of the website's main feed
clean_country: the country of origin of the website
rss_url: location of feed
GlobalRank: rank of website
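
Since several functions accept a custom `rss_table` that must match this format, the sketch below builds a one-row table with the seven documented column names and checks them. The column types are assumptions (in particular, `main` as an integer flag and `GlobalRank` as an integer), and the names `my_rss` and `expected_cols`, the example website and the feed URL are hypothetical.

```r
# Hypothetical one-row table with the seven documented columns.
my_rss <- data.frame(
  clean_url     = "example.org",
  language      = "en",
  topic_unified = "news",
  main          = 1L,            # assumed: integer flag for the main feed
  clean_country = "US",
  rss_url       = "https://example.org/feed.xml",  # hypothetical feed URL
  GlobalRank    = NA_integer_,   # assumed: integer rank, unknown here
  stringsAsFactors = FALSE
)

# Column names in the order listed in the Format section.
expected_cols <- c("clean_url", "language", "topic_unified", "main",
                   "clean_country", "rss_url", "GlobalRank")

# TRUE when the custom table has exactly the documented columns, in order.
identical(names(my_rss), expected_cols)
```

Comparing the result of `str(package_rss)` with `str(my_rss)` is a reasonable way to confirm the real column types before passing a table like this as `rss_table`.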

Source

https://github.com/kotartemiy/newscatcher

Show countries Show all countries in the database.

Description

Show countries Show all countries in the database.

Usage

show_countries(rss_table = package_rss)

Arguments

`rss_table`	a data frame of URLs and RSS feeds, for when you need to build your own table from websites not in the included database. It must have the same format as the included data. See 'R/package_rss.R' for details.

Value

a character vector of available countries

Show languages Show all languages in the database.

Description

Show languages Show all languages in the database.

Usage

show_languages(rss_table = package_rss)

Arguments

`rss_table`	a data frame of URLs and RSS feeds, for when you need to build your own table from websites not in the included database. It must have the same format as the included data. See 'R/package_rss.R' for details.

Value

a character vector of available languages

Show topics Show all topics in the database.

Description

Show topics Show all topics in the database.

Usage

show_topics(rss_table = package_rss)

Arguments

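`rss_table`	a data frame of URLs and RSS feeds, for when you need to build your own table from websites not in the included database. It must have the same format as the included data. See 'R/package_rss.R' for details.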

Value

a character vector of available topics
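
As a rough illustration of what "available topics" means for a given table, the base R sketch below collects the distinct values of the `topic_unified` column. This is an assumption about how the result relates to the table, not the package's implementation, and the toy table `my_rss` is hypothetical.

```r
# Hypothetical toy table with repeated topics across feeds.
my_rss <- data.frame(
  clean_url     = c("example.org", "example.org", "example.net"),
  topic_unified = c("news", "tech", "news"),
  stringsAsFactors = FALSE
)

# Distinct topics, in order of first appearance in the table.
available_topics <- unique(my_rss$topic_unified)
available_topics
```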
