Title: | Programmatically Collect Normalized News from (Almost) Any Website |
---|---|
Description: | Programmatically collect normalized news from (almost) any website. An 'R' clone of the <https://github.com/kotartemiy/newscatcher> 'Python' module. |
Authors: | Novica Nakov [aut, cre], Teofil Nakov [ctb], Artem Bugara [ctb], Discindo [cph] |
Maintainer: | Novica Nakov <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2.9000 |
Built: | 2024-11-12 05:42:06 UTC |
Source: | https://github.com/discindo/newscatcher |
Check URL
A helper function to verify user input before fetching the feed.
check_url(website = "ycombinator.com", rss_table = package_rss)
website |
the URL of a news source, in the format "news.ycombinator.com" |
rss_table |
a dataframe with urls and rss feeds in case you need to construct your own out of websites not in the included database. Be sure to have the same format as the included data. See 'R/package_rss.R' for details. |
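The kind of input check described above can be illustrated with a small base R sketch. It is not the package's implementation: `normalise_website()` and `is_known_website()` are hypothetical helpers, and the column name `url` in the toy table is an assumption, since the real column names of `package_rss` are not listed here.

```r
# Hypothetical helpers, not part of newscatcher: reduce user input to a bare
# host name and check whether it is listed in a feed table.
normalise_website <- function(website) {
  stopifnot(is.character(website), length(website) == 1L)
  host <- tolower(trimws(website))
  host <- sub("^[a-z][a-z0-9+.-]*://", "", host)  # drop a scheme such as https://
  host <- sub("[/?#].*$", "", host)                # drop path, query and fragment
  sub("^www\\.", "", host)                         # drop a leading "www."
}

is_known_website <- function(website, rss_table, url_col = "url") {
  stopifnot(is.data.frame(rss_table), url_col %in% names(rss_table))
  normalise_website(website) %in% rss_table[[url_col]]
}

# Toy table; the column name `url` is an assumption for illustration only.
toy_table <- data.frame(
  url = c("ycombinator.com", "example.org"),
  stringsAsFactors = FALSE
)

normalise_website("https://www.ycombinator.com/news?p=2")
is_known_website("HTTPS://ycombinator.com/", toy_table)
is_known_website("not-in-table.net", toy_table)
```

Normalising before the lookup means that pasted browser URLs and bare domains are treated the same way.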
Describe URL
describe_url(website = "ycombinator.com", rss_table = package_rss)
website |
the URL of a news source, in the format "news.ycombinator.com" |
rss_table |
a dataframe with urls and rss feeds in case you
need to construct your own out of websites not in the included database.
Be sure to have the same format as the included data. See |
A character vector with topics.
describe_url(website = "ycombinator.com", rss_table = package_rss)
Filter URLs in the provided database based on topic, country and language
filter_urls( topic = NULL, country = NULL, language = NULL, rss_table = package_rss )
topic |
the topic of the feed see |
country |
the country of origin of the feed using two capital
letters, for example "US". See |
language |
the language of the content of the feed using two
lowercase letters, for example "en". See |
rss_table |
a dataframe with urls and rss feeds in case you
need to construct your own out of websites not in the included database.
Be sure to have the same format as the included data. See |
a tibble filtered according to the given parameters
filter_urls(topic = "tech", country = "US", language = "en")
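To make the role of the `NULL` defaults concrete, here is a minimal base R sketch of optional filtering in which `NULL` is taken to mean "do not filter on this field". That reading of the defaults is an assumption, `filter_feeds()` is a hypothetical stand-in rather than the package function, and the column names in the toy table are invented for the example.

```r
# Sketch only: optional filters on a feed table, where NULL means "no filter".
# The column names (`url`, `topic`, `country`, `language`) are assumptions.
filter_feeds <- function(rss_table, topic = NULL, country = NULL, language = NULL) {
  keep <- rep(TRUE, nrow(rss_table))
  if (!is.null(topic))    keep <- keep & rss_table$topic    %in% topic
  if (!is.null(country))  keep <- keep & rss_table$country  %in% country
  if (!is.null(language)) keep <- keep & rss_table$language %in% language
  rss_table[keep, , drop = FALSE]
}

toy_table <- data.frame(
  url      = c("a.example", "b.example", "c.example"),
  topic    = c("tech", "tech", "news"),
  country  = c("US", "DE", "US"),
  language = c("en", "de", "en"),
  stringsAsFactors = FALSE
)

# All three filters set: only a.example matches.
filter_feeds(toy_table, topic = "tech", country = "US", language = "en")

# Only the language filter set: a.example and c.example match.
filter_feeds(toy_table, language = "en")
```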
Get headlines
A helper function to get just the headlines of the feed.
get_headlines( website = "ycombinator.com", topic = NULL, rss_table = package_rss )
website |
the URL of a news source, in the format "news.ycombinator.com" |
topic |
the topic of the feed. The default is NULL, which fetches
the "main" feed. Topics are 'tech', 'news', 'business', 'science',
'finance', 'food', 'politics', 'economics', 'travel', 'entertainment',
'music', 'sport', 'world', but not all sites have all topics.
use |
rss_table |
a dataframe with urls and rss feeds in case you
need to construct your own out of websites not in the included database.
Be sure to have the same format as the included data. See |
a tibble containing the headlines contained in the feed
## Not run:
# adding a small time delay to avoid simultaneous posts to the API
Sys.sleep(3)
get_headlines(website = "ycombinator.com", rss_table = package_rss)
## End(Not run)
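The delay in the example above generalises to fetching several sites in a row. The sketch below is one possible pattern, not something the package provides: `fetch_politely()` and `stub_fetch()` are hypothetical, and the stub stands in for a real call such as `function(w) get_headlines(website = w)` so that the example runs without network access.

```r
# Sketch: call a fetcher for several sites, pausing between requests and
# keeping going when one site fails. `fetch` is any function that takes a
# single website string.
fetch_politely <- function(websites, fetch, delay = 3) {
  results <- vector("list", length(websites))
  names(results) <- websites
  for (i in seq_along(websites)) {
    if (i > 1) Sys.sleep(delay)  # pause before every request except the first
    value <- tryCatch(
      fetch(websites[[i]]),
      error = function(e) {
        warning("Failed for ", websites[[i]], ": ", conditionMessage(e))
        NULL
      }
    )
    # Assign via a one-element list so that a NULL result keeps its slot.
    results[i] <- list(value)
  }
  results
}

# Stub fetcher used instead of a real network call (hypothetical).
stub_fetch <- function(w) {
  if (w == "bad.example") stop("no feed")
  data.frame(headline = paste("Headline from", w), stringsAsFactors = FALSE)
}

out <- suppressWarnings(
  fetch_politely(c("a.example", "bad.example"), stub_fetch, delay = 0)
)
names(out)
```

A failed site leaves a `NULL` entry under its name, so the result always has one slot per requested website.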
Get news
Get the contents of an RSS feed.
get_news(website = "ycombinator.com", topic = NULL, rss_table = package_rss)
website |
the URL of a news source, in the format "news.ycombinator.com" |
topic |
the topic of the feed. The default is NULL, which fetches
the "main" feed. Topics are 'tech', 'news', 'business', 'science',
'finance', 'food', 'politics', 'economics', 'travel', 'entertainment',
'music', 'sport', 'world', but not all sites have all topics.
use |
rss_table |
a dataframe with urls and rss feeds in case you
need to construct your own out of websites not in the included database.
Be sure to have the same format as the included data. See |
a tibble containing the contents of the rss feed
## Not run:
# adding a small time delay to avoid simultaneous posts to the API
Sys.sleep(3)
get_news(website = "ycombinator.com", rss_table = package_rss)
## End(Not run)
A dataset of news websites and their RSS feeds.
package_rss
A data frame with 4505 rows and 7 variables:
url of news website
the language of the website
the topic of the website
main
clean_country
location of feed
rank of website
https://github.com/kotartemiy/newscatcher
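Several functions ask that a custom `rss_table` have the same format as the included data. One way to check that before passing a table in is sketched below. `same_format()` is a hypothetical helper, not part of the package, and the toy reference table uses invented column names; with the package attached, the reference would be `package_rss`.

```r
# Sketch: check that a custom table has the same column names, in the same
# order, and the same primary column classes as a reference table.
same_format <- function(custom, reference) {
  stopifnot(is.data.frame(custom), is.data.frame(reference))
  first_class <- function(x) class(x)[1]
  identical(names(custom), names(reference)) &&
    identical(
      unname(vapply(custom, first_class, character(1))),
      unname(vapply(reference, first_class, character(1)))
    )
}

# Toy reference standing in for package_rss (column names are assumptions).
reference <- data.frame(
  url  = c("a.example", "b.example"),
  rank = c(1L, 2L),
  stringsAsFactors = FALSE
)

good <- data.frame(url = "c.example", rank = 10L, stringsAsFactors = FALSE)
bad  <- data.frame(site = "c.example", rank = 10L, stringsAsFactors = FALSE)

same_format(good, reference)
same_format(bad, reference)
```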
Show countries
Show all countries in the database.
show_countries(rss_table = package_rss)
rss_table |
a dataframe with urls and rss feeds in case you need to construct your own out of websites not in the included database. Be sure to have the same format as the included data. See 'R/package_rss.R' for details. |
a character vector of available countries
Show languages
Show all languages in the database.
show_languages(rss_table = package_rss)
rss_table |
a dataframe with urls and rss feeds in case you need to construct your own out of websites not in the included database. Be sure to have the same format as the included data. See 'R/package_rss.R' for details. |
a character vector of available languages
Show topics
Show all topics in the database.
show_topics(rss_table = package_rss)
rss_table |
a dataframe with urls and rss feeds in case you need to construct your own out of websites not in the included database. Be sure to have the same format as the included data. See 'R/package_rss.R' for details. |
a character vector of available topics
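The three `show_*` functions each return a character vector of the values available in a table. The base R sketch below shows what such a listing could look like for a custom table. `show_values()` is a hypothetical helper, the column names are assumptions, and the package's own functions may order or clean the values differently.

```r
# Sketch: list the distinct, non-missing values of one column, sorted.
show_values <- function(rss_table, column) {
  stopifnot(is.data.frame(rss_table), column %in% names(rss_table))
  values <- as.character(rss_table[[column]])
  sort(unique(values[!is.na(values)]))
}

# Toy table with invented column names.
toy_table <- data.frame(
  url      = c("a.example", "b.example", "c.example", "d.example"),
  country  = c("US", "DE", "US", NA),
  language = c("en", "de", "en", "en"),
  stringsAsFactors = FALSE
)

show_values(toy_table, "country")
show_values(toy_table, "language")
```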