Title: | Import Professional Baseball Data from 'Retrosheet' |
---|---|
Description: | A collection of tools to import and structure the (currently) single-season event, game-log, roster, and schedule data available from <https://www.retrosheet.org>. In particular, the event (a.k.a. play-by-play) files can be especially difficult to parse. This package does the parsing on those files, returning the requested data in the most practical R structure to use for sabermetric or other analyses. |
Authors: | Colin Douglas [aut, cre, cph], Richard Scriven [aut, cph] |
Maintainer: | Colin Douglas |
License: | GPL (>= 2) |
Version: | 1.1.6 |
Built: | 2025-01-24 04:40:25 UTC |
Source: | https://github.com/colindouglas/retrosheet |
get_retrosheet() is a wrapper for getRetrosheet(). It downloads and parses data from https://www.retrosheet.org for the game-log, event (play-by-play), roster, and schedule files. While getRetrosheet() returns a list of matrices, get_retrosheet() returns an equivalent list of data frames. It takes the same arguments and can act as a drop-in replacement.
get_retrosheet(...)
... |
Arguments passed to getRetrosheet(). The stringsAsFactors argument is always FALSE; passing it as TRUE triggers a warning. |
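The wrapper behaviour described above (force stringsAsFactors = FALSE, warn on TRUE, then turn matrices into data frames) can be sketched as follows. This is not the package's actual implementation: matrices_to_frames() and get_retrosheet_sketch() are hypothetical names, and the `fetch` parameter stands in for getRetrosheet() so the sketch runs without network access.

```r
# Hypothetical helper: recursively convert every matrix in a (possibly nested)
# list into a data frame, leaving all other elements untouched.
matrices_to_frames <- function(x) {
  if (is.matrix(x)) {
    as.data.frame(x, stringsAsFactors = FALSE)
  } else if (is.list(x) && !is.data.frame(x)) {
    lapply(x, matrices_to_frames)  # lapply() keeps the element names
  } else {
    x
  }
}

# Hypothetical wrapper; `fetch` plays the role of getRetrosheet().
get_retrosheet_sketch <- function(..., stringsAsFactors = FALSE, fetch) {
  if (isTRUE(stringsAsFactors)) {
    warning("stringsAsFactors = TRUE is not supported; using FALSE")
  }
  matrices_to_frames(fetch(..., stringsAsFactors = FALSE))
}

# Mock fetcher returning one game whose "info" component is a character matrix.
fake_fetch <- function(type, year, ..., stringsAsFactors) {
  list(game1 = list(info = matrix(c("visteam", "hometeam", "AAA", "BBB"),
                                  ncol = 2,
                                  dimnames = list(NULL, c("field", "value")))))
}

res <- get_retrosheet_sketch("play", 2012, "SFN", fetch = fake_fetch)
res$game1$info
```

Recursing through nested lists matters for type = "play", where each game is itself a list of matrices.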
The following return values are possible for each value of type:
game
- a data frame of gamelog data for the given year
play
- a list, each element of which is a single game's play-by-play
data for the given team and year. Each list element is also a list, containing
the play-by-play data split into individual data frames.
roster
- a named list, each element containing the roster
for the named team for the given year, as a data frame.
schedule
- a data frame containing the game schedule for the given year
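Because the roster result is a named list of per-team data frames, a common next step is to stack it into one table while recording each row's team. The sketch below uses hypothetical mock data; the team IDs and column names (playerID, Pos) are assumptions and may not match what get_retrosheet("roster", year) actually returns.

```r
# Hypothetical roster data shaped like a named list of per-team data frames.
rosters <- list(
  AAA = data.frame(playerID = c("smitj001", "doej0001"), Pos = c("P", "C")),
  BBB = data.frame(playerID = "roeb0001", Pos = "SS")
)

# Stack the per-team data frames, taking each row's team from the list names.
stack_rosters <- function(rosters) {
  pieces <- Map(function(df, team) cbind(team = team, df), rosters, names(rosters))
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "AAA.1"-style row names rbind() creates
  out
}

all_players <- stack_rosters(rosters)
all_players
```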
## get the full 1995 season schedule
get_retrosheet("schedule", 1995)
## get the same schedule, split by time of day
get_retrosheet("schedule", 1995, schedSplit = "TimeOfDay")
## get the roster data for the 1995 season, listed by team
get_retrosheet("roster", 1995)
## get the full game-log data for the 2012 season
get_retrosheet("game", 2012)
## get the play-by-play data for the San Francisco Giants' 2012 season
get_retrosheet("play", 2012, "SFN")
getFileNames() is a convenience function that returns the base file names of the available downloads for the year and type arguments of getRetrosheet().
getFileNames()
A named list of the available single-season Retrosheet event and game-log zip files and schedule text files. These file names are not intended to be passed to getRetrosheet(); the list is simply a fast way to determine whether the desired data is available.
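A quick availability check can be built on the convention, used in the getTeamIDs() notes of this manual, that the first four characters of each event file name are the year. In the sketch below the file names are hypothetical placeholders for getFileNames()$event, and available_years() and is_year_available() are illustrative helpers, not package functions. It uses as.integer() rather than type.convert() because the result is known to be a year.

```r
# Hypothetical event file names standing in for getFileNames()$event.
event_files <- c("2010eve.zip", "2011eve.zip", "2012eve.zip")

# Extract the four-digit year prefix from each file name.
available_years <- function(files) {
  as.integer(substr(files, 1L, 4L))
}

# TRUE if an event file exists for the requested year.
is_year_available <- function(year, files) {
  year %in% available_years(files)
}

is_year_available(2012, event_files)
is_year_available(1850, event_files)
```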
getFileNames()
getParkIDs() returns a two-column data frame of ballpark IDs along with the current stadium name for each.
getParkIDs()
getParkIDs()
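One way to use the park table is as a lookup that attaches stadium names to game-log rows. The sketch below is illustrative only: both data frames are hypothetical mock data, and the column names (ParkID, Name, Date) are assumptions rather than the actual columns returned by getParkIDs() or the game logs. match() is used instead of merge() so the original row order of the games is preserved.

```r
# Hypothetical stand-in for getParkIDs(); the real column names may differ.
parks <- data.frame(ParkID = c("PK001", "PK002"), Name = c("Park A", "Park B"))

# Hypothetical game-log subset carrying a park ID column.
games <- data.frame(Date   = c("20120825", "20120825", "20120826"),
                    ParkID = c("PK002", "PK001", "PK002"))

# Look up each game's stadium name; unknown park IDs would yield NA.
games$ParkName <- parks$Name[match(games$ParkID, parks$ParkID)]
games
```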
Instead of returning the entire file, getPartialGamelog() lets the user choose the columns and, optionally, a single date of game-log data.
getPartialGamelog(year, glFields, date = NULL)
gamelogFields
year |
A single four-digit year. |
glFields |
character. The desired game-log columns. This should be a subset of gamelogFields. |
date |
Either NULL (the default) or a single four-character string identifying the date as 'mmdd'. |
gamelogFields is an object of class character of length 161.
getPartialGamelog
- A data table with length(glFields) columns. If date is not NULL, it has one row per game played on that date; otherwise the row dimension is the number of games for the given year.
gamelogFields
- A character vector of possible values to choose from for the glFields argument of getPartialGamelog().
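The column-and-date selection that getPartialGamelog() performs can be sketched on an in-memory data frame. This is a hypothetical illustration, not the package's code: partial_gamelog_sketch() is an invented name, the mock columns (Date, ParkID, HmHR, VisHR) are assumptions, the Date column is assumed to hold "yyyymmdd" strings so that year plus the 'mmdd' argument identifies one day, and it returns a base data frame rather than a data table.

```r
# Hypothetical full game log; real column names are listed in gamelogFields.
gl <- data.frame(
  Date   = c("20120824", "20120825", "20120825"),
  ParkID = c("PK001", "PK002", "PK003"),
  HmHR   = c(1L, 3L, 0L),
  VisHR  = c(2L, 0L, 1L)
)

# Keep only the requested columns and, optionally, the games on one date.
partial_gamelog_sketch <- function(gl, year, fields, date = NULL) {
  stopifnot(all(fields %in% names(gl)))
  if (!is.null(date)) {
    stopifnot(is.character(date), length(date) == 1L, nchar(date) == 4L)
    gl <- gl[gl$Date == paste0(year, date), , drop = FALSE]  # assumes yyyymmdd
  }
  gl[, fields, drop = FALSE]
}

f <- grep("HR|Park", names(gl), value = TRUE)
season <- partial_gamelog_sketch(gl, 2012, f)
aug25  <- partial_gamelog_sketch(gl, 2012, f, date = "0825")
aug25
```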
## Get home run and RBI info for the 2012 season, with park ID
f <- grep("HR|RBI|Park", gamelogFields, value = TRUE)
getPartialGamelog(2012, glFields = f)
## Get home run and RBI info for August 25, 2012, with park ID
getPartialGamelog(2012, glFields = f, date = "0825")
getRetrosheet() downloads and parses data from https://www.retrosheet.org for the game-log, event (play-by-play), roster, and schedule files.
getRetrosheet(type, year, team, schedSplit = NULL, stringsAsFactors = FALSE, cache = NA)
type |
character. One of "game" for game logs, "play" for play-by-play (a.k.a. event) data, "roster" for team rosters, or "schedule" for the game schedule for the given year. |
year |
integer. A valid four-digit year. |
team |
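character. Only to be used if type = "play". A single valid team ID for the given year; the available IDs can be retrieved with getTeamIDs(year). |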
schedSplit |
One of "Date", "HmTeam", or "TimeOfDay" to return a list split by the given value, or NULL (the default) for no splitting. |
stringsAsFactors |
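logical. Whether character columns are converted to factors, as with the stringsAsFactors argument of data.frame(). |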
cache |
character. Path to a local cache of Retrosheet data. Requested files that are not already in the cache are downloaded and saved there for future use. Defaults to NA so that no data is saved locally without explicit permission. |
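The cache behaviour described for the cache argument (read a local copy if present, otherwise download and save it, and never write anything when cache is NA) can be sketched as below. This is a minimal illustration under stated assumptions, not the package's implementation: fetch_with_cache() and the injected download_fun are hypothetical, and a mock downloader replaces real network access.

```r
# Hypothetical cache-first retrieval; download_fun returns lines of text.
fetch_with_cache <- function(filename, cache = NA, download_fun) {
  # cache = NA: nothing is written locally, so every call downloads afresh
  if (is.na(cache)) {
    return(download_fun(filename))
  }
  path <- file.path(cache, filename)
  if (!file.exists(path)) {
    dir.create(cache, recursive = TRUE, showWarnings = FALSE)
    writeLines(download_fun(filename), path)  # save for future calls
  }
  readLines(path)
}

# Mock downloader that counts how often it is called.
calls <- 0L
fake_download <- function(filename) {
  calls <<- calls + 1L
  c("line 1", "line 2")
}

cache_dir <- tempfile("retrosheet_cache_")
first  <- fetch_with_cache("example.txt", cache_dir, fake_download)
second <- fetch_with_cache("example.txt", cache_dir, fake_download)
calls  # 1: the second call was served from the cache
```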
The following return values are possible for each value of type:
game
- a data frame of gamelog data for the given year
play
- a list, each element of which is a single game's play-by-play
data for the given team and year. Each list element is also a list, containing
the play-by-play data split into individual matrices.
roster
- a named list, each element containing the roster
for the named team for the given year, as a data frame.
schedule
- a data frame containing the game schedule for the given year
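Since the play-by-play result is a list of games, each holding several matrices, a typical step is to pull one component out of every game and stack it with a game index. The sketch below uses hypothetical mock data: the component name "play" and the column names (inning, team, batter) are assumptions about the structure, and stack_component() is an illustrative helper rather than a package function.

```r
# Hypothetical play-by-play result: two games, each a list of matrices.
pbp <- list(
  list(play = matrix(c("1", "1", "0", "1", "aaa01", "bbb01"), ncol = 3,
                     dimnames = list(NULL, c("inning", "team", "batter")))),
  list(play = matrix(c("1", "0", "ccc01"), ncol = 3,
                     dimnames = list(NULL, c("inning", "team", "batter"))))
)

# Stack one named component across all games, tagging rows with the game index.
stack_component <- function(games, component) {
  pieces <- lapply(seq_along(games), function(i) {
    m <- games[[i]][[component]]
    if (is.null(m) || nrow(m) == 0L) return(NULL)  # skip games without it
    cbind(game = i, as.data.frame(m, stringsAsFactors = FALSE))
  })
  do.call(rbind, Filter(Negate(is.null), pieces))
}

all_plays <- stack_component(pbp, "play")
all_plays
```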
## get the full 1995 season schedule
getRetrosheet("schedule", 1995)
## get the same schedule, split by time of day
getRetrosheet("schedule", 1995, schedSplit = "TimeOfDay")
## get the roster data for the 1995 season, listed by team
getRetrosheet("roster", 1995)
## get the full game-log data for the 2012 season
getRetrosheet("game", 2012)
## get the play-by-play data for the San Francisco Giants' 2012 season
getRetrosheet("play", 2012, "SFN")
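The schedSplit option returns the schedule as a list split by one of its values. A rough sketch of that idea with base R's split() follows; it assumes, without confirmation from this manual, that "Date", "HmTeam", and "TimeOfDay" are column names of the schedule data, and split_schedule() and the mock rows are hypothetical.

```r
# Hypothetical schedule rows; column names are assumed to match schedSplit values.
sched <- data.frame(
  Date      = c("19950426", "19950426", "19950427"),
  HmTeam    = c("AAA", "BBB", "AAA"),
  TimeOfDay = c("d", "n", "n")
)

# Return the schedule unchanged, or a list of data frames split by one column.
split_schedule <- function(sched, schedSplit = NULL) {
  if (is.null(schedSplit)) return(sched)
  schedSplit <- match.arg(schedSplit, c("Date", "HmTeam", "TimeOfDay"))
  split(sched, sched[[schedSplit]])
}

by_time <- split_schedule(sched, "TimeOfDay")
names(by_time)
```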
getTeamIDs() retrieves the team ID needed for the team argument of getRetrosheet("play", year, team).
getTeamIDs(year)
year |
A single valid four-digit numeric year. |
All currently available years can be retrieved with
type.convert(substr(getFileNames()$event, 1L, 4L), as.is = TRUE)
If the file exists, a named vector of IDs for the given year; otherwise NA.
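Before requesting play-by-play data, the team argument can be checked against the IDs for that year, including the NA case. In the sketch below the named vector is hypothetical mock data standing in for getTeamIDs(year): "SFN" comes from this manual's examples, while the vector names and "BBB" are placeholders. check_team() is an illustrative helper, not a package function.

```r
# Hypothetical stand-in for getTeamIDs(year); the names are assumptions.
ids <- c("San Francisco Giants" = "SFN", "Hypothetical Team" = "BBB")

check_team <- function(team, ids) {
  # getTeamIDs() returns NA when no file exists for the year
  if (length(ids) == 1L && is.na(ids)) {
    stop("no team IDs are available for that year")
  }
  if (!team %in% ids) {
    stop(sprintf("'%s' is not a valid team ID; choose one of: %s",
                 team, paste(ids, collapse = ", ")))
  }
  invisible(TRUE)
}

check_team("SFN", ids)  # passes silently
```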
getTeamIDs(2010)