libwget-robots(3) wget2 libwget-robots(3)

libwget-robots - Robots Exclusion file parser


struct wget_robots_st


int wget_robots_parse (wget_robots **_robots, const char *data, const char *client)
void wget_robots_free (wget_robots **robots)
int wget_robots_get_path_count (wget_robots *robots)
wget_string * wget_robots_get_path (wget_robots *robots, int index)
int wget_robots_get_sitemap_count (wget_robots *robots)
const char * wget_robots_get_sitemap (wget_robots *robots, int index)

These functions parse a Robots Exclusion Standard file (robots.txt) into a data structure for easy access.

int wget_robots_parse (wget_robots ** _robots, const char * data, const char * client)

Parameters

_robots Address of a wget_robots pointer that receives the parsed structure
data Memory with robots.txt content (with trailing 0-byte)
client Name of the client / user-agent

Returns

Returns an int status code: WGET_E_SUCCESS on success, otherwise a wget error code. On success, *_robots points to an allocated wget_robots structure.

The function parses the robots.txt data and stores a wget_robots structure in *_robots. The structure includes a list of the disallowed paths and a list of the sitemap files.

The wget_robots structure has to be freed by calling wget_robots_free().
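
As a rough illustration of what ends up in those two lists, the following stand-alone sketch collects the Disallow paths of records that match a user-agent, plus all Sitemap URLs. It is a simplification and not libwget's implementation: struct robots_sketch, robots_sketch_parse(), copy_value() and the fixed size limits are hypothetical names chosen for this example, and the sketch assumes a single User-agent line per record and no leading whitespace on a line.

```c
#include <assert.h>
#include <string.h>
#include <strings.h> /* strncasecmp() (POSIX) */

#define MAX_ITEMS 16  /* arbitrary limits chosen for this sketch */
#define MAX_LEN   256

/* Hypothetical result type for this sketch; this is not libwget's wget_robots. */
struct robots_sketch {
	char paths[MAX_ITEMS][MAX_LEN];
	int npaths;
	char sitemaps[MAX_ITEMS][MAX_LEN];
	int nsitemaps;
};

/* Copy the value that follows a "Key:" prefix, up to the next whitespace. */
static void copy_value(char *dst, const char *src)
{
	size_t n;

	while (*src == ' ' || *src == '\t')
		src++;

	n = strcspn(src, " \t\r\n");
	if (n >= MAX_LEN)
		n = MAX_LEN - 1; /* truncate overlong values */

	memcpy(dst, src, n);
	dst[n] = '\0';
}

/* Collect Disallow paths that apply to 'client' (or "*") and all Sitemap URLs. */
void robots_sketch_parse(struct robots_sketch *r, const char *data, const char *client)
{
	size_t client_len;
	int collect = 0; /* 1 while the current record applies to 'client' */
	char agent[MAX_LEN];

	assert(r && data && client && *client);
	client_len = strlen(client);
	memset(r, 0, sizeof(*r));

	for (const char *line = data; *line; ) {
		if (!strncasecmp(line, "User-agent:", 11)) {
			copy_value(agent, line + 11);
			collect = !strcmp(agent, "*") || !strncasecmp(agent, client, client_len);
		} else if (collect && !strncasecmp(line, "Disallow:", 9)) {
			if (r->npaths < MAX_ITEMS) {
				copy_value(r->paths[r->npaths], line + 9);
				if (r->paths[r->npaths][0]) /* an empty value disallows nothing */
					r->npaths++;
			}
		} else if (!strncasecmp(line, "Sitemap:", 8)) {
			if (r->nsitemaps < MAX_ITEMS) {
				copy_value(r->sitemaps[r->nsitemaps], line + 8);
				if (r->sitemaps[r->nsitemaps][0])
					r->nsitemaps++;
			}
		}

		line += strcspn(line, "\n"); /* skip to the end of this line */
		if (*line == '\n')
			line++;
	}
}
```

Calling robots_sketch_parse() with the robots.txt text and a client name such as "wget2" fills npaths/paths and nsitemaps/sitemaps; records addressed to other user-agents contribute no paths.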

void wget_robots_free (wget_robots ** robots)

Parameters

robots Pointer to pointer to wget_robots structure

wget_robots_free() frees the formerly allocated wget_robots structure.

int wget_robots_get_path_count (wget_robots * robots)

Parameters

robots Pointer to instance of wget_robots

Returns

Returns the number of paths listed in robots

wget_string * wget_robots_get_path (wget_robots * robots, int index)

Parameters

robots Pointer to instance of wget_robots
index Index of the wanted path

Returns

Returns the path at index or NULL

int wget_robots_get_sitemap_count (wget_robots * robots)

Parameters

robots Pointer to instance of wget_robots

Returns

Returns the number of sitemaps listed in robots

const char * wget_robots_get_sitemap (wget_robots * robots, int index)

Parameters

robots Pointer to instance of wget_robots
index Index of the wanted sitemap URL

Returns

Returns the sitemap URL at index or NULL

Generated automatically by Doxygen for wget2 from the source code.

Thu Aug 31 2023 Version 2.1.0