Test crawling a URL
Tests a URL with the crawler’s configuration and shows the extracted records.
You can override parts of the configuration to test your changes before updating the configuration.
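The request below is a minimal sketch of how you might call this endpoint from TypeScript (Node 18+). The base URL and path (https://crawler.algolia.com/api/1/crawlers/{id}/test) aren't shown in this excerpt and are assumptions, so verify them against the endpoint definition; the credentials are the crawler user ID and API key described under Authorizations.

```ts
// Minimal sketch: test-crawl a single URL.
// Assumption: the endpoint is POST https://crawler.algolia.com/api/1/crawlers/{id}/test.
const crawlerId = "YOUR_CRAWLER_ID";
const credentials = Buffer.from("CRAWLER_USER_ID:CRAWLER_API_KEY").toString("base64");

const response = await fetch(
  `https://crawler.algolia.com/api/1/crawlers/${crawlerId}/test`,
  {
    method: "POST",
    headers: {
      Authorization: `Basic ${credentials}`,
      "Content-Type": "application/json",
    },
    // The URL to test; a configuration override can also be sent (see Body below).
    body: JSON.stringify({ url: "https://www.example.com/docs/getting-started" }),
  },
);
console.log(await response.json()); // extracted records, links, and logs
```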
Authorizations
Basic authentication header of the form Basic <encoded-value>, where <encoded-value> is the base64-encoded string username:password.
Path Parameters
Crawler ID.
Body
URL to test.
Crawler configuration to update.
You can only update top-level configuration properties.
To update a nested configuration, such as actions.recordExtractor, you must provide the complete top-level object, such as actions.
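As an illustration of this rule, the sketch below overrides actions.recordExtractor for a test by sending the complete actions array. The body field names (url, config) and the action shape are assumptions for this example, not confirmed by this excerpt.

```ts
// Hypothetical override body for the test request. Because only top-level
// properties can be replaced, the complete `actions` array is provided,
// even though only `recordExtractor` changes.
const body = {
  url: "https://www.example.com/docs/getting-started",
  config: {
    actions: [
      {
        indexName: "docs",
        pathsToMatch: ["https://www.example.com/docs/**"],
        // Per this reference, the extractor is sent as a JavaScript function in string form.
        recordExtractor: `({ url, $ }) => [{ objectID: url.href, title: $("title").text() }]`,
      },
    ],
  },
};
```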
Instructions about how to process crawled URLs.
Each action defines:
- The targeted subset of URLs it processes.
- What information to extract from the web pages.
- The Algolia indices where the extracted records will be stored.
A single web page can match multiple actions. In this case, the crawler produces one record for each matched action.
Whether to generate objectID properties for each extracted record.
If false, you must manually add objectID properties to the extracted records.
Whether the crawler should cache crawled pages.
With caching, the crawler only crawls changed pages.
To detect changed pages, the crawler makes HTTP conditional requests to your pages.
The crawler uses the ETag and Last-Modified response headers returned by your web server during the previous crawl.
The crawler sends this information in the If-None-Match and If-Modified-Since request headers.
If your web server responds with 304 Not Modified to the conditional request, the crawler reuses the records from the previous crawl.
Caching is ignored in these cases:
- If your crawler configuration changed between two crawls.
- If externalData changed between two crawls.
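For illustration only (this isn't the crawler's internal code), the following sketch shows the conditional-request mechanism described above: a request carrying If-None-Match and If-Modified-Since can be answered with 304 Not Modified, in which case previously extracted records can be reused.

```ts
// Illustration of an HTTP conditional request, as used for crawler caching.
const previous = {
  etag: '"abc123"',                              // ETag from the last crawl
  lastModified: "Tue, 07 May 2024 10:00:00 GMT", // Last-Modified from the last crawl
};

const res = await fetch("https://www.example.com/docs/getting-started", {
  headers: {
    "If-None-Match": previous.etag,
    "If-Modified-Since": previous.lastModified,
  },
});

if (res.status === 304) {
  // Page unchanged: the records from the previous crawl can be reused.
} else {
  // Page changed: extract records again from the response body.
}
```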
Whether the crawler cache is active.
Patterns for additional pages to visit to find links without extracting records.
The crawler looks for matching pages and crawls them for links, but doesn't extract records from the (intermediate) pages themselves.
File types for crawling non-HTML documents.
Non-HTML documents are first converted to HTML by an Apache Tika server.
Crawling non-HTML documents has the following limitations:
- It's slower than crawling HTML documents.
- PDFs must include the used fonts.
- The produced HTML pages might not be semantic. This makes achieving good relevance more difficult.
- Natural language detection isn't supported.
- Extracted metadata might vary between files produced by different programs and versions.
Available options: doc, email, html, odp, ods, odt, pdf, ppt, xls
Key-value pairs to replace matching hostnames found in a sitemap, on a page, in canonical links, or redirects.
The crawler continues from the transformed URLs.
The mapping doesn't transform URLs listed in the startUrls, siteMaps, pathsToMatch, and other settings.
The mapping also doesn't replace hostnames found in extracted text.
Hostname that should be used in the records.
Name of the index where the extracted records from this action are stored.
The name is combined with the prefix you specified in the indexPrefix option.
256
Unique identifier for the action. This option is required if schedule is set.
Key-value pairs to replace matching paths with new values.
The crawl continues from the transformed URLs.
The mapping doesn't transform URLs listed in the startUrls, siteMaps, pathsToMatch, and other settings.
The mapping also doesn't replace paths found in extracted text.
Patterns for URLs to which this action should apply.
Function for extracting information from a crawled page and transforming it into Algolia records for indexing.
function
JavaScript function (as a string) for extracting information from a crawled page and transforming it into Algolia records for indexing.
The Crawler dashboard has an editor with autocomplete and validation, which makes editing the recordExtractor property easier.
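As a hedged sketch of such a function: the example below assumes the extractor receives the crawled page's URL and a Cheerio-like $ selector and returns the array of records to index. Field names in the returned record are illustrative.

```ts
// A minimal recordExtractor sketch, as you might write it in the dashboard editor.
const recordExtractor = ({ url, $ }: { url: URL; $: any }) => [
  {
    objectID: url.href,                                         // unique record identifier
    title: $("head > title").text(),                            // page title
    description: $('meta[name="description"]').attr("content") || "",
    content: $("article p").text(),                             // main body text
  },
];
```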
DOM selectors for nodes that must be present on the page to be processed. If the page doesn't match any of the selectors, it's ignored.
Algolia application ID where the crawler creates and updates indices. The Crawler add-on must be enabled for this application.
Number of concurrent tasks per second.
If processing each URL takes n seconds, your crawler can process rateLimit / n URLs per second. For example, with a rateLimit of 8 and pages that take 2 seconds to process, the crawler handles about 4 URLs per second.
Higher numbers mean faster crawls, but they also increase your bandwidth and server load.
1 < x < 100
Algolia API key for indexing the records.
The API key must have the following access control list (ACL) permissions:
search, browse, listIndexes, addObject, deleteObject, deleteIndex, settings, editSettings.
The API key must not be the admin API key of the application.
The API key must have access to create the indices that the crawler will use.
For example, if indexPrefix is crawler_, the API key must have access to all crawler_* indices.
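If you need to create such a restricted key, the sketch below uses Algolia's key-management endpoint (POST /1/keys) with the ACLs listed above and limits it to crawler_* indices, matching the indexPrefix example. APP_ID and ADMIN_API_KEY are placeholders; the admin key is used only for this one-time setup call.

```ts
// Sketch: create a dedicated (non-admin) key with the required ACLs.
const response = await fetch("https://APP_ID.algolia.net/1/keys", {
  method: "POST",
  headers: {
    "X-Algolia-Application-Id": "APP_ID",
    "X-Algolia-API-Key": "ADMIN_API_KEY", // admin key used only to create the new key
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    description: "Crawler indexing key",
    acl: ["search", "browse", "listIndexes", "addObject", "deleteObject",
          "deleteIndex", "settings", "editSettings"],
    indexes: ["crawler_*"], // matches the indexPrefix example above
  }),
});
console.log(await response.json()); // contains the generated key
```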
URLs to exclude from crawling.
References to external data sources for enriching the extracted records.
For more information, see Enrich extracted records with external data.
URLs from where to start crawling.
These are the same as startUrls.
URLs you crawl manually can be added to extraUrls.
Whether to ignore canonical redirects.
If true, canonical URLs for pages are ignored.
Whether to ignore the nofollow meta tag or link attribute.
If true, links with the rel="nofollow" attribute or links on pages with the nofollow robots meta tag will be crawled.
Whether to ignore the noindex robots meta tag.
If true, pages with this meta tag will be crawled.
Query parameters to ignore while crawling.
All URLs with the matching query parameters are treated as identical. This prevents indexing duplicate URLs that differ only by their query parameters.
Whether to ignore rules defined in your robots.txt file.
A prefix for all indices created by this crawler. It's combined with the indexName for each action to form the complete index name.
64
Initial index settings, one settings object per index.
This setting is applied only when the index is first created; it isn't re-applied later, so it won't override settings changes made after the index was created.
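A minimal, illustrative initialIndexSettings object might look like the following; the index name and individual settings are example values, keyed by the full index name (prefix included).

```ts
// Illustrative initialIndexSettings: one settings object per full index name.
// Applied only when the index is first created.
const initialIndexSettings = {
  crawler_docs: {
    searchableAttributes: ["unordered(title)", "description", "content"],
    attributesForFaceting: ["filterOnly(section)"],
    customRanking: ["desc(popularity)"],
  },
};
```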
Index settings.
Attributes used for faceting.
Facets are attributes that let you categorize search results. They can be used for filtering search results. By default, no attribute is used for faceting. Attribute names are case-sensitive.
Modifiers
- filterOnly("ATTRIBUTE"). Allows the attribute to be used as a filter but doesn't evaluate the facet values.
- searchable("ATTRIBUTE"). Allows searching for facet values.
- afterDistinct("ATTRIBUTE"). Evaluates the facet count after deduplication with distinct. This ensures accurate facet counts. You can apply this modifier to searchable facets: afterDistinct(searchable(ATTRIBUTE)).
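For example, an attributesForFaceting array combining these modifiers could look like this (attribute names are illustrative):

```ts
// Example attributesForFaceting combining the modifiers described above.
const attributesForFaceting = [
  "category",                          // regular facet: counts and filtering
  "filterOnly(brand)",                 // filtering only, no facet value counts
  "searchable(author)",                // facet values can be searched
  "afterDistinct(searchable(color))",  // counts computed after deduplication
];
```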
Creates replica indices.
Replicas are copies of a primary index with the same records but different settings, synonyms, or rules. If you want to offer a different ranking or sorting of your search results, you'll use replica indices. All index operations on a primary index are automatically forwarded to its replicas. To add a replica index, you must provide the complete set of replicas to this parameter. If you omit a replica from this list, the replica turns into a regular, standalone index that will no longer be synced with the primary index.
Modifier
- virtual("REPLICA"). Creates a virtual replica. Virtual replicas don't increase the number of records and are optimized for Relevant sorting.
Maximum number of search results that can be obtained through pagination.
Higher pagination limits might slow down your search. For pagination limits above 1,000, the sorting of results beyond the 1,000th hit can't be guaranteed.
x < 20000
Attributes that can't be retrieved at query time.
This can be useful if you want to use an attribute for ranking or to restrict access, but don't want to include it in the search results. Attribute names are case-sensitive.
Creates a list of words which require exact matches. This also turns off word splitting and concatenation for the specified words.
Attributes for which you want to support Japanese transliteration.
Transliteration supports searching in any of the Japanese writing systems. To support transliteration, you must set the indexing language to Japanese. Attribute names are case-sensitive.
Attributes for which to split camel case words. Attribute names are case-sensitive.
Searchable attributes to which Algolia should apply word segmentation (decompounding). Attribute names are case-sensitive.
Compound words are formed by combining two or more individual words, and are particularly prevalent in Germanic languages—for example, "firefighter". With decompounding, the individual components are indexed separately.
You can specify different lists for different languages.
Decompounding is supported for these languages: Dutch (nl), German (de), Finnish (fi), Danish (da), Swedish (sv), and Norwegian (no).
Decompounding doesn't work for words with non-spacing mark Unicode characters.
For example, Gartenstühle won't be decompounded if the ü consists of u (U+0075) and ◌̈ (U+0308).
Languages for language-specific processing steps, such as word detection and dictionary settings.
You should always specify an indexing language.
If you don't specify an indexing language, the search engine uses all supported languages,
or the languages you specified with the ignorePlurals or removeStopWords parameters.
This can lead to unexpected search results.
For more information, see Language-specific configuration.
Available options: af, ar, az, bg, bn, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, he, hi, hu, hy, id, is, it, ja, ka, kk, ko, ku, ky, lt, lv, mi, mn, mr, ms, mt, nb, nl, no, ns, pl, ps, pt, pt-br, qu, ro, ru, sk, sq, sv, sw, ta, te, th, tl, tn, tr, tt, uk, ur, uz, zh
Searchable attributes for which you want to turn off prefix matching. Attribute names are case-sensitive.
Whether arrays with exclusively non-negative integers should be compressed for better performance. If true, the compressed arrays may be reordered.
Numeric attributes that can be used as numerical filters. Attribute names are case-sensitive.
By default, all numeric attributes are available as numerical filters. For faster indexing, reduce the number of numeric attributes.
To turn off filtering for all numeric attributes, specify an attribute that doesn't exist in your index, such as NO_NUMERIC_FILTERING.
Modifier
- equalOnly("ATTRIBUTE"). Support only filtering based on the equality comparisons = and !=.
Control which non-alphanumeric characters are indexed.
By default, Algolia ignores non-alphanumeric characters like hyphen (-), plus (+), and parentheses ((, )).
To include such characters, define them with separatorsToIndex.
Separators are all non-letter characters except spaces and currency characters, such as $€£¥.
With separatorsToIndex, Algolia treats separator characters as separate words.
For example, in a search for "Disney+", Algolia considers "Disney" and "+" as two separate words.
Attributes used for searching. Attribute names are case-sensitive.
By default, all attributes are searchable and the Attribute ranking criterion is turned off.
With a non-empty list, Algolia only returns results with matches in the selected attributes.
In addition, the Attribute ranking criterion is turned on: matches in attributes that are higher in the list of searchableAttributes rank first.
To make matches in two attributes rank equally, include them in a comma-separated string, such as "title,alternate_title".
Attributes with the same priority are always unordered.
For more information, see Searchable attributes.
Modifier
- unordered("ATTRIBUTE"). Ignore the position of a match within the attribute. Without a modifier, matches at the beginning of an attribute rank higher than matches at the end.
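A short illustrative searchableAttributes list applying these rules (attribute names are examples):

```ts
// Example searchableAttributes: order sets priority, a comma-separated string gives
// two attributes equal priority, and unordered() ignores the match position.
const searchableAttributes = [
  "title,alternate_title",   // equal priority
  "unordered(description)",  // position inside the attribute doesn't matter
  "content",                 // lowest priority
];
```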
An object with custom data.
You can store up to 32kB as custom data.
Characters and their normalized replacements. This overrides Algolia's default normalization.
Attribute that should be used to establish groups of results. Attribute names are case-sensitive.
All records with the same value for this attribute are considered a group.
You can combine attributeForDistinct with the distinct search parameter to control how many items per group are included in the search results.
If you want to also use the same attribute for faceting, use the afterDistinct modifier of the attributesForFaceting setting.
This applies faceting after deduplication, which will result in accurate facet counts.
Maximum number of facet values to return when searching for facet values.
x < 100
Attributes to include in the API response.
To reduce the size of your response, you can retrieve only some of the attributes. Attribute names are case-sensitive.
- * retrieves all attributes, except attributes included in the customRanking and unretrievableAttributes settings.
- To retrieve all attributes except a specific one, prefix the attribute with a dash and combine it with the *: ["*", "-ATTRIBUTE"].
- The objectID attribute is always included.
Determines the order in which Algolia returns your results.
By default, each entry corresponds to a ranking criterion. The tie-breaking algorithm sequentially applies each criterion in the order they're specified. If you configure a replica index for sorting by an attribute, put the sorting attribute at the top of the list.
Modifiers
- asc("ATTRIBUTE"). Sort the index by the values of an attribute, in ascending order.
- desc("ATTRIBUTE"). Sort the index by the values of an attribute, in descending order.
Before you modify the default setting, you should test your changes in the dashboard, and by A/B testing.
Attributes to use as custom ranking. Attribute names are case-sensitive.
The custom ranking attributes decide which items are shown first if the other ranking criteria are equal.
Records with missing values for your selected custom ranking attributes are always sorted last. Boolean attributes are sorted based on their alphabetical order.
Modifiers
- asc("ATTRIBUTE"). Sort the index by the values of an attribute, in ascending order.
- desc("ATTRIBUTE"). Sort the index by the values of an attribute, in descending order.
If you use two or more custom ranking attributes, reduce the precision of your first attributes, or the other attributes will never be applied.
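As an illustration of that precision advice, the sketch below puts a reduced-precision attribute first so that the second criterion can still break ties; both attribute names are hypothetical.

```ts
// Hypothetical customRanking: a date rounded to the day goes first so that ties are
// frequent enough for the second criterion (likes) to matter.
const customRanking = [
  "desc(publish_date_rounded_to_day)",
  "desc(likes)",
];
```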
Relevancy threshold below which less relevant results aren't included in the results.
You can only set relevancyStrictness on virtual replica indices.
Use this setting to strike a balance between the relevance and number of returned results.
Attributes to highlight.
By default, all searchable attributes are highlighted.
Use * to highlight all attributes or use an empty array [] to turn off highlighting.
Attribute names are case-sensitive.
With highlighting, strings that match the search query are surrounded by HTML tags defined by highlightPreTag and highlightPostTag.
You can use this to visually highlight matching parts of a search query in your UI.
For more information, see Highlighting and snippeting.
Attributes for which to enable snippets. Attribute names are case-sensitive.
Snippets provide additional context to matched words.
If you enable snippets, they include 10 words, including the matched word.
The matched word will also be wrapped by HTML tags for highlighting.
You can adjust the number of words with the following notation: ATTRIBUTE:NUMBER, where NUMBER is the number of words to be extracted.
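For example, an attributesToSnippet list using this notation might look like this (attribute names are illustrative):

```ts
// attributesToSnippet using the ATTRIBUTE:NUMBER notation described above.
const attributesToSnippet = [
  "content:20",   // snippet of at most 20 words
  "description",  // default snippet length (10 words)
];
```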
HTML tag to insert before the highlighted parts in all highlighted results and snippets.
HTML tag to insert after the highlighted parts in all highlighted results and snippets.
String used as an ellipsis indicator when a snippet is truncated.
Whether to restrict highlighting and snippeting to items that at least partially matched the search query. By default, all items are highlighted and snippeted.
Number of hits per page.
1 < x < 1000
Minimum number of characters a word in the search query must contain to accept matches with one typo.
Minimum number of characters a word in the search query must contain to accept matches with two typos.
Whether typo tolerance is enabled and how it is applied.
If typo tolerance is true, min, or strict, word splitting and concatenation are also active.
Whether to allow typos on numbers in the search query.
Turn off this setting to reduce the number of irrelevant matches when searching in large sets of similar numbers.
Attributes for which you want to turn off typo tolerance. Attribute names are case-sensitive.
Returning only exact matches can help when:
- Searching in hyphenated attributes.
- Reducing the number of matches when you have too many. This can happen with attributes that are long blocks of text, such as product descriptions.
Consider alternatives such as disableTypoToleranceOnWords or adding synonyms if your attributes have intentional unusual spellings that might look like typos.
Treat singular, plural, and other declension forms as equivalent. You should only use this feature for the languages used in your index.
Available options: af, ar, az, bg, bn, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, he, hi, hu, hy, id, is, it, ja, ka, kk, ko, ku, ky, lt, lv, mi, mn, mr, ms, mt, nb, nl, no, ns, pl, ps, pt, pt-br, qu, ro, ru, sk, sq, sv, sw, ta, te, th, tl, tn, tr, tt, uk, ur, uz, zh
Removes stop words from the search query.
Stop words are common words like articles, conjunctions, prepositions, or pronouns that have little or no meaning on their own. In English, "the", "a", or "and" are stop words.
You should only use this feature for the languages used in your index.
Available options: af, ar, az, bg, bn, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, he, hi, hu, hy, id, is, it, ja, ka, kk, ko, ku, ky, lt, lv, mi, mn, mr, ms, mt, nb, nl, no, ns, pl, ps, pt, pt-br, qu, ro, ru, sk, sq, sv, sw, ta, te, th, tl, tn, tr, tt, uk, ur, uz, zh
Characters for which diacritics should be preserved.
By default, Algolia removes diacritics from letters.
For example, é becomes e. If this causes issues in your search, you can specify characters that should keep their diacritics.
Languages for language-specific query processing steps such as plurals, stop-word removal, and word-detection dictionaries.
This setting sets a default list of languages used by the removeStopWords and ignorePlurals settings.
This setting also sets a dictionary for word detection in the logogram-based CJK languages.
To support this, you must place the CJK language first.
You should always specify a query language.
If you don't specify a query language, the search engine uses all supported languages, or the languages you specified with the ignorePlurals or removeStopWords parameters.
This can lead to unexpected search results.
For more information, see Language-specific configuration.
Available options: af, ar, az, bg, bn, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, he, hi, hu, hy, id, is, it, ja, ka, kk, ko, ku, ky, lt, lv, mi, mn, mr, ms, mt, nb, nl, no, ns, pl, ps, pt, pt-br, qu, ro, ru, sk, sq, sv, sw, ta, te, th, tl, tn, tr, tt, uk, ur, uz, zh
Whether to split compound words in the query into their building blocks.
For more information, see Word segmentation.
Word segmentation is supported for these languages: German, Dutch, Finnish, Swedish, and Norwegian.
Decompounding doesn't work for words with non-spacing mark Unicode characters.
For example, Gartenstühle won't be decompounded if the ü consists of u (U+0075) and ◌̈ (U+0308).
Whether to enable rules.
Whether to enable Personalization.
Determines if and how query words are interpreted as prefixes.
By default, only the last query word is treated as a prefix (prefixLast).
To turn off prefix search, use prefixNone.
Avoid prefixAll, which treats all query words as prefixes. This might lead to counterintuitive results and makes your search slower.
For more information, see Prefix searching.
Available options: prefixLast, prefixAll, prefixNone
Strategy for removing words from the query when it doesn't return any results. This helps to avoid returning empty search results.
- none. No words are removed when a query doesn't return results.
- lastWords. Treat the last (then second to last, then third to last) word as optional, until there are results or at most 5 words have been removed.
- firstWords. Treat the first (then second, then third) word as optional, until there are results or at most 5 words have been removed.
- allOptional. Treat all words as optional.
For more information, see Remove words to improve results.
Available options: none, lastWords, firstWords, allOptional
Search mode the index will use to query for results.
This setting only applies to indices for which Algolia has enabled NeuralSearch for you.
Available options: neuralSearch, keywordSearch
Settings for the semantic search part of NeuralSearch.
Only used when mode is neuralSearch.
Indices from which to collect click and conversion events.
If null, the current index and all its replicas are used.
Whether to support phrase matching and excluding words from search queries.
Use the advancedSyntaxFeatures parameter to control which feature is supported.
A string, null, or an array of optional words.
Searchable attributes for which you want to turn off the Exact ranking criterion. Attribute names are case-sensitive.
This can be useful for attributes with long values, where the likelihood of an exact match is high, such as product descriptions. Turning off the Exact ranking criterion for these attributes favors exact matching on other attributes. This reduces the impact of individual attributes with a lot of content on ranking.
Determines how the Exact ranking criterion is computed when the search query has only one word.
- attribute. The Exact ranking criterion is 1 if the query word and attribute value are the same. For example, a search for "road" will match the value "road", but not "road trip".
- none. The Exact ranking criterion is ignored on single-word searches.
- word. The Exact ranking criterion is 1 if the query word is found in the attribute value. The query word must have at least 3 characters and must not be a stop word. Only exact matches will be highlighted, partial and prefix matches won't.
Available options: attribute, none, word
Determines which plurals and synonyms should be considered exact matches.
By default, Algolia treats singular and plural forms of a word, and single-word synonyms, as exact matches when searching. For example:
- "swimsuit" and "swimsuits" are treated the same.
- "swimsuit" and "swimwear" are treated the same (if they are synonyms).

- ignorePlurals. Plurals and similar declensions added by the ignorePlurals setting are considered exact matches.
- singleWordSynonym. Single-word synonyms, such as "NY" = "NYC", are considered exact matches.
- multiWordsSynonym. Multi-word synonyms, such as "NY" = "New York", are considered exact matches.
Available options: ignorePlurals, singleWordSynonym, multiWordsSynonym
Advanced search syntax features you want to support.
- exactPhrase. Phrases in quotes must match exactly. For example, sparkly blue "iPhone case" only returns records with the exact string "iPhone case".
- excludeWords. Query words prefixed with a - must not occur in a record. For example, search -engine matches records that contain "search" but not "engine".
This setting only has an effect if advancedSyntax is true.
Available options: exactPhrase, excludeWords
Determines how many records of a group are included in the search results.
Records with the same value for the attributeForDistinct attribute are considered a group.
The distinct setting controls how many members of the group are returned. This is useful for deduplication and grouping.
The distinct setting is ignored if attributeForDistinct is not set.
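A small illustrative combination of the two settings (the attribute name is an example):

```ts
// Records sharing the same `series` value form a group; distinct: 1 keeps only the
// best-ranked record of each group in the results.
const settings = {
  attributeForDistinct: "series",
  distinct: 1,
};
```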
Whether to replace a highlighted word with the matched synonym.
By default, the original words are highlighted even if a synonym matches.
For example, with home as a synonym for house and a search for home, records matching either "home" or "house" are included in the search results, and either "home" or "house" are highlighted.
With replaceSynonymsInHighlight set to true, a search for home still matches the same records, but all occurrences of "house" are replaced by "home" in the highlighted response.
Minimum proximity score for two matching words.
This adjusts the Proximity ranking criterion by equally scoring matches that are farther apart.
For example, if minProximity is 2, neighboring matches and matches with one word between them would have the same score.
1 < x < 7
Properties to include in the API response of search and browse requests.
By default, all response properties are included. To reduce the response size, you can select which properties should be included.
You can't exclude these properties: message, warning, cursor, serverUsed, indexUsed, abTestVariantID, parsedQuery, or any property triggered by the getRankingInfo parameter.
Don't exclude properties that you might need in your search UI.
Maximum number of facet values to return for each facet.
x < 1000
Order in which to retrieve facet values.
- count. Facet values are retrieved by decreasing count. The count is the number of matching records containing this facet value.
- alpha. Retrieve facet values alphabetically.
This setting doesn't influence how facet values are displayed in your UI (see renderingContent).
For more information, see facet value display.
Whether the best matching attribute should be determined by minimum proximity.
This setting only affects ranking if the Attribute ranking criterion comes before Proximity in the ranking setting.
If true, the best matching attribute is selected based on the minimum proximity of multiple matches.
Otherwise, the best matching attribute is determined by the order in the searchableAttributes setting.
Extra data that can be used in the search UI.
You can use this to control aspects of your search UI, such as the order of facet names and values without changing your frontend code.
Order of facet names and facet values in your UI.
The redirect rule container.
Widgets returned from any rules that are applied to the current search.
Whether this search will use Dynamic Re-Ranking.
This setting only has an effect if you activated Dynamic Re-Ranking for this index in the Algolia dashboard.
Filter applied during the re-ranking process.
If null, no filter is applied.
Function for extracting URLs for links found on crawled pages.
function
JavaScript function (as a string) for extracting URLs for links found on crawled pages.
By default, all URLs that comply with the pathsToMatch, fileTypesToMatch, and exclusions settings are added to the crawl.
The Crawler dashboard has an editor with autocomplete and validation, which makes editing the linkExtractor property easier.
Authorization method and credentials for crawling protected content.
URL with your login form.
Options for the HTTP request for logging in.
HTTP method for sending the request.
Headers to add to all requests.
Preferred natural language and locale.
Basic authentication header.
Cookie. The header will be replaced by the cookie retrieved when logging in.
Form content.
Timeout for the request.
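A hypothetical login configuration assembled from the fields described above; the exact property names (fetchRequest, requestOptions, and so on) are assumptions for this sketch and may differ from your crawler configuration:

```ts
// Hypothetical login configuration for crawling protected content.
const login = {
  fetchRequest: {
    url: "https://www.example.com/login",          // URL with your login form
    requestOptions: {
      method: "POST",                              // HTTP method for the login request
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: "username=crawler%40example.com&password=SECRET", // form content
      timeout: 10000,                              // request timeout (ms)
    },
  },
};
```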
Maximum path depth of crawled URLs.
For example, if maxDepth is 2, https://example.com/foo/bar is crawled, but https://example.com/foo/bar/baz isn't.
Trailing slashes increase the URL depth.
1 < x < 100
Maximum number of crawled URLs.
Setting maxUrls doesn't guarantee consistency between crawls because the crawler processes URLs in parallel.
1 < x < 15000000
Crawl JavaScript-rendered pages by rendering them with a headless browser.
Rendering JavaScript-based pages is slower than crawling regular HTML pages.
Options to add to all HTTP requests made by the crawler.
Proxy for all crawler requests.
Timeout in milliseconds for the crawl.
Maximum number of retries to crawl one URL.
Headers to add to all requests.
Checks to ensure the crawl was successful.
These checks are triggered after the crawl finishes but before the records are added to the Algolia index.
Maximum allowed difference, in percent, between the number of records produced by two consecutive crawls.
If the current crawl results in fewer than 1 - maxLostPercentage records compared to the previous crawl, the current crawling task is stopped with a SafeReindexingError.
The crawler will be blocked until you cancel the blocking task.
1 < x < 100
Stops the crawler if a specified number of pages fail to crawl. If undefined, the crawler won't stop if it encounters such errors.
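An illustrative safetyChecks object combining both checks; the property names (beforeIndexPublishing, maxLostRecordsPercentage, maxFailedUrls) are assumptions based on the descriptions above:

```ts
// Illustrative safetyChecks: stop and block reindexing if the new crawl loses more
// than 10% of records compared with the previous crawl, or if too many URLs fail.
const safetyChecks = {
  beforeIndexPublishing: {
    maxLostRecordsPercentage: 10,
    maxFailedUrls: 50,
  },
};
```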
Whether to back up your index before the crawler overwrites it with new records.
Schedule for running the crawl, expressed in Later.js syntax. If omitted, you must start crawls manually.
- The interval between two scheduled crawls must be at least 24 hours.
- Times are in UTC.
- Minutes must be explicit: at 3:00 pm, not at 3 pm.
- To crawl every day, use every 1 day.
- For midnight, use at 12:00 pm.
- If you omit the time, a crawl might start any time after midnight UTC.
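For example, combining the fragments above into Later.js text expressions:

```ts
// Example schedules in Later.js text syntax.
const daily = "every 1 day at 3:00 pm";      // every day at 15:00 UTC
const nightly = "every 1 day at 12:00 pm";   // every day at midnight, per the convention above
```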
Sitemaps with URLs from where to start crawling.
URLs from where to start crawling.
Response
Date and time when the test crawl started, in RFC 3339 format.
Date and time when the test crawl finished, in RFC 3339 format.
Logs from the record extraction.
Extracted records from the URL.
Name of the index where this record will be stored.
Extracted records.
Links found on the page, which match the configuration and would be processed.
External data associated with the tested URL. External data is refreshed automatically at the beginning of the crawl.
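To make the shape of the response easier to picture, here is a hypothetical TypeScript interface assembled from the field descriptions above; the property names are assumptions and may differ from the actual response:

```ts
// Hypothetical shape of a successful response from this endpoint.
interface TestUrlResponse {
  startDate: string;                        // RFC 3339, when the test crawl started
  endDate: string;                          // RFC 3339, when the test crawl finished
  logs: string[][];                         // logs from the record extraction
  records: Array<{
    indexName: string;                      // index where these records would be stored
    records: Record<string, unknown>[];     // extracted records
  }>;
  links: string[];                          // matching links found on the page
  externalData?: Record<string, unknown>;   // external data associated with the URL
}
```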