Typesense as a Search Engine

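In terms of user experience, Algolia DocSearch is the best option, and it is the one Docusaurus officially supports. Unfortunately, it is not open source and it is a paid service. I got it working once, but the second time around I could not get it to work no matter what I tried.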

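Typesense is an open-source alternative to Algolia, and its integration with Docusaurus is decent. So far, though, it does not seem to handle Chinese word segmentation very well.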

mkdir -p "$(pwd)/typesense-data"
docker run -d --restart unless-stopped --name typesense -p 8108:8108 -v "$(pwd)/typesense-data:/data" typesense/typesense:26.0 --data-dir /data --api-key="amass_toolset" --enable-cors
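Before moving on, it can help to confirm that the container is actually answering requests. The sketch below polls Typesense's `/health` endpoint; the function names `typesense_health_url` and `wait_for_typesense` are made up for illustration, and the defaults assume the port mapping from the command above (8108 published on localhost, plain HTTP).

```shell
# Hypothetical helper: print the health-check URL of a Typesense node.
# Defaults assume the container started above (localhost, port 8108, http).
typesense_health_url() (
  host="${1:-localhost}"
  port="${2:-8108}"
  printf 'http://%s:%s/health\n' "$host" "$port"
)

# Hypothetical helper: poll /health once per second, for up to 30 attempts.
# Succeeds as soon as curl gets a successful HTTP response, fails otherwise.
wait_for_typesense() (
  url="$(typesense_health_url "$@")"
  attempts=30
  while [ "$attempts" -gt 0 ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      exit 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  exit 1
)
```

Running `wait_for_typesense && echo "Typesense is up"` right after `docker run` should print the message once the node is ready; if it never does, `docker logs typesense` is the next place to look.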

Install DocSearch Scraper

Write the configuration file typesense.json

{
  "index_name": "amass_blog",
  "start_urls": [
    "https://amass.fun/"
  ],
  "sitemap_urls": [
    "https://amass.fun/sitemap.xml"
  ],
  "sitemap_alternate_links": true,
  "stop_urls": [],
  "selectors": {
    "lvl0": {
      "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
      "type": "xpath",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "lvl6": "article h6",
    "text": "article p, article li, article td:last-child"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  },
  "conversation_id": [
    "833762294"
  ],
  "nb_hits": 46250
}

Create the .env file

TYPESENSE_API_KEY=amass_toolset
TYPESENSE_HOST=amass.fun
TYPESENSE_PORT=8108
TYPESENSE_PROTOCOL=http
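Since the scraper builds its connection entirely from these four variables, a quick sanity check is to read them back and see which URL they add up to. This is a minimal sketch that assumes the file contains only plain `KEY=value` lines as shown above (no quoting, no `export`); `env_value` and `typesense_base_url` are hypothetical helper names.

```shell
# Hypothetical helper: print the value of one KEY=value entry in an env file.
# Only the first matching line is used; values are taken verbatim.
env_value() (
  file="$1"
  key="$2"
  sed -n "s/^${key}=//p" "$file" | head -n 1
)

# Hypothetical helper: assemble protocol://host:port from the env file.
typesense_base_url() (
  file="$1"
  printf '%s://%s:%s\n' \
    "$(env_value "$file" TYPESENSE_PROTOCOL)" \
    "$(env_value "$file" TYPESENSE_HOST)" \
    "$(env_value "$file" TYPESENSE_PORT)"
)
```

For example, `curl "$(typesense_base_url typesense-data/.env)/health"` checks whether the host and port in the file are reachable from the machine that will run the scraper.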

Run the scraper

docker run -it --rm --env-file="$(pwd)/typesense-data/.env" -e "CONFIG=$(jq -r tostring "$(pwd)/typesense-data/typesense.json")" typesense/docsearch-scraper:0.9.1
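Once the scraper finishes, one way to check that something was indexed is to send a search request directly to Typesense. The sketch below uses the documents search endpoint and the `X-TYPESENSE-API-KEY` header. It assumes the scraper made the data reachable under the `index_name` from the config (`amass_blog`) and that the indexed records have a `content` field; `search_endpoint` and `search_docs` are hypothetical helper names.

```shell
# Hypothetical helper: print the search endpoint for a collection.
search_endpoint() (
  base_url="$1"
  collection="$2"
  printf '%s/collections/%s/documents/search\n' "$base_url" "$collection"
)

# Hypothetical helper: run one search query and print the raw JSON response.
# Assumes the records carry a "content" field to search in.
search_docs() (
  base_url="$1"
  api_key="$2"
  collection="$3"
  query="$4"
  curl -fsS -G "$(search_endpoint "$base_url" "$collection")" \
    -H "X-TYPESENSE-API-KEY: $api_key" \
    --data-urlencode "q=$query" \
    --data-urlencode "query_by=content"
)
```

A call such as `search_docs http://amass.fun:8108 amass_toolset amass_blog typesense | jq '.found'` would show how many records matched. Trying a Chinese query the same way is a simple way to observe the word-segmentation behaviour mentioned at the top.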