Question

aspyct

Asked: 2019-08-07 05:29:15 +0800 CST

Elasticsearch: how to "rescue" documents that cannot be parsed with the mapping?

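We use Elasticsearch to store and inspect logs from our infrastructure. Some of these logs are required by law, and we cannot afford to lose any of them.

We have been ingesting logs without any mapping for quite a while. This makes them mostly unusable for searching and/or graphing. For example, some integer fields were automatically detected as text, so we cannot aggregate them in histograms.

We want to introduce templates and mappings, which would solve the problem for new indices.

However, we noticed that having a mapping also opens the door to parsing failures. If a field is defined as an integer but suddenly receives a non-integer value, parsing fails and the document is rejected.

Is there somewhere these documents go, and/or some way to save them for later inspection?

The Python script below reproduces the problem against a local ES instance.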

#!/usr/bin/env python3

from typing import Any, Dict

import requests


ES_HOST = "http://localhost:9200"


def es_request(method: str, path: str, data: Dict[str, Any]) -> None:
    response = requests.request(method, f"{ES_HOST}{path}", json=data)

    # Creating a document returns 201, so report only 4xx/5xx responses.
    if not response.ok:
        print(response.content)


es_request('put', '/_template/my_template', {
    "index_patterns": ["my_index"],
    "mappings": {
        "properties": {
            "some_integer": { "type": "integer" }
        }
    }
})

# This is fine
es_request('put', '/my_index/_doc/1', {
    'some_integer': 42
})

# This will be rejected by ES, as it doesn't match the mapping.
# But how can I save it?
es_request('put', '/my_index/_doc/2', {
    'some_integer': 'hello world'
})

Running the script produces the following error:

{
    "error": {
        "root_cause": [
            {
                "type": "mapper_parsing_exception",
                "reason":"failed to parse field [some_integer] of type [integer] in document with id '2'. Preview of field's value: 'hello world'"
            }
        ],
        "type": "mapper_parsing_exception",
        "reason":"failed to parse field [some_integer] of type [integer] in document with id '2'. Preview of field's value: 'hello world'",
        "caused_by": {
            "type": "number_format_exception",
            "reason": "For input string: \"hello world\""
        }
    },
    "status": 400
}

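The document is then lost, or so it seems. Is there an option I can set somewhere to have such documents automatically saved elsewhere, as a kind of dead letter queue?

tl;dr: We need mappings, but we cannot afford to lose log lines because of parsing errors. Can we automatically save documents that do not fit the mapping somewhere else?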

2 Answers

aspyct · Answer 1 · 2019-11-15T01:42:20+08:00

It turns out to be as simple as allowing "malformed" properties. There are two ways to do this. On the whole index:

PUT /_template/ignore_malformed_attributes
{
  "index_patterns": ["my_index"],
  "settings": {
      "index.mapping.ignore_malformed": true
  }
}

Or per property (see the example here: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-malformed.html):

PUT my_index
{
  "mappings": {
    "properties": {
      "number_one": {
        "type": "integer",
        "ignore_malformed": true
      },
      "number_two": {
        "type": "integer"
      }
    }
  }
}

# Will work
PUT my_index/_doc/1
{
  "text":       "Some text value",
  "number_one": "foo" 
}

# Will be rejected
PUT my_index/_doc/2
{
  "text":       "Some text value",
  "number_two": "foo" 
}
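
With `ignore_malformed` enabled the document is still indexed and its `_source` is kept, but the malformed field itself is not indexed. Elasticsearch records the names of such fields in the `_ignored` metadata field, which makes the affected documents findable for later inspection. A minimal Python sketch, assuming the `requests` library and a local instance as in the question's script; `ignored_docs_query` and `find_ignored_docs` are hypothetical helper names, not part of any client library:

```python
from typing import Any, Dict, List, Optional


def ignored_docs_query(field: Optional[str] = None) -> Dict[str, Any]:
    """Build a search body matching documents that had malformed fields ignored."""
    if field is None:
        # Any document with at least one ignored field.
        return {"query": {"exists": {"field": "_ignored"}}}
    # Only documents where this specific field was ignored.
    return {"query": {"term": {"_ignored": field}}}


def find_ignored_docs(es_host: str, index: str,
                      field: Optional[str] = None) -> List[Dict[str, Any]]:
    """Run the query against a live cluster and return the raw hits."""
    import requests  # imported lazily so the query builder works without it

    response = requests.post(f"{es_host}/{index}/_search",
                             json=ignored_docs_query(field))
    response.raise_for_status()
    return response.json()["hits"]["hits"]
```

For example, `find_ignored_docs("http://localhost:9200", "my_index", "number_one")` should return the documents whose `number_one` value could not be parsed, with the original value still readable in each hit's `_source`.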

Note that you can also change the setting on an existing index, but you need to close it first:

POST my_existing_index/_close
PUT my_existing_index/_settings
{
  "index.mapping.ignore_malformed": true
}
POST my_existing_index/_open
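
The same close, update, reopen sequence can be scripted. This is only a sketch, assuming the `requests` library and the index name used above; `ignore_malformed_steps` and `apply_steps` are hypothetical helpers:

```python
from typing import Any, Dict, List, Optional, Tuple

# One REST call: (HTTP method, path, optional JSON body).
Step = Tuple[str, str, Optional[Dict[str, Any]]]


def ignore_malformed_steps(index: str, enabled: bool = True) -> List[Step]:
    """Return the calls needed to change the setting on an existing index."""
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", {"index.mapping.ignore_malformed": enabled}),
        ("POST", f"/{index}/_open", None),
    ]


def apply_steps(es_host: str, steps: List[Step]) -> None:
    """Send each call in order, stopping at the first failure."""
    import requests  # imported lazily so the step builder works without it

    for method, path, body in steps:
        response = requests.request(method, f"{es_host}{path}", json=body)
        response.raise_for_status()
```

Usage would look like `apply_steps("http://localhost:9200", ignore_malformed_steps("my_existing_index"))`. If the settings call fails, the index stays closed, so a real script would want to reopen it in a `finally` block.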

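Note: type changes will not be visible in Kibana until you refresh the index pattern. You will then run into type conflicts, which require you to reindex your data before you can search it again... quite a pain.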

POST _reindex
{
  "source": {
    "index": "my_index"
  },
  "dest": {
    "index": "my_new_index"
  }
}

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html

EOhm · Answer 2 · 2019-11-20T14:38:16+08:00

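An alternative that is probably preferable for quite a few use cases is to put Logstash between the producers and Elasticsearch. Logstash can reformat and/or inspect documents and route them to specific indices.
Or, of course, if you have in-house producers, have them validate and route.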
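
As a rough illustration of the "validate and route" idea on the producer side, here is a minimal Python sketch. Everything in it is an assumption for the example: the field list, the dead-letter index name `my_index_dead_letter`, and the deliberately strict check that accepts only true integers in the 32-bit range of the `integer` type. Anything doubtful is wrapped as a JSON string and sent to the dead-letter index, so nothing is dropped.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical names: adapt to your own mapping and indices.
INTEGER_FIELDS = ("some_integer",)
MAIN_INDEX = "my_index"
DEAD_LETTER_INDEX = "my_index_dead_letter"

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_integer_mapping(value: Any) -> bool:
    """Strict check: a real int (not a bool) within the signed 32-bit range."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return INT32_MIN <= value <= INT32_MAX


def route(doc: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (target index, document to send) for one log document."""
    invalid: List[str] = [
        name for name in INTEGER_FIELDS
        if name in doc and not fits_integer_mapping(doc[name])
    ]
    if not invalid:
        return MAIN_INDEX, doc
    # Keep the original as an opaque string so it cannot fail any mapping.
    return DEAD_LETTER_INDEX, {"raw": json.dumps(doc), "invalid_fields": invalid}
```

The producer would then send the returned document to `/<index>/_doc`, and the dead-letter index can be reviewed and replayed once the offending values are fixed.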
