
NOC 20.4

The NOC 20.4 release contains [225](https://code.getnoc.com/noc/noc/merge_requests?scope=all&state=merged&milestone_title=20.4) bugfixes, optimisations and improvements.

Highlights

Generic Message Exchange

NOC can send notifications to email or Telegram via Notification Groups on alarms and configuration changes. Notifications are useful for drawing human attention to a possible problem. To deliver data to external systems, NOC uses the DataStream approach: external systems pull changes and process them according to their own logic.

NOC 20.4 generalises all data pushed to external systems into the concept of a message. A message is a piece of data that NOC can pass to the outside world. Messages can be of different types:

  • alarms
  • object inventory data
  • configuration
  • configuration change
  • reboot
  • new object
  • system login
  • etc.

NOC can generate messages on certain conditions. Both humans and soulless robots may be interested in them, so some kind of routing is needed.

NOC 20.4 introduces a new service called Message Exchanger, or mx. Like a mail server, mx receives a message, processes its headers and decides where to route it. mx relies on a family of sender processes: each kind of sender delivers messages outside of the system using a particular exchange protocol, hiding the delivery details from mx. mx can transform messages or apply templates to convert a delivered message to the desired format. Together, mx, the senders, message generation and the transport conventions form an integral part of NOC called Generalized Message Exchange, or GMX.
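The header-based routing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not mx's actual code or configuration: the `Message`, `Route` and `Exchanger` classes and the `Message-Type` header are hypothetical, and each sender is modelled as a plain callable that hides its delivery protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    # Hypothetical message shape: routing metadata in headers, payload in body
    headers: Dict[str, str]
    body: dict


# A sender hides delivery details (protocol, connection) behind one call
Sender = Callable[[Message], None]


def identity(msg: Message) -> Message:
    return msg


@dataclass
class Route:
    match: Dict[str, str]  # every listed header must have exactly this value
    sender: str            # name of the sender that delivers matching messages
    transform: Callable[[Message], Message] = identity  # optional reformatting step


class Exchanger:
    """Toy header-based router in the spirit of mx (not NOC's implementation)."""

    def __init__(self, senders: Dict[str, Sender], routes: List[Route]) -> None:
        self.senders = senders
        self.routes = routes

    def route(self, msg: Message) -> List[str]:
        """Hand msg to the sender of every matching route; return the sender names used."""
        used: List[str] = []
        for r in self.routes:
            if all(msg.headers.get(k) == v for k, v in r.match.items()):
                self.senders[r.sender](r.transform(msg))
                used.append(r.sender)
        return used


if __name__ == "__main__":
    outbox: List[Message] = []
    mx = Exchanger(
        senders={"kafka": outbox.append},
        routes=[Route(match={"Message-Type": "alarm"}, sender="kafka")],
    )
    print(mx.route(Message({"Message-Type": "alarm"}, {"id": 1})))   # ['kafka']
    print(mx.route(Message({"Message-Type": "reboot"}, {"id": 2})))  # []
```

Keeping delivery behind the sender interface means a new protocol only needs another registered callable; the routing rules stay untouched.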

NOC 20.4 introduces the kafkasender service, used to push data to a Kafka message bus. We plan to convert the other senders (mailsender, tgsender, etc.) to GMX in NOC 21.1.

Kafka Integration

NOC 20.4 introduces the kafkasender service as part of GMX. Kafka has become the mainstream message bus in telecom operations, and NOC is now able to push all data available via DataStream to Kafka for further routing and processing, reducing the number of point-to-point system integrations.

Biosegmentation

Biosegmentation was introduced in NOC 20.3 as an ad-hoc segmentation process. The process relies on a series of trials; each trial can lead to merging segments or fixing the structure of the segment tree. The current implementation relies on inter-segment links, but sometimes the segment hierarchy must be established before the linking process.

NOC 20.4 introduces an additional MAC-based biosegmentation approach, called Vacuum Bulling, which builds the segment hierarchy based on MAC addresses collected on interfaces.

Ordered Message Queue

NOC uses NSQ as its internal message queue. This lightweight, high-performance solution usually shows good results, but over time its architectural corner cases have become more and more visible:

  • NSQ is designed so that nsqd runs on every host and communicates with publishers via the localhost loopback. In the modern container world this is a bug, not a feature: relying on an absolutely reliable connection between publisher and broker has become unacceptable.
  • Subscribers have to query the nsqlookupd service to find the hosts containing data, and then establish direct connections with them. The official Python NSQ client uses up to 5 TCP connections, so the number of connections grows quickly as the number of producers and subscribers grows.
  • The official Python NSQ client's error handling is far from ideal. Its code base is old, obscure and hard to maintain, and no asyncio version is available.
  • No fault tolerance. A failed nsqd leads to lost messages, since there is no message replication at all.
  • Out-of-order messages. Message order may change due to the internal nsqd implementation and to client logic. Applications like fault management rely on message order: closing events must follow opening ones, otherwise hanging alarms will pollute the system.
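Why ordering matters can be shown with a deliberately naive consumer. In this sketch (hypothetical names, not NOC's fault-management code), an `open` event raises an alarm and a `close` event clears it; if the close overtakes the open, the alarm is left hanging.

```python
from typing import Iterable, Set, Tuple

# Hypothetical event shape: (kind, alarm_key), where kind is "open" or "close"
Event = Tuple[str, str]


def active_alarms(events: Iterable[Event]) -> Set[str]:
    """Replay events in arrival order and return the alarms left open.

    A "close" for an alarm that is not open yet is a no-op, so if it
    overtakes its "open", nothing ever clears the alarm.
    """
    active: Set[str] = set()
    for kind, key in events:
        if kind == "open":
            active.add(key)
        elif kind == "close":
            active.discard(key)
    return active


if __name__ == "__main__":
    in_order = [("open", "Gi0/1 link down"), ("close", "Gi0/1 link down")]
    reordered = [("close", "Gi0/1 link down"), ("open", "Gi0/1 link down")]
    print(active_alarms(in_order))   # set()
    print(active_alarms(reordered))  # {'Gi0/1 link down'}
```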

During our research we decided we needed a message system built on the commit-log approach. Though Kafka is the industry standard, its dependency on the JVM and ZooKeeper may be a burden. We settled on Liftbridge, a clean and simple implementation of Kafka's proven storage and replication algorithms.
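The commit-log approach can be illustrated with a toy in-memory log: messages are appended and never reordered, each gets a monotonically increasing offset, and every consumer keeps its own cursor so it can resume where it stopped. This is a sketch of the concept only; `CommitLog` and `Consumer` are hypothetical and do not reflect the Liftbridge API, which adds streams, partitions and replication on top.

```python
from typing import Dict, List, Tuple


class CommitLog:
    """Append-only log: each message gets the next offset, order never changes."""

    def __init__(self) -> None:
        self._entries: List[bytes] = []

    def append(self, msg: bytes) -> int:
        self._entries.append(msg)
        return len(self._entries) - 1  # offset assigned to the new message

    def read(self, offset: int, limit: int = 100) -> List[Tuple[int, bytes]]:
        end = min(offset + limit, len(self._entries))
        return [(i, self._entries[i]) for i in range(offset, end)]


class Consumer:
    """Reads the log in offset order and stores its position in `cursors`."""

    def __init__(self, log: CommitLog, name: str, cursors: Dict[str, int]) -> None:
        self.log = log
        self.name = name
        self.cursors = cursors  # stands in for persistent cursor storage

    def poll(self) -> List[bytes]:
        batch = self.log.read(self.cursors.get(self.name, 0))
        if batch:
            # Commit the position after the last message handed out
            self.cursors[self.name] = batch[-1][0] + 1
        return [msg for _, msg in batch]


if __name__ == "__main__":
    log, cursors = CommitLog(), {}
    for m in (b"open", b"close"):
        log.append(m)
    print(Consumer(log, "correlator", cursors).poll())  # [b'open', b'close']
    # A restarted consumer with the same name resumes from its cursor
    print(Consumer(log, "correlator", cursors).poll())  # []
```

Committing the cursor before the messages are processed, as here, gives at-most-once delivery; committing after processing would give at-least-once delivery instead.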

We have ported the events topics to Liftbridge, fixing the critical event ordering problem. GMX topics use Liftbridge too. The next release (21.1) will address the remaining topics.

FastAPI

We have started migrating from Tornado to FastAPI. The main motivations are:

  • Tornado brought generator-based asynchronous programming to Python 2. Python 3 introduced native asynchronous programming along with the asyncio library, and later Tornado versions are simple wrappers atop asyncio.
  • FastAPI uses Pydantic for request and response validation. We found Pydantic very useful during our ETL refactoring.
  • FastAPI generates an OpenAPI/Swagger schema, improving integration capabilities.
  • FastAPI is fast.

We have ported the login service to FastAPI. JWT has replaced Tornado's signed cookies. We have also implemented a set of OAuth2-based endpoints for our next-generation UI.
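For readers new to JWT: an HS256 token is three base64url segments (header, claims, signature) joined by dots, where the signature is an HMAC-SHA256 over the first two. The sketch below builds and verifies one with the standard library only, including an audience check like the one MR4472 adds to `jwt.decode`. It is illustrative: NOC itself calls a JWT library, and the secret and claim values here are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    # JWT uses URL-safe base64 without "=" padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_hs256(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + b64url(sign(signing_input, secret))


def decode_hs256(token: str, secret: bytes, audience: str) -> dict:
    signing_input, _, signature = token.rpartition(".")
    header_b64, _, claims_b64 = signing_input.partition(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    # Constant-time comparison avoids leaking signature bytes via timing
    if not hmac.compare_digest(sign(signing_input, secret), b64url_decode(signature)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(claims_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    if claims.get("aud") != audience:  # simplification: single-string audience only
        raise ValueError("wrong audience")
    return claims


if __name__ == "__main__":
    token = encode_hs256({"sub": "admin", "aud": "noc", "exp": int(time.time()) + 3600}, b"s3cret")
    print(decode_hs256(token, b"s3cret", audience="noc")["sub"])  # admin
```

Unlike an opaque signed cookie, the claims are readable by any client that holds the token; only the signature prevents tampering.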

ETL Improvements

ETL has relied on the CSV format to store extracted data. Though CSV is simple and maps SQL responses in an obvious way, it has some limitations:

  • Metadata of extracted fields is stored outside of the extractor, in the loader.
  • Field order is hardcoded in the loader.
  • Fields have no type information, leading to leaky validation.
  • There is no native way to pass complex data structures, like lists and nested documents.
  • Extractors must return empty data for long-deprecated fields.

NOC 20.4 introduces a new extractor API. Instead of lists passed to CSV, an extractor returns Pydantic model instances. Pydantic models are defined in separate modules and reused by both extractors and loaders, so the interface between extractor and loader is now well-defined. Models perform data validation at both the extraction and load stages, so an error in an extractor produces an informative error message and stops the process.
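The idea of a shared, validating model can be sketched without extra dependencies. Here a standard-library dataclass stands in for the Pydantic models NOC 20.4 actually uses, and `ManagedObjectRow` and `extract` are hypothetical names, not NOC's ETL API. The point is that both sides import the same model, and a bad row stops extraction with a message naming the row.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class ManagedObjectRow:
    """Hypothetical model shared by an extractor and its loader."""

    id: str
    name: str
    address: str
    tags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Checks that a Pydantic model would express declaratively via field types
        for fname in ("id", "name", "address"):
            value = getattr(self, fname)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{fname} must be a non-empty string")
        ipaddress.ip_address(self.address)  # raises ValueError for a bad address
        if not isinstance(self.tags, list) or not all(isinstance(t, str) for t in self.tags):
            raise ValueError("tags must be a list of strings")


def extract(rows: Iterable[dict]) -> Iterator[ManagedObjectRow]:
    """Hypothetical extractor: yields model instances instead of CSV field lists."""
    for n, row in enumerate(rows):
        try:
            yield ManagedObjectRow(**row)
        except (TypeError, ValueError) as e:
            # Stop the whole run with a message that points at the bad row
            raise ValueError(f"row {n}: {e}") from e


if __name__ == "__main__":
    for obj in extract([{"id": "1", "name": "sw1", "address": "10.0.0.1", "tags": ["core"]}]):
        print(obj)
```

Missing or unexpected fields surface as a `TypeError` from the generated constructor, which the extractor wraps the same way as a failed value check.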

ETL now uses the JSON Lines format (jsonl): one JSON structure per row, separated by newlines, so it is possible to store structures of arbitrary complexity. We have also provided a tool to convert legacy extracted data to the new format.
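Writing and reading JSON Lines needs nothing beyond a JSON encoder. A minimal sketch with the standard `json` module (NOC itself switched to orjson in this release); the record fields are hypothetical. Since `json.dumps` escapes newlines inside strings, each record is guaranteed to occupy exactly one line.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def write_jsonl(records: Iterable[dict], out: TextIO) -> int:
    """Write one compact JSON document per line; return the record count."""
    count = 0
    for rec in records:
        out.write(json.dumps(rec, separators=(",", ":")))
        out.write("\n")
        count += 1
    return count


def read_jsonl(inp: TextIO) -> Iterator[dict]:
    """Yield records back, skipping blank lines."""
    for line in inp:
        if line.strip():
            yield json.loads(line)


if __name__ == "__main__":
    rows = [
        {"id": "1", "name": "sw1", "tags": ["core", "msk"]},
        {"id": "2", "name": "sw2", "caps": {"SNMP": True}, "descr": "line1\nline2"},
    ]
    buf = io.StringIO()
    write_jsonl(rows, buf)
    buf.seek(0)
    print(list(read_jsonl(buf)) == rows)  # True
```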

SNMP Rate Limiting

NOC 20.4 allows limiting the rate of SNMP requests based on profile or platform settings. This reduces the impact on platforms with a weak CPU or a slow control-to-dataplane bus.
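The release notes do not describe the limiter's internals. One common way to cap a request rate is a token bucket, sketched below; `TokenBucket`, its `rate`/`burst` parameters and the injectable clock are assumptions for illustration, not NOC's profile or platform settings.

```python
import time
from typing import Callable


class TokenBucket:
    """Average `rate` requests per second, bursts of up to `burst` requests."""

    def __init__(
        self,
        rate: float,
        burst: int = 1,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate <= 0 or burst < 1:
            raise ValueError("rate must be > 0 and burst >= 1")
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so the first burst goes out at once
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self) -> None:
        """Block until one request may be sent, then consume a token."""
        while True:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return
            # Sleep just long enough for the missing fraction of a token
            self.sleep((1.0 - self.tokens) / self.rate)


if __name__ == "__main__":
    limiter = TokenBucket(rate=5.0, burst=2)  # hypothetical per-platform limit
    start = time.monotonic()
    for _ in range(6):
        limiter.acquire()  # an SNMP request would be sent here
    print(f"6 requests took {time.monotonic() - start:.2f}s")
```

Starting with a full bucket lets a short burst through immediately while still bounding the long-run rate, which suits devices that tolerate a few back-to-back requests but not a sustained flood.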

orjson

orjson is used instead of ujson for JSON serialization/deserialization.

New profiles

  • KUB Nano
  • Qtech.QFC

Migration

Tower Upgrade

Please upgrade Tower to 1.0.0 or later before continuing the NOC installation/upgrade process. See [Tower upgrade process documentation](https://code.getnoc.com/noc/tower/-/blob/master/UPDATING.md) for more details.

Older versions of Tower will stop the deployment with the following error message:

Liftbridge/NATS

NOC 20.4 introduces the Liftbridge service for the ordered message queue. You should deploy at least one Liftbridge and one NATS service instance. See Tower's service configuration section for more details.

ETL

Run the following fix after the upgrade:

`$ ./noc fix apply fix_etl_jsonl`

New features

MR Title
MR1668 Added function get alarms for controllers and devices for periodic job.
MR4223 FastAPI login service
MR4256 Add Project to ETL
MR4274 New profile Qtech.QFC
MR4290 Liftbridge client
MR4361 #1363 ifdesc: Interface autocreation
MR4388 Add new controller profile KUB Nano
MR4398 mx service
MR4403 kafkasender service
MR4473 #1368 Model Interface scopes
MR4488 #892 ETL JSON format
MR4519 noc/noc#1356 SNMP Rate Limit
MR4538 Configurable LDAP server policies
MR4567 Biosegmentation: Vacuum bulling

Improvements

MR Title
MR4225 Fix ddash refid
MR4233 Allow alternative locations for binary speedup modules
MR4236 Catch when sentry-sdk module enabled but not installed.
MR4246 Fix Qtech.BFC profile
MR4261 noc/noc#1304 Replace ujson with orjson
MR4264 runtime optimization ReportMaxMetrics
MR4275 ElectronR.KO01M profile scripts
MR4278 noc/noc#1383 Add IfPath collator to confdb
MR4280 noc/noc#1381 Add alarm_consequence_policy to TTSystem settings.
MR4281 #1384 Add source-ip aaa hints.
MR4287 Add round argument to metric scale function
MR4293 Debian-based docker image
MR4296 Change python to python3 when use ./noc
MR4314 Update Card for Sensor Controller
MR4320 Fill capabilities for beef.
MR4338 New Grafana dashboards
MR4344 Profile fix controllers
MR4348 exp_decay window function
MR4349 Controller/fix2
MR4354 add_interface-type_Juniper_JUNOSe
MR4358 Fix Qtech.BFC profile
MR4364 LiftBridgeClient: Proper handling of message headers
MR4369 LiftBridgeClient: fetch_metadata() stream and wait_for_stream parameters
MR4380 Add to_json for thresholdprofile
MR4383 Update threshold handler
MR4384 Add collators to some profiles.
MR4389 Electron fix profile
MR4391 add new metric Qtech.BFC
MR4394 fix some controllers ddash/metrics
MR4396 Fix inerfaces name Qtech.BFC
MR4399 Up report MAX_ITERATOR to 800 000.
MR4402 mx: Use FastAPIService
MR4405 liftbridge cursor persistence api
MR4407 add_columns_total_reportmaxmetrics
MR4416 Add csv+zip format to ReportDetails.
MR4417 Add Long Alarm Archive options to ReportAlarm, from Clickhouse table.
MR4428 Add available_only options to ReportDiscoveryTopologyProblem.
MR4432 Reset NetworkSegment TTL cache after remove.
MR4433 Change is_uplink criterias priority on segment MAC discovery.
MR4439 fix_reportmaxmetrics
MR4447 Add octets_in_sum and octets_out_sum columns to ReportMetrics.
MR4453 ConfDB syslog
MR4455 Fix controllers profiles, ddash
MR4457 Fix get_iface_metrics
MR4462 noc/noc#1392 Add search port by contains ifdescription token to ifdecr discovery.
MR4464 LiftBridge client: Connection pooling
MR4470 Add ReportMovedMacApplication application.
MR4475 Add sorted to tags application.
MR4477 noc/noc#1416 Extend ConfDB meta section.
MR4479 Add get_confdb_query method to ManagedObjectSelector and MatchPrefix ConfDB function.
MR4480 Add csv_zip file format to MetricsDetail Report.
MR4483 noc/noc#1397 Additional biosegtrial criteria to policy.
MR4486 Add migrate_ts field to ReportMovedMac.
MR4501 noc/noc#1428 Add InterfaceDiscoveryApplicator for fill ConfDB info from interface discovery.
MR4508 add_csvzip_reportmaxmetrics
MR4511 Fix ./noc discovery for LB
MR4515 noc/noc#1432 lb client: Configurable message size limit
MR4516 fix csv_import view
MR4517 Additional options to segment command
MR4535 Bump networkx/numpy requirements
MR4539 lb client: increased resilience
MR4547 Add JOB_CLASS param to core.defer util.
MR4549 ETL model Reference
MR4551 add column reboots in fm.reportalarmdetail
MR4553 fix processing trunk port vlan for HP A3100-24 (v5.20.99)
MR4565 Add ttl-policy argument to link command.
MR4571 Filter Multicast MACs on Moved MAC report.
MR4573 Add api_unlimited_row_limit param
MR4579 liftBridge: publish_async waits for all the acks
MR4582 noc/noc#1371 Add schedule_discovery_config handler to events.discovery.
MR4592 noc/noc#1400 Migrate InterfaceClassification to ConfDB.
MR4602 Add MatchAllVLAN and MatchAnyVLAN function to ConfDB.
MR4607 Bump pytest version
MR4624 add metrics Subscribers | Summary Alcatel.TIMOS
MR4629 noc/noc#1440 Use all macs on 'Discovery ID cache poison' report.
MR4630 Convert limit from dcs to int.
MR4632 Add Telephony SIP metrics graph.
MR4633 Always uplinks calculate.

Bugfixes

MR Title
MR4249 Fix card MO
MR4251 Fix status RNR
MR4258 Change field_num on ReportObjectStat
MR4269 noc/noc#1374 Fix typo on datastream format check.
MR4285 Fix Profile Check Summary typo.
MR4303 #1335 ConfDB: Fix and inside or combination
MR4310 Fix RNR affected AD
MR4319 Add err_status to beef snmp_getbulk_response method.
MR4321 Convert oid on snmp raw_varbinds.
MR4322 Fix event clean
MR4327 Convert set to list on orjson dumps.
MR4328 Add xmac discovery to ReportDiscoveryResult.
MR4363 ./noc migrate-liftbridge: Do not create streams for disabled services
MR4368 Fix hash_int()
MR4373 Fix typo on Calcify Biosegmentation policy.
MR4409 Add get_pool_partitions method to TrapCollectorService.
MR4418 Add id field to project etl loader.
MR4419 Fix multiple segment args on discovery command.
MR4423 noc/noc#1399 Delete Permissions and Favorites on wipe user.
MR4424 noc/noc#1375 Fix DEFAULT_STENCIL use on SegmentTopology.
MR4425 noc/noc#1396 AlarmEscalation. Use item delay for consequence escalation.
MR4426 Fix extapp group regex splitter to non-greedy.
MR4430 Fix ManagedObject _reset_caches key for _id_cache.
MR4452 noc/noc#1406 Use system username for JWT.
MR4461 noc/noc#1229 Fix user cleanup Django Admin Log.
MR4472 Add audience param to is_logged jwt.decode.
MR4474 Add 120 sec to out_of_order escalation time.
MR4485 noc/noc#688 Fix invalidate l1 cache for ManagedObject.
MR4492 Skipping files if already compressed on destination.
MR4497 noc/noc#1427 Fix whois ARIN url.
MR4498 Fix object data use.
MR4502 Move orjson defaults to jsonutils.
MR4505 Bump ssh2-python to 0.23.
MR4506 pm/utils -> Fix dict
MR4507 Some etl loader fixes.
MR4513 noc/noc#1423 Convert pubkey to bytes.
MR4514 Convert empty object data to list on 0020 migration.
MR4518 Fix vendors and handlers migrations
MR4522 Fix typo on ifdescr discovery.
MR4524 #1312 Consistent VPN ID generation
MR4540 Fix customfields for mongoengine.
MR4555 Revert uvicorn to 0.12.1.
MR4561 Fix typo on interfaceprofile UI Application.
MR4564 Fix trace when execute other script that command on MRT.
MR4569 Fix typo on MRT service.
MR4575 Add static_service_groups and static_client_groups clean_map to managedobject etl loader.
MR4590 Fix login cookie ttl
MR4594 Fix ETL loader change.
MR4595 Fix extra filter when set extra order.
MR4598 Fix datetime field on Service ETL model.
MR4614 Fix SNMP_GET_OIDS on get_chassis_id scripts to list.
MR4627 noc/noc#1439 Fix tag contains query for non latin symbol.

Code Cleanup

MR Title
MR4254 Cleanup flake.
MR4301 Fix vendor docs test
MR4317 Updated .dockerignore
MR4360 Remove unused dependencies: tornadis, mistune
MR4362 Update blinker, bsdiff, cachetools, crontab, progressbar2, psycopg2, python-dateutil versions
MR4465 Remove legacy scripts/ci-run
MR4496 Fix formatting
MR4533 Bump requirements
MR4587 Fix collect beef for orjson.
MR4589 Fix some lint errors
MR4622 Fix Service etl model.

Profile Changes

Cisco.IOS

MR Title
MR4316 Update Cisco.IOS profile to support more physical interfaces

Cisco.IOSXR

MR Title
MR4408 added interfacetypes for IOSXR platform

DLink.DxS

MR Title
MR4355 DLink.DxS.get_metrics. Fix SNMP Error when 'CPU Usage' metric.
MR4434 Fix Dlink.DxS profile.

EdgeCore.ES

MR Title
MR4556 EdgeCore.ES.get_spanning_tree. Fix getting port_id for Trunk interface.

Eltex.MES

MR Title
MR4217 test tacacs1.yml crashed. AssertionError: assert [] == [(right syntax)]
MR4262 Eltex.MES.get_capabilities. Fix detect stack mode by SNMP.
MR4523 Eltex.MES.get_vlans. Use Generic script.
MR4615 Eltex.MES. Add 1.3.6.1.4.1.89.53.4.1.7.1 to display_snmp.

Eltex.MES24xx

MR Title
MR4381 Fix Eltex.MES24xx.get_interfaces script

Extreme.XOS

MR Title
MR4404 Fix Extreme.XOS.get_lldp_neighbors script

Generic

MR Title
MR4239 Generic.get_capabilities add SNMP | OID | EnterpriseID len check.
MR4342 Generic.get_arp. Cleanup snmp for py3
MR4613 Generic.get_chassis_id. Add 'LLDP-MIB::lldpLocChassisId' oid to display_hints.

Huawei.MA5600T

MR Title
MR4611 Huawei.MA5600T.get_spanning_tree. Fix waited command.

Huawei.VRP

MR Title
MR4422 Huawei.VRP. Add NE8000 version detect.
MR4550 Huawei.VRP fix normalize_enable_stp
MR4557 Huawei.VRP. Check nexthop type on ConfDB route normalizer.

Juniper.JUNOS

MR Title
MR4324 Fix Juniper.JUNOS.get_chassis_id script
MR4377 Fix Juniper.JUNOS.get_interfaces script

NAG.SNR

MR Title
MR4351 Fix NAG.SNR.get_interfaces script
MR4481 Fix NAG.SNR.get_lldp_neighbors script

Qtech.QSW

MR Title
MR4576 Fix Qtech.QSW profile

Qtech.QSW2800

MR Title
MR4444 Qtech.QSW2800. Add sdiag prompt.
MR4542 Fix Qtech.QSW2800.get_version script

Ubiquiti.AirOS

MR Title
MR4240 Ubiquiti.AirOS.get_version. Cleanup for py3.

rare

MR Title
MR4214 ConfDB tests profile Raisecom.RCIOS.
MR4241 Alstec.MSPU.get_version. Fix HappyBaby platform regex.
MR4265 Fix ZTE.ZXA10 profile
MR4272 Eltex.WOPLR. Add get_interface_type method to profile.
MR4279 Update Rotek.BT profile
MR4288 Add Enterasys.EOS profile
MR4295 Fix metric name
MR4302 add snmp in profile Juniper.JUNOSe
MR4313 Rotek.BT fix get_metrics
MR4335 add snmp in profile Alcatel.TIMOS
MR4353 Update ZTE.ZXA10 profile to support C610
MR4365 Fix prompt matching in Fortinet.Fortigate profile
MR4371 Alcatel.OS62xx.get_version. Set always_prefer to S for better platform detect.
MR4376 fix_get_lldp_neighbors_NSN.TIMOS
MR4406 Add AcmePacket.NetNet profile.
MR4431 noc/noc#1391 Cisco.WLC. Add get_interface_type method.
MR4536 add_bras_metrics_Juniper_JUNOSe
MR4570 Fix h3c get_switchport
MR4578 Eltex.ESR add snmp support
MR4583 Update DCN.DCWS profile.py
MR4585 Update sa/profiles/DCN/DCWS/get_config.py
MR4586 Ericsson.SEOS.get_interfaces. Migrate to Generic SNMP.
MR4596 Fix DLink.DxS_Smart profile
MR4600 Huawei.VRP3.get_interface_status_ex. Fix return in/out speed as kbit/sec.
MR4610 Huawei.VRP3.get_interface_status_ex. Fix trace when SNMP Timeout.
MR4617 NSN.TIMOS.get_interfaces. Fix empty MAC on output.

Collections Changes

MR Title
MR4277 Add more Juniper part number
MR4282 Add new caps - Sensor | Controller
MR4294 New Environment metrics
MR4305 Fix bad json on collection.
MR4307 Cleanup HP fm.eventclassificationrule.
MR4337 Fix get metrics script for controller
MR4345 Fix dev.specs SNMP chassis for Huawei and Generic.
MR4411 Add some Juniper models
MR4451 Add some Juniper models
MR4460 noc/noc#1411 Add PhonePeer MetricScope.
MR4499 Fix default username BI dashboard.
MR4520 sa.profilecheckrules: Eltex | MES | MES5448 sysObjectID.0
MR4625 Add AcmePacket Vendor.

Deploy Changes

MR Title
MR4478 noc/noc#1241 Merge ansible deploy to master repo
MR4623 Add liftbridge deployflow
MR4637 Fix auth path redirect
MR4640 Catch trace on etl loader when delete lost mapping.
MR4643 Change start condition