NOC 19.1

In accordance with our Release Policy, we <https://getnoc.com/devteam/> proudly present release 19.1.

Release 19.1 contains 272 bugfixes, optimisations and improvements.

Highlights

Usability

NOC Theme

19.1 introduces a genuine NOC theme intended to replace the venerable ExtJS gray. The new flat theme is based on the Triton theme and uses NOC-branded colors. The NOC theme can be activated via config on a per-installation basis. We expect to make it the default several releases later.

Collection Sharing

Collections are a vital part of NOC. We gratefully appreciate any contributions. To make the contribution process easier, we’ve added a Share button right into the JSON preview. Enable collection sharing in the config and create collection Merge Requests directly from the NOC interface with a single click.

New fm.alarm

The alarm console was thoroughly reworked. Current filter settings are stored in the URL and can be shared with other users. Additional filters on services and subscribers were also added.

New runcommands

The Run Commands interface was simplified. The left panel is now hidden and the working area was enlarged. The list of objects can be modified directly from the commands panel. A configurable command logging option was added to the mrt service.

Alarm acknowledgement

Alarms can be acknowledged by a user to show that the alarm has been seen and is now under investigation.

Integration

We continue to move towards better integration with external systems. Our first priority is to clean up and document the API used by external systems to communicate with NOC.

NBI

A new nbi service has been introduced. The nbi service hosts the Northbound Interface API, which allows upper-level systems to access NOC’s data.

An objectmetrics API for requesting metrics has been introduced.

DataStream

The DataStream service got a lot of improvements.

API Key ACL

API Key got an additional ACL, which allows restricting the source addresses for particular keys.
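The kind of check such an ACL implies can be sketched with Python’s standard ipaddress module. This is only an illustration, not NOC’s implementation: the function name is_allowed, the prefix-list ACL format and the "empty ACL means no restriction" rule are all assumptions.

```python
import ipaddress


def is_allowed(source, acl):
    """Return True if the source address falls into any ACL prefix.

    An empty ACL is treated as "no restriction" (an assumption of this sketch).
    """
    if not acl:
        return True
    addr = ipaddress.ip_address(source)
    return any(addr in ipaddress.ip_network(prefix) for prefix in acl)


# Hypothetical ACL attached to one API key
acl = ["10.0.0.0/8", "192.168.1.0/24"]
print(is_allowed("10.1.2.3", acl))    # True
print(is_allowed("172.16.0.1", acl))  # False
```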

Threshold Profiles

Threshold processing became more flexible. Instead of four fixed levels (low error, low warning, high warning and high error), an arbitrary number of levels can be configured via Threshold Profiles. Arbitrary actions can be set for each threshold violation, including:

  • raising an alarm

  • sending a notification

  • calling handlers

The threshold closing condition can differ from the opening one, allowing hysteresis to suppress unnecessary flapping.
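How separate opening and closing conditions suppress flapping can be shown with a small sketch. The class name, its parameters and the sample values below are hypothetical and do not reflect the actual Threshold Profile model.

```python
class HysteresisThreshold:
    """Opens at or above `open_at`, closes only at or below `close_at`."""

    def __init__(self, open_at, close_at):
        if close_at > open_at:
            raise ValueError("close_at must not exceed open_at")
        self.open_at = open_at
        self.close_at = close_at
        self.active = False

    def update(self, value):
        """Feed a new measurement; return "open", "close" or None."""
        if not self.active and value >= self.open_at:
            self.active = True
            return "open"
        if self.active and value <= self.close_at:
            self.active = False
            return "close"
        return None


cpu = HysteresisThreshold(open_at=90, close_at=80)
events = [cpu.update(v) for v in (85, 91, 89, 92, 79)]
print(events)  # [None, 'open', None, None, 'close']
```

With a single level at 90, the sequence above would open, close and open again; with the closing level at 80 it opens once and closes once.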

Syslog archiving

Starting from 19.1, NOC can be used as a long-term syslog archive solution. ManagedObjectProfile got an additional Syslog Archive Policy setting. When enabled, the syslogcollector service mirrors all received syslog messages to the long-term analytic ClickHouse database. ClickHouse supports replication, applies transparent compression and has very modest IOPS requirements, making it ideal for high-load storage.

Collected messages can be queried both through the BI interface and via direct SQL queries.

STP Topology metrics

STP topology change metrics are supported out of the box. Device dashboards can show topology changes on graphs, and further analytics can be applied. In combination with BI analytics, network operators get a valuable tool for investigating short-term traffic disruption problems in large networks.

New platform detection policy

Behavior on new platform detection became configurable. The previous behavior was to create the platform automatically, which can cause headaches in particular cases. Now the following options can be configured in the Managed Object Profile:

  • Create - preserve previous behavior and create new platform automatically (default)

  • Alarm - raise umbrella alarm and stop discovery

Firmware Policy

Behavior on firmware policy violation also became configurable. ManagedObjectProfile allows configuring the following options:

  • Ignore - do nothing (default)

  • Ignore&Stop - Stop discovery

  • Raise Alarm - Raise umbrella alarm

  • Raise&Stop - Raise umbrella alarm and stop discovery

New Profiles

19.1 adds support for TV optical-to-RF converters widely used in cable TV networks. Two profiles have been introduced:

  • IRE-Polus.Taros

  • Vector.Lambda

In addition, an NSN.TIMOS profile became available.

Performance, Scalability and optimisations

Caps Profile

caps discovery used to collect all capabilities known for the platform. Sometimes this is not the desired behavior, so Caps Profiles are introduced. A Caps Profile allows enabling or disabling checks for a particular group of capabilities. A group of capabilities can be explicitly enabled, explicitly disabled, or enabled only if required by the configured topology discovery.

High-precision timers

19.1 contains a time.perf_counter backport to Python 2.7. perf_counter uses CPU counters to measure time intervals. It is about 2x faster than time.time and allows finer granularity in time interval measurements (time.time changes only ~64 times per second). This greatly increases the precision of span interval measurements and of ping RTT metrics.
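In Python 3 the same function ships in the standard library, so the measurement pattern can be sketched as follows. The timed helper is a hypothetical name used for illustration and is not part of NOC.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds) using perf_counter."""
    t0 = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - t0


result, elapsed = timed(sum, range(100000))
print(result, "computed in %.6f s" % elapsed)
```

Only the difference between two perf_counter readings is meaningful; the absolute value has no defined reference point.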

Pymongo connection pool tuning

Our investigations showed that the current pymongo connection pool implementation has a design flaw that leads to a pool connection poisoning problem under NOC’s common workload: once opened, a mongo connection from discovery is never closed, leaving lots of connections open after spikes of load. We’ve implemented our own connection pool and submitted a pull request to the pymongo project (see LIFO connection pool policy).
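A toy model can illustrate why the order in which idle connections are reused matters. It assumes that the difference between the policies lies in which idle connection is handed out first; it is not pymongo code, and the Pool class and helper function are made up for this sketch.

```python
from collections import deque


class Pool:
    """Toy connection pool; connections are just integers here."""

    def __init__(self, lifo):
        self.lifo = lifo
        self.idle = deque()
        self.created = 0

    def get(self):
        if not self.idle:
            self.created += 1
            return self.created
        # LIFO reuses the most recently returned connection,
        # FIFO reuses the one that has been idle the longest.
        return self.idle.pop() if self.lifo else self.idle.popleft()

    def put(self, conn):
        self.idle.append(conn)


def connections_used_after_spike(lifo, spike=5, steady_requests=20):
    """Count distinct connections touched by steady load after a spike."""
    pool = Pool(lifo)
    # Load spike: several connections are checked out at once, then returned.
    conns = [pool.get() for _ in range(spike)]
    for c in conns:
        pool.put(c)
    # Steady load: one request at a time.
    used = set()
    for _ in range(steady_requests):
        c = pool.get()
        used.add(c)
        pool.put(c)
    return len(used)


print(connections_used_after_spike(lifo=False))  # 5
print(connections_used_after_spike(lifo=True))   # 1
```

In the FIFO case every connection opened during the spike keeps being reused, so none of them ever looks idle. In the LIFO case the steady load touches a single connection and the rest stay idle, which makes them candidates for closing.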

ClickHouse table cleanup policy

The ClickHouse table retention policy may be configured on a per-table basis. Partition dropping is automated and may be called manually or from cron.

Redis cache backend

Our investigations showed that memcached is prone to randomly forgetting keys even while enough memory is available. This leads to random loss of discovery job state, which resets the state of measured SNMP counters, loses random metrics and leaves empty gaps in Grafana dashboards. The problem is hard to diagnose and the only cure is to restart the memcached process. The problem lies deep in memcached’s internal architecture and is unlikely to be fixed.

So we’ve introduced support for a Redis cache backend. We’ll decide whether to make it the default cache backend after a testing period.

SO_REUSEPORT & IP_FREEBIND for collectors

The syslogcollector and trapcollector services support the SO_REUSEPORT and IP_FREEBIND options for listeners.

SO_REUSEPORT allows several collector processes to share a single port using in-kernel load balancing, greatly improving collector throughput.

IP_FREEBIND allows binding to a non-existing address, opening support for floating virtual addresses for collectors (VRRP, CARP, etc.) and adding the necessary level of redundancy.
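At the socket level, the two listener options look roughly like the sketch below. It assumes Linux, where the free-bind option is IP_FREEBIND at the IP level; the function name make_udp_listener and its parameters are hypothetical and do not mirror the collectors’ configuration.

```python
import socket

# Linux value from <linux/in.h>; used when this Python build's socket
# module does not export the constant.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def make_udp_listener(address, port, reuseport=False, freebind=False):
    """Create a bound UDP socket with the requested listener options."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if reuseport:
        # Several processes may bind the same address and port;
        # the kernel distributes incoming datagrams between them.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    if freebind:
        # Allow binding to an address that is not (yet) configured locally,
        # e.g. a floating VRRP/CARP address. Linux-specific.
        sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
    sock.bind((address, port))
    return sock


# Two listeners sharing one port, as two collector processes would.
first = make_udp_listener("127.0.0.1", 0, reuseport=True)
port = first.getsockname()[1]
second = make_udp_listener("127.0.0.1", port, reuseport=True)
print(second.getsockname()[1] == port)
first.close()
second.close()
```

Both sockets must set SO_REUSEPORT before bind; without it the second bind fails with "Address already in use".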

In combination with the new Syslog Archive and ClickHouse table cleanup policy features, NOC can be turned into a high-performance syslog archiving solution.

GridVCS

GridVCS is NOC’s high-performance redundant version control system used to store device configuration history. The 19.1 release introduces several improvements to the GridVCS subsystem.

  • Built-in compression - though Mongo’s WiredTiger uses transparent compression at the storage level, explicit compression at the GridVCS level reduces both disk usage and database server traffic.

  • Previous releases used mercurial’s mdiff to calculate config deltas. 19.1 uses the BSDIFF4 format by default. During our tests BSDIFF4 showed better results in speed and delta size.

  • The ./noc gridvcs command got an additional compress subcommand, which applies both compression and BSDIFF4 deltas to already collected data. While it can take some time for large storages, it can free up significant disk space.
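The effect of explicit compression on configuration text can be sketched with zlib, which the merge request title for this feature names. The sample configuration is made up, and the snippet does not reproduce GridVCS’s storage format; BSDIFF4 deltas are left out because they need a third-party library.

```python
import zlib

# Hypothetical device configuration: repetitive text compresses well.
config = "".join(
    "interface Gi0/%d\n description uplink\n no shutdown\n!\n" % i
    for i in range(1, 49)
).encode("utf-8")

compressed = zlib.compress(config)
assert zlib.decompress(compressed) == config  # lossless round trip
print(len(config), "->", len(compressed), "bytes")
```

Because the data is compressed before it leaves the application, the saving applies to the traffic towards the database server as well as to disk usage.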

API improvements

profile.py

SA profiles used to live in the __init__.py file. Our code style advises keeping __init__.py empty for various reasons. Some features, like profile loading from custom, will not work with __init__.py anyway.

So starting with 19.1 it is recommended to place a profile’s code into the profile.py file. Loading from __init__.py is still supported, but it is a good time to plan the migration of custom profiles.

OIDRule: High-order scale functions

Metric scales can be defined as high-order functions, i.e. functions returning other functions. This greatly increases the flexibility of the scaling subsystem and allows external configuration of scaling processing.
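The idea of a function returning a function can be sketched as follows. The names scale and SCALES are hypothetical and do not reproduce the OIDRule API; the dictionary stands in for scale definitions that arrive from external configuration.

```python
def scale(factor):
    """Return a scaling function that multiplies a raw value by `factor`."""

    def apply(value):
        return value * factor

    return apply


# Scale definitions as they might be built from external configuration
SCALES = {
    "octets_to_bits": scale(8),
    "kilo": scale(1000),
}

print(SCALES["octets_to_bits"](1500))  # 12000
```

The factor is fixed once when the scaling function is built, and the resulting function is then applied to every collected value.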

IPAM seen propagation

Workflow’s seen signal can be configured to propagate up to the parent prefixes. Address and Prefix profiles got a new Seen propagation policy setting, which determines whether the parent prefix is notified when a child element is seen by discovery.

A common usage pattern is to propagate seen to aggregate prefixes to get notified when an aggregate comes into use.

Phone workflow

The phone module got full-blown workflow support. Each phone number and phone range has its own state, which can be changed manually or via external signals.

Breaking Changes

Migration

New features

MR    Title
1515  Add estimate param to job command.
1525  Collection sharing
1498  DataStream: asset part of ManagedObject
1516  APIKey ACL
1518  Add export/import to ./noc beef command.
1514  Configurable behavior on new platforms and firmware policy violations
1512  new fm-alarm
1508  IRE-Polus.Taros profile
1507  Summary glyph display order
1501  Add Errors Out and Discards In for ddash
1595  Add periodic diagnostic to alarm diagnostic.
1460  ThresholdProfile: Flexible thresholds configuration
1497  Alarm acknowledge/unacknowledge
1491  network stp topology changes on graph
1476  GridVCS: bsdiff4 patches and zlib compression
1432  Add initial support for NSN.TIMOS profile
1475  High-precision timers
1458  Add Network | STP | Topology Changes metric.
1455  CapsProfile
1396  redis cache backend
1404  #794: IPAM seen propagation policy
1384  card: project card
1390  #942: Remove Root container
1352  #694 ClickHouse table cleaning policy
1363  Vector.Lambda profile
1283  NOC theme
1336  OIDRule: High-order scale functions
1338  #539 Syslog archiving
1255  nbi service
1345  #497 syslogcollector/trapcollector: SO_REUSEPORT and IP_FREEBIND support
1252  datastream: Alarm datastream
1226  #636 Phone Workflow integraton
1113  Profiles should be moved to profile.py

Improvements

MR    Title
1534  Set default loglevel on command to info.
1535  Update RU translation.
1527  FM Alarms localization
1529  Add full_name to PlatformApplication query fields.
1522  Update/report interface status3
1510  Update DLink.DxS profile
1556  Update Rotek.BT profile (get_version)
1539  Update settings by snmp requests for Dlink.DxS
1500  Update Juniper.JUNOS profile
1503  Speedup NetworkSegment Service Summary count.
1502  Update Report for Interfaces Status
1490  Generic.get_chassis_id disable Multicast MAC address check.
1494  SKS.SKS and BDCOM.IOS config volatile.
1488  Add platform to Linksys.SPS2xx profile.
1451  Unified loader interface
1485  Add caps profile to managedobject profile ETL loader.
1484  Add to Linksys.SPS24xx platform OID
1434  ./noc dnszone import: Parse complex $TTL directives
1452  Move methods from SegmentTopology to BaseTopology
1449  inv.networksegment: Bulk fields calculation
1454  Add to_python method to ClickHouse model.
1466  Add to Huawei.VRP profile get Serial Number attributes.
1453  ResourceGroup: TreeCombo
1461  Add config_volatile to Orion.NOS and SKS.SKS
1447  Increase query interval for core.pm.utils function.
1417  Extendable Generic.get_chassis_id script
1441  Add patern more to Huawei.MA5600T profile.
1440  Optimize reportalarmdetail and reportobjectdetail.
1439  Update/eltex mes execute snmp
1437  Delete aggregateinterface bi model
1420  Add dynamically loader BI models.
1418  RepoPreview MVVC
1427  Migrate Alstec.24xx.get_metrics to new model.
1414  networkx 2.2 and improvend spring layout implementation
1413  dns.dnsserver: Remove sync field
1400  requests 2.20.0
1392  Diverged permissions
1382  #961 Process All addresses and Loopback address syslog/trap source types
1408  Add Generic.get_vlans and get_switchport scripts.
1409  Add get_lldp_snmp capabilities for Cisco.IOS
1410  Change Iface Name OID for get_ifindexes Plante.WCDG profile
1374  migrate inv map to leafletjs
1381  #971 trapcollector: Gentler handling of BER decoding errors
1371  dnszone: Ignore addresses with missed FQDNs
1369  Add theme variable to login page render.
1368  Add “Up/10M” to reportcolumndatasource for report object detail.
1391  CODEOWNERS file
1353  #788 Try to determine VRF’s for DHCP address discovery
1361  DataStream: Load from custom
1251  Customized PyMongo connection pool
1397  Juniper.junos
1398  auto logout remove msg
1385  Dead code cleanup
1284  runcommands refactoring
1375  Cleanup pyrule from classifier trigger.
1341  theme body padding for form
1362  Add convert ifname for MA4000
1349  Cleanup AlliedTelesis profiles.
1346  snmp: Try to negotiate broken error_index
1344  Add Interface packets dashboard in MO dash.
1318  Migrate ReportProfileCheck report to ReportStat Backend.
1228  Move numpy import to parse_table_header in lib/text.
1316  Additional LLDP constants and caps conversion functions
1324  Add TZ parameter to NBI query.
1126  #260 add password widget
1322  Add get_lldp_neighbors and get_capabilities for Qtech2500 profile
1264  Add clean to events command.
1307  Update Alcatel.OS62xx profile
1285  Hp.1910
1190  Update Rotek.RTBSv1 profile
1297  Add Rotek.RTBSv1.get_metrics script.
1296  add get_config script for Dlink.DVG profile
1291  Extend job command.
1276  Add clean_id_bson to alarm datastream.
1274  threadpool: Cleanup worker result just after setting future
1286  Add late_alarm metric to seflmon fm collector.
1249  Profile.cli_retries_super_password parameter
1250  perm: response layout
1229  ldap: Additional check of username format
1214  Add telemetry to MRT service.
1244  Add physical iface count metrics to selfmon.
1216  Add vv (very verbose parameter) to test command.

Bugfixes

MR    Title
1487  Use ch_escape function on syslogcollector.
1478  Fix Report Unknown Model Summary.
1477  Fix Generic.get_capabilities snmp_v1
1474  Fix load metric priority. Profile first, Generic second.
1473  Fix Radio and SLA graph template for CH use.
1481  Fix displaying platform in some Cisco Stackable switches
1479  Fix Rotek RTBSv1 Tx Power metric
1438  Fix Huawei.VRP.get_mac_address_table script
1422  Fix MikroTik.RouterOS.get_interface_status_ex script
1462  Fix heavy cpu load on show vlan command
1469  Fix Huawei.VRP.get_version SerialNumber rogue chart.
1467  Fix DLink.DxS profile
1463  Fix Extreme.XOS.get_interfaces script
1465  Fix PrefixBookmark import loop.
1464  Fix selfmon FM metric name.
1457  Fix getting single oid from multiple metrics.
1444  Fix Iskratel.MSAN profile
1450  Fix Orion.NOS.get_lldp_neighbors script
1433  Fix Cisco.IOSXR profile
1436  Fix Cisco.NXOS.get_arp script
1448  Fix c.id in card.base.f_object_location.
1445  login button width fixed
1459  Lambda fix metrics
1468  Huawei.VRP.get_version strip serial number.
1435  InfiNet fix __init__.py pattern_prompt
1426  inv.map fix performance
1443  Fix Object.get_coordinate_zoom method.
1428  Fix Huawei.MA5600T profile
1430  Fix Alstec.24xx metric name.
1289  Fix Juniper.JUNOS.get_lldp_neighbors Parameter ‘remote_port’ required.
1423  Fix managedobject and object card for delete Root.
1429  Fix avs Object.get_address_text method
1424  Fix getting container path in Alarm Web and Card.
1425  Fix typo in ManagedObject console UI.
1483  Fix Raisecom.ROS.get_lldp_neighbors script
1395  Fix container field type when remove Root.
1401  ip.ipam: Fix prefix style
1411  Fix Add Objects to Maintenance from SA !582
1386  fix error “Отсутствуют адреса линка” in dns.reportmissedp2p
1405  Fix Discovery Problem Detail report trace.
1394  Fix get_lldp_neighbors by SNMP
1407  Fix Plantet.WGSD Profile
1403  #976 Fix closing of already closed session
1406  Fix avs environments graph tmpl 148
1402  jsloader fixed
1399  Fix Ubiquiti profile and Generic.get_interfaces(get_bulk)
1389  Fix Report Discovery Poison
1378  Fix theme variable in desktop.html template.
1379  Fix etl managedobject resourcegroup
1367  Fix prompt in Rotek.RTBS.v1 profile.
1366  Fix workflow CH dictionary.
1365  Fix selfmon FM collector.
1364  Fix update operation for superuser on secret field.
1376  noc/noc#952 Fix metric path for Environment metric scope.
1310  #964 Fix SA sessions leaking
1357  Natex_fix_sn
1355  Cisco_fix_snmp
1370  Increase ManagedObject cache version for syslog archive field.
1356  Fix Interface name Eltex.MES
1354  Fix Interface name QSW2500
1335  Fix get_interfaces, add reth aenet
1343  Fix profilecheckdetail.
1342  Fix secret field.
1351  InfiNet-fix-get_version
1350  Fix get_interfaces for Telindus profile
1348  Fix stacked packets graph.
1360  Fix Interface name ROS
1326  Fix ch_state ch datasource.
1332  Fix Span Card view from ClickHouse data.
1331  Fix Huawei.MA5600T.get_cpe.
1328  Fix Cisco.IOS.get_lldp_neighbors regex
1327  Fix get_interfaces for Rotek.RTBSv1, add rule for platform RT-BS24
1325  Fix CLIPS engine in slots.
1320  Fix SNMP Trap OID Resolver
1323  Fix get_interfaces for QSW2500 (dowwn -> down)
1269  Fix Juniper.JUNOSe.get_interfaces script
1278  Fix Huawei.MA5600T.get_cpe ValueError.
1314  Fix Generic.get_chassis_id script
1306  Fix AlliedTelesis.AT8000S.get_interfaces script
1313  Fix Cisco.IOS.get_version for ME series
1262  Fix Raisecom.RCIOS password prompt matching
1238  Fix Juniper.JUNOS profile
1279  Fixes empty range list in discoveryid.
1305  Fix Rotek.RTBS profiles.
1304  Fix some attributes for Span in MRT serivce
1303  Fix selfmon escalator metrics.
1300  fm.eventclassificationrule: Fix creating from event
1295  Fix ./noc mib lookup
1298  Fix custom metrics path in Generic.get_metrics.
1290  Fix custom metrics.
1225  noc/noc#954 Fix Cisco.IOS.get_inventory script
1275  Fix InfiNet.WANFlexX.get_lldp_neighbors script
1281  Delete quit() in script
1280  Fit get_config
1277  Fix Zhone.Bitstorm.get_interfaces script
1254  Fix InfiNet.WANFlexX.get_interfaces script
1272  Fix vendor name in SAE script credentials.
1246  Fix Huawei.VRP pager
1268  Fix scheme migrations
1245  Fix Huawei.VRP3 prompt match
1259  fix_error_web
1258  Fix managed_object_platform migration.
1260  Fix pm.util.get_objects_metrics if object_profile metrics empty.
1253  Fix path in radius(services)
1203  Fix prompt pattern in Eltex.DSLAM profile
1247  Fix consul resolver index handling
1239  #911 consul: Fix faulty state caused by changes in consul timeout behavior
1237  #956 fix web scripts
1221  Fix Generic.get_lldp_neighbors script
1243  Fix now shift for selfmon task late.
1231  noc/noc#946 Fix ManagedObject web console.
1235  Fix futurize in SLA probe.
1234  Fix Huawei.MA5600T.get_cpe.
1220  Fix Generic.get_interfaces script
1204  Fix Raisecom.ROS.get_interfaces script
1215  Fix platform field in Platform Card.
1210  ManagedObject datastream: Fix links property. capabilities property
1212  Fix save empty metrics threshold in ManagedObjectProfile UI.
1211  Fix interface validation errors in Huawei.VRP, Siklu.EH, Zhone.Bitstorm.
1317  sa.managedobjectprofile: Fix text
1340  noc/noc#966
1294  selfmon typo in mo
1105  #856 Rack view fix
1208  #947 Fix MAC ranges optimization