Should APNIC drop RPKI Invalids?
At this stage I think it is more important for APNIC to focus on gaining
operational experience (which hopefully leads to increased Trust Anchor
stability), than to count how many RPKI invalid BGP routes (potential
hijacks!) APNIC's production network accepted.
On Thu, May 06, 2021 at 12:48:50PM +1000, Geoff Huston via SIG Routing Security wrote:
> If AS4608 turns on drop invalids, as you appear to be
> I for one would lose some aspects of visibility, and while we are all
> trying to deploy this and make it stable it seems to me to be a real
> shame if we were blinded from this deployment data at this stage.
Have the APNIC researchers considered the option to count 'invalid
routes rejected', instead of counting 'invalid routes accepted'?
Assuming APNIC's EBGP border routers are not from an ancient make &
model, those routers probably have support for RFC 7854 "BGP Monitoring
Protocol" (BMP) which can be used to enable researchers to inspect
"pre-policy Adj-RIB-In" data.
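
To make the "pre-policy" part concrete, here is a rough sketch of how a
collector could tell pre-policy from post-policy Route Monitoring
messages in a BMP stream. It relies only on the RFC 7854 header layout
(6-byte common header, 42-byte per-peer header, L flag in the peer
flags). The function name is made up for illustration, and a real
collector would also parse the BGP UPDATE that follows the per-peer
header.

```python
import struct

BMP_VERSION = 3
COMMON_HDR = struct.Struct("!BIB")  # version, message length, message type
PER_PEER_HDR_LEN = 42               # fixed size of the RFC 7854 per-peer header
ROUTE_MONITORING = 0                # BMP message type 0
L_FLAG = 0x40                       # set = post-policy, clear = pre-policy


def is_pre_policy_route_monitoring(msg: bytes) -> bool:
    """Return True if msg is a Route Monitoring message carrying
    pre-policy Adj-RIB-In data (hypothetical helper, not a library API)."""
    if len(msg) < COMMON_HDR.size:
        raise ValueError("truncated BMP common header")
    version, _length, msg_type = COMMON_HDR.unpack_from(msg, 0)
    if version != BMP_VERSION:
        raise ValueError(f"unsupported BMP version {version}")
    if msg_type != ROUTE_MONITORING:
        return False
    if len(msg) < COMMON_HDR.size + PER_PEER_HDR_LEN:
        raise ValueError("truncated BMP per-peer header")
    # Per-peer header: peer type (1 byte), then peer flags (1 byte).
    peer_flags = msg[COMMON_HDR.size + 1]
    return not (peer_flags & L_FLAG)
```

Messages for which this returns True reflect what the peer sent before
any import policy (including ROV) was applied, which is the data the
researchers would want to count.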
If the APNIC researchers set themselves up with a BMP feed, they won't
need to encourage the APNIC OPS team to degrade APNIC's routing security
posture for the sake of 'science'. Instead, APNIC OPS can develop a
sound routing policy focused on routing table hygiene, while the
research department can work with unfiltered data.
> The question for me is what is the _real_ objective of
> invalids and are there other ways to achieve this that do not impair
> the visibility of measurement tools that we run from within AS4608?
The real objective is to decrease the gap that currently exists between
those holding the keys to the kingdom and those trying to make a pig
fly in production environments.
The same considerations that apply to any multi-homed network, apply to
APNIC's network: when a multi-homed network uses RPKI ROV and rejects
invalids, they increase their chances of sending packets down a path
that leads to the authorized origin.
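
For readers less familiar with what "rejects invalids" means in
practice, below is a minimal sketch of the RFC 6811 origin validation
decision, plus a counter for "invalid routes rejected". The VRP tuple,
the function names, and the documentation prefixes/ASNs in the usage
note are my own illustrative choices. In production this logic lives in
the router, fed by an RTR server.

```python
from ipaddress import ip_network
from typing import Iterable, NamedTuple, Tuple


class VRP(NamedTuple):
    """Validated ROA Payload: prefix, maximum length, authorized origin."""
    prefix: str
    max_length: int
    asn: int


def rov_state(route_prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Classify a route as 'valid', 'invalid' or 'not-found' (RFC 6811)."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # A VRP covers the route if the route is equal to or more
        # specific than the VRP prefix (same address family only).
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A covering VRP matches if the length and origin both fit.
        # An AS0 VRP never makes a route valid.
        if (route.prefixlen <= vrp.max_length
                and vrp.asn != 0
                and origin_asn == vrp.asn):
            return "valid"
    return "invalid" if covered else "not-found"


def count_rejected(routes: Iterable[Tuple[str, int]],
                   vrps: Iterable[VRP]) -> int:
    """Count routes a 'drop invalids' policy would reject."""
    vrps = list(vrps)
    return sum(1 for prefix, asn in routes
               if rov_state(prefix, asn, vrps) == "invalid")
```

With a single VRP for 192.0.2.0/24 (max length 24, AS64496), the same
prefix originated by AS64497 is invalid, a /25 from AS64496 is invalid
because it exceeds the max length, and 198.51.100.0/24 is not-found and
would still be accepted.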
An organization like APNIC should "walk like it talks": APNIC should
gladly suffer the same pain other network operators experience. What's
the worst that can happen? That APNIC OPS discover a problem before
members discover/report it, and then add a check to a pipeline to
prevent it from recurring?
If APNIC (who are a regular Autonomous System, like tens of thousands of
other ASs!) places themselves in the same boat and uses the same tools
and technologies (BGP-4, RPKI, HTTPS, DNS) that others have to use to
connect to the Internet, conversations about how to improve become much
We've seen in the last 2 years how a lack of clue and hands-on
experience led some people to believe that validating signatures,
validating manifests, CRLs, expiration dates, or detecting basic
encoding errors all are optional and just for funzies. Sadly we've also
observed how RSYNC and RRDP service endpoints can be broken for months
on end. It is time to stop merely masking problems and actually solve
the underlying issue. Problems in this context are a thousand-fold
more difficult to discuss when one of the parties will never experience
any operational consequences from failures in the RPKI.
The credibility of Trust Anchor operators will increase if those
operators actually use the RPKI technology stack for production purposes.
ps. I'm happy to volunteer my time to help APNIC write router configs /
set up RTR servers based on my experiences with such work.